What Does “Culled” Mean in Law and Ediscovery?
by Petra Pasternak
In ediscovery, culling is the process of reducing the volume of electronically stored information, or ESI, before attorney review. Culling refines the data set by removing irrelevant, duplicate, or otherwise non-responsive data.
The word “cull” worked its way into English in the Middle Ages from the Old French “cuillir,” which means “to gather, select, or pick.” Historically, it was used in farming and hunting to describe the process of eliminating unwanted stock to improve the quality of the herd or crop.
When a legal dispute begins, corporate legal departments and law firms are routinely confronted with terabytes of unorganized data from enterprise mail servers, collaboration tools like Slack and Microsoft Teams, and cloud storage repositories. Processing and reviewing every single file generated by an organization is both financially unfeasible and strategically inefficient.
Strategic filtering around such parameters as date ranges, keywords, file types, or sender domains eliminates non-responsive material and acts as the operational bridge between broad data collection and targeted document review.
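As a rough illustration of how such parameters might combine, here is a minimal Python sketch. The `Document` and `CullingCriteria` types, the field names, and the sample data are all hypothetical and not drawn from any particular ediscovery platform. The sketch also assumes every criterion must be met; real culling protocols are usually negotiated and may combine criteria differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical, simplified view of one processed item.
    doc_id: str
    sent: date
    sender: str
    filename: str
    text: str


@dataclass(frozen=True)
class CullingCriteria:
    start: date
    end: date
    keywords: tuple[str, ...]
    excluded_extensions: tuple[str, ...]
    sender_domains: tuple[str, ...]


def passes_cull(doc: Document, c: CullingCriteria) -> bool:
    """Return True if the document should stay in the review set."""
    in_range = c.start <= doc.sent <= c.end
    allowed_type = not doc.filename.lower().endswith(c.excluded_extensions)
    domain = doc.sender.rsplit("@", 1)[-1].lower()
    from_domain = domain in c.sender_domains
    has_keyword = any(k.lower() in doc.text.lower() for k in c.keywords)
    # Assumption: a document must satisfy every criterion to be kept.
    return in_range and allowed_type and from_domain and has_keyword


criteria = CullingCriteria(
    start=date(2022, 1, 1),
    end=date(2022, 12, 31),
    keywords=("merger", "acquisition"),
    excluded_extensions=(".dll", ".exe"),
    sender_domains=("example.com",),
)

docs = [
    Document("D1", date(2022, 3, 4), "ana@example.com", "memo.docx", "Draft merger terms"),
    Document("D2", date(2019, 6, 1), "ana@example.com", "old.docx", "Merger rumor"),
    Document("D3", date(2022, 5, 9), "bot@example.com", "setup.exe", "merger installer"),
    Document("D4", date(2022, 7, 7), "news@vendor.test", "promo.html", "Acquisition webinar"),
]

review_set = [d for d in docs if passes_cull(d, criteria)]
kept_ids = [d.doc_id for d in review_set]
print(kept_ids)
```

In this toy data set only D1 survives: D2 falls outside the date range, D3 is an excluded file type, and D4 comes from a sender domain outside the list.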
Data Culling: Why Reducing Document Volume Matters in Ediscovery
Culling is essential in ediscovery because modern litigation is defined by skyrocketing data volumes. A single custodian’s digital footprint can encompass tens of thousands of emails, modern hyperlinked attachments, or ephemeral messaging logs. If a litigation team attempts to send this raw, unrefined data directly to linear document review, the financial and operational costs can quickly derail the entire case strategy.
Proper culling is closely tied to the principle of proportionality under Federal Rule of Civil Procedure (FRCP) 26(b)(1). The rule requires that discovery be proportional to the needs of the case, considering factors like the amount in controversy, the parties’ resources, and whether the burden or expense of the proposed discovery outweighs its likely benefit. Culling helps organizations avoid spending disproportionate resources hosting and reviewing non-essential files, keeping the scope of discovery consistent with that standard.
Done well, culling reduces the scope of data involved in ediscovery in a defensible way, one that demonstrates reasonableness and good-faith efforts to produce responsive materials.
Implementing a defensible, automated data reduction strategy offers several distinct advantages for litigation teams:
Reduces Review Time: Removing junk files, system data, and clearly irrelevant material lets document reviewers focus on potentially responsive documents.
Lowers Cost: Reviewing ESI is usually the most expensive phase of ediscovery. Reducing the document volume significantly lowers vendor hosting fees and external legal review bills.
Improves Focus: Legal teams can pinpoint key factual narratives and critical evidence earlier when they aren’t forced to sift through terabytes of noise to find gigabytes, or megabytes, of signal.
Supports Proportional Discovery: It provides a justifiable mechanism to align discovery scope with the actual stakes of the litigation, satisfying judicial expectations under FRCP 26.
Streamlines Case Strategy: Culling is foundational to early case assessment (ECA), allowing counsel to evaluate exposure risks and make informed settlement or litigation choices before major investments are made.
What Is the Difference Between Processing, Culling, and Review?
While processing, culling, and review all shape the lifecycle of data, they occur at distinct points in the ediscovery workflow, rely on different methodologies, and serve separate operational goals. Newer legal professionals often confuse the terms because the stages sit next to each other in the workflow.
The Ediscovery Data Pipeline
Culling happens in the early phases of the standard ediscovery workflow, before humans begin reviewing specific documents.
Think of the process as a four-step data pipeline:
Processing: The technical ingestion of raw data. During this stage, files are extracted from containers (like .zip or .pst files), text is extracted (with Optical Character Recognition, or OCR, applied to scanned or image-based files), and system metadata is indexed. No data is eliminated here; it is simply made searchable.
Culling: The application of broad, objective rules to the processed data to remove large swaths of irrelevant files. Data culling examples include removing all files outside a specific time frame or filtering out system files (.dll or .exe).
Review: The phase where human reviewers or advanced machine learning models evaluate the culled, refined document set. Reviewers make nuanced legal determinations regarding responsiveness, confidentiality, and attorney-client privilege.
Production: The final phase, in which responsive, non-privileged documents are formatted (often as PDFs or TIFF images with accompanying load files) and securely delivered to opposing counsel.
Document Culling vs. Data Deletion: Is Culling the Same as Deleting?
No. Culling is distinct from data deletion, destruction, or wiping. Conflating these terms creates serious defensibility risks, particularly regarding the spoliation of evidence under FRCP 37(e).
Culling does not mean destroying or erasing data. When data is culled, it is simply excluded from the active review set housed within an ediscovery platform; the original data remains completely intact and securely preserved within its master repository or forensic collection.
Understanding this distinction is critical for mitigating risks described in the guide Digital Spoliation of Evidence: Risks, Rules, and Prevention. The common-law duty to preserve data is triggered the moment litigation is reasonably anticipated, which in practice means organizations should pause automated deletion cycles and issue formal legal holds. A party that fails to take reasonable steps to preserve potentially relevant ESI after this trigger, and loses it permanently, may face sanctions under Rule 37(e). These range from curative measures and monetary penalties to, where the court finds an intent to deprive the other side of the information, adverse inference instructions or even case dismissal.
Defensible culling allows legal teams to aggressively minimize their active review database to save hosting fees and review hours, while keeping the broader data pool safely preserved. This ensures that if opposing counsel challenges the culling parameters during a meet-and-confer session, the excluded data can be easily audited, restored, or adjusted without any loss of data integrity.
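The difference between excluding and deleting can be made concrete with a small Python sketch. The `Collection` class, its method names, and the sample reasons below are hypothetical; this is an illustration of the idea under simplified assumptions, not how any specific platform stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Preserved documents plus a log of which ones are excluded from review."""
    preserved: dict[str, str]  # doc_id -> filename; nothing is ever removed here
    excluded: dict[str, str] = field(default_factory=dict)  # doc_id -> reason

    def cull(self, doc_id: str, reason: str) -> None:
        # Culling only records an exclusion; the preserved copy is untouched.
        if doc_id not in self.preserved:
            raise KeyError(doc_id)
        self.excluded[doc_id] = reason

    def restore(self, doc_id: str) -> None:
        # Bring a document back, e.g. after culling parameters are renegotiated.
        self.excluded.pop(doc_id, None)

    def review_set(self) -> list[str]:
        return [d for d in self.preserved if d not in self.excluded]


collection = Collection(
    preserved={"D1": "memo.docx", "D2": "setup.exe", "D3": "notes.txt"}
)
collection.cull("D2", "system file type (.exe)")
collection.cull("D3", "outside agreed date range")
before = collection.review_set()

# Hypothetical scenario: the date range is widened after a meet-and-confer.
collection.restore("D3")
after = collection.review_set()
```

Because `preserved` is never modified, the excluded items and the recorded reasons remain available for audit, and widening the parameters is a matter of updating the exclusion log rather than re-collecting data.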
How Technology Supports Defensible Culling
Modern ediscovery platforms provide sophisticated analytics tools that elevate culling from a keyword exercise into a more defensible, auditable process. Relying on basic search terms alone can lead to missed data due to issues such as variation in terminology, misspellings, or incomplete search strategies. It can also inflate review sets with irrelevant hits.
Advanced legal technology helps reduce these blind spots while maintaining a documented audit trail of decisions.
As highlighted in Ediscovery vs. Digital Forensics: Key Differences Explained, while digital forensics focuses on bit-level imaging and reconstructing hidden device histories, ediscovery software excels at processing, categorizing, and analyzing collected ESI at scale.
To achieve defensible data reduction, modern ediscovery platforms leverage several advanced technical capabilities:
Global Deduplication: Identifies identical files across a dataset via cryptographic hashing. If 20 custodians received the same company-wide email, deduplication allows users to review one copy and maps the other custodians to it, significantly reducing review volume.
Near-Duplicate Detection: Groups textually similar documents together, such as multiple drafts of a contract or documents with minor edits. This lets reviewers spend less time on repetitive content and focus on meaningful differences between versions.
Email Threading: Organizes email conversations into chronological, visual threads. Reviewers can focus on the final, most inclusive message in each thread, which contains the entire conversation history.
AI-Assisted Workflows: Employs machine learning, conceptual clustering, and predictive coding during early case assessment to identify patterns and prioritize documents by concept rather than by keywords alone. These capabilities help teams identify potentially irrelevant data earlier and make more informed decisions before full-scale review.
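To make the hashing idea behind global deduplication concrete, here is a minimal Python sketch using the standard library's `hashlib`. The `deduplicate` function, the custodian names, and the sample emails are hypothetical. The sketch hashes raw bytes with SHA-256 for simplicity; that choice is an assumption, and actual platforms may hash selected, normalized fields instead.

```python
import hashlib
from collections import defaultdict


def deduplicate(items: list[tuple[str, bytes]]) -> dict[str, list[str]]:
    """Map each unique content hash to the custodians who hold a copy."""
    custodians_by_hash: dict[str, list[str]] = defaultdict(list)
    for custodian, content in items:
        # Identical bytes always produce the same digest.
        digest = hashlib.sha256(content).hexdigest()
        custodians_by_hash[digest].append(custodian)
    return dict(custodians_by_hash)


# Hypothetical data: one company-wide email held by 20 custodians,
# plus one unrelated message held by a single custodian.
email = b"Subject: All-hands on Friday\n\nPlease join at 10am."
collected = [(f"custodian_{i:02d}", email) for i in range(1, 21)]
collected.append(("custodian_01", b"Subject: Lunch?\n\nNoon works."))

unique = deduplicate(collected)
print(len(collected), "collected items,", len(unique), "unique documents")
```

Here 21 collected items reduce to 2 documents for review, while the mapping still records all 20 custodians of the company-wide email.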
By combining these tools with documented workflows, legal teams can reduce review volume, control costs, and create an audit trail that supports a defensible process.
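Near-duplicate detection, one of the capabilities described above, can also be sketched in a few lines. The example below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the `group_near_duplicates` function, the 0.9 threshold, and the sample texts are assumptions chosen for illustration, and commercial tools use their own algorithms that scale to far larger data sets.

```python
from difflib import SequenceMatcher


def group_near_duplicates(
    texts: dict[str, str], threshold: float = 0.9
) -> list[list[str]]:
    """Greedily group documents whose text is similar to a group's first member."""
    groups: list[list[str]] = []
    for doc_id, text in texts.items():
        for group in groups:
            representative = texts[group[0]]
            similarity = SequenceMatcher(None, representative, text).ratio()
            if similarity >= threshold:
                group.append(doc_id)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([doc_id])
    return groups


# Hypothetical documents: two contract drafts and one unrelated email.
texts = {
    "draft_v1": "The supplier shall deliver the goods within 30 days of the order date.",
    "draft_v2": "The supplier shall deliver the goods within 45 days of the order date.",
    "lunch": "Lunch menu for the quarterly offsite is attached.",
}

groups = group_near_duplicates(texts)
print(groups)
```

The two drafts differ only in the delivery period and land in the same group, so a reviewer could look at them side by side, while the unrelated email stays on its own.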
Petra Pasternak is a writer and editor focused on the ways that technology makes the work of legal professionals better and more productive. Before Everlaw, Petra covered the business of law as a reporter for ALM and worked for two Am Law 100 firms.