Free Guide — 2026 Edition

The Complete Guide to eDiscovery

Everything you need to run eDiscovery from legal hold to court-ready production — with practical checklists, cost benchmarks, and the strategies used by top litigation teams.

What Is eDiscovery?

Electronic discovery (eDiscovery) is the process of identifying, collecting, reviewing, and producing electronically stored information (ESI) in connection with litigation, regulatory investigations, or internal inquiries. It is a core obligation in virtually every civil lawsuit and many criminal and regulatory matters in the United States.

The scope of ESI is broad: emails, documents, spreadsheets, presentations, text messages, Slack and Teams conversations, voicemails, social media posts, database records, and any other information stored in digital form. The average Fortune 500 company generates 2.5 billion emails per year. When litigation or an investigation arises, the legal team must determine which of these electronic records are relevant, review them for responsiveness and privilege, and produce them to the opposing party or regulator — all within court-imposed deadlines and under the threat of sanctions for non-compliance.

The stakes are significant. Federal Rule of Civil Procedure 37(e) authorizes courts to impose sanctions — including adverse inference instructions and default judgments — for the failure to preserve or produce ESI. In practice, eDiscovery costs typically represent 60-80% of total litigation costs, and production errors can lead to motions to compel, fee-shifting, and waiver of privilege.

Why This Guide Exists

Most eDiscovery guides are written by vendors selling platforms. This guide is written for the practitioner who needs to understand the full workflow, make informed decisions about technology and process, and avoid the specific pitfalls that lead to sanctions, cost overruns, and lost cases. We include cost benchmarks and practical checklists at every stage.

The EDRM Framework

The Electronic Discovery Reference Model (EDRM) is the standard framework for understanding the eDiscovery workflow. It describes nine stages, from the trigger event through presentation at trial. In practice, most litigation teams focus on seven core stages: legal hold, identification, collection, processing, review, analysis, and production.

These stages are not strictly sequential. Review findings often trigger additional collection. Analysis may reveal the need for broader searches. Production may need to be re-run after QC failures. The EDRM is best understood as a reference architecture, not a waterfall process. Each stage feeds back into the others, and effective eDiscovery requires the ability to iterate quickly as the case develops.

The sections that follow walk through each stage in detail, with specific guidance on what to do, what to avoid, and how to control costs at every step.

1. Legal Hold

A legal hold (also called a litigation hold or preservation notice) is a directive issued to custodians and IT departments requiring them to preserve all potentially relevant ESI. The duty to preserve is triggered when litigation is reasonably anticipated — not when the complaint is filed. This means the hold obligation can arise weeks or months before any lawsuit is actually commenced.

The consequences of failing to issue or enforce a legal hold are severe. In Zubulake v. UBS Warburg, the court imposed an adverse inference instruction after the defendant failed to preserve emails despite a clear litigation hold obligation. More recently, courts have imposed monetary sanctions in the six- and seven-figure range for preservation failures, even where the failure was negligent rather than intentional.

What a proper legal hold requires:

  • A written hold notice to every relevant custodian, describing the matter and the scope of the preservation obligation
  • Coordination with IT to suspend auto-deletion and routine destruction policies for affected systems
  • Tracking of custodian acknowledgments, with follow-up for non-responders
  • Periodic reminders for as long as the hold remains in effect
  • Documentation of every preservation step taken

Common Mistake

Issuing a legal hold notice but failing to follow up. Courts have held that a legal hold is ineffective if the issuing party does not take reasonable steps to verify compliance. A hold notice sitting unread in an employee's inbox provides no preservation.

2. Identification

Identification is the process of determining which custodians, systems, and data sources contain potentially relevant ESI. This stage sets the scope for everything that follows: if you identify too narrowly, you risk missing relevant documents and facing sanctions for inadequate search. If you identify too broadly, you incur unnecessary costs in collection, processing, and review.

Start with custodians. Work with the case team to identify every individual who may have created, received, or stored relevant documents. This typically includes the named parties, their direct reports, key decision-makers, and anyone involved in the events at issue. For each custodian, map their data sources: email (which platform?), local files, shared drives, cloud storage (OneDrive, Google Drive, Dropbox), messaging platforms (Slack, Teams, Signal), mobile devices, and any enterprise applications (CRM, ERP, project management tools) they use.

Don't forget non-custodial sources. Shared mailboxes, distribution lists, SharePoint sites, shared drives, and database systems often contain critical documents that are not attributable to any single custodian. Identify these sources early, because they often require different collection methods and may contain unique documents not found in individual custodian collections.

Document your identification decisions. The meet-and-confer process under Federal Rule of Civil Procedure 26(f) requires parties to discuss preservation and discovery issues, including data sources. Being able to articulate why you included or excluded specific custodians and data sources is essential if your search scope is later challenged.

3. Collection

Collection is the process of extracting ESI from identified sources in a forensically defensible manner. The key principle is that collection must preserve the integrity and metadata of the original documents. Metadata — dates, authors, recipients, file properties — is often as important as the document content itself, and collection methods that alter or strip metadata can compromise the entire production.

Defensible collection requires:

  • Forensically sound methods that do not alter source data or metadata
  • Hash values calculated and recorded at the point of collection
  • Complete chain of custody documentation from source to review platform
  • Preservation of all metadata (dates, authors, recipients, file properties)
  • A log of any collection errors or inaccessible files

For modern messaging platforms like Slack and Microsoft Teams, collection presents unique challenges. Messages are stored in cloud environments controlled by the platform provider, conversation threads may span months or years, and attachments may be stored separately from the messages that reference them. See our detailed guide on collecting Slack and Teams data for specific strategies.

For mobile devices, collection typically requires either a mobile forensics tool (Cellebrite, GrayKey) for a full device image, or targeted collection of specific applications. The choice depends on the scope of the discovery obligation and the sensitivity of the device contents. Our mobile data guide covers the decision framework.
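
Chain of custody and hash verification are standard elements of defensible collection. The sketch below shows, in minimal Python, what recording a chain-of-custody manifest entry at the point of collection could look like. The function name, field names, and manifest format are illustrative assumptions; real collections use dedicated forensic tooling rather than ad hoc scripts.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def record_custody_entry(path: Path, custodian: str, manifest: list) -> dict:
    """Hash a file at the point of collection and append a chain-of-custody
    entry to the manifest. Field names here are illustrative, not a standard."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    entry = {
        "file": str(path),
        "custodian": custodian,
        "sha256": digest,  # recorded now so integrity can be re-verified later
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    manifest.append(entry)
    return entry
```

Re-computing the hash at any later stage and comparing it against the manifest entry demonstrates that the file has not been altered since collection.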

4. Processing

Processing transforms raw collected data into a format suitable for review. This includes extracting text and metadata from files, expanding container formats (ZIP, PST, OST, NSF), converting files to reviewable formats (TIFF, PDF), running OCR on image-only documents, and de-duplicating across custodians and data sources.

De-duplication is one of the most impactful processing steps. In a typical multi-custodian collection, 30-60% of documents are duplicates — the same email received by multiple custodians, the same document stored in multiple locations. De-duplication reduces the review population proportionally, directly reducing the cost and time required for document review. The standard approach is global de-duplication by MD5 hash, which removes exact duplicates across all custodians while preserving unique instances.
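
The global de-duplication described above can be sketched in a few lines of Python. This is a simplified illustration (commercial processing tools also normalize email metadata and keep family groups intact), but it shows the core hash-and-keep-first logic:

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """Compute the MD5 hash of a file, reading in chunks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def global_dedupe(paths):
    """Keep the first instance of each unique hash across all custodians;
    record each duplicate alongside the instance it duplicates."""
    seen = {}
    unique, duplicates = [], []
    for p in paths:
        digest = file_md5(p)
        if digest in seen:
            duplicates.append((p, seen[digest]))
        else:
            seen[digest] = p
            unique.append(p)
    return unique, duplicates
```

The duplicate log matters as much as the deduped set: it lets you show, per custodian, which instances were suppressed and why.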

Date and keyword filtering during processing can further reduce the review population. Applying date ranges that correspond to the relevant time period and excluding file types that are categorically non-responsive (system files, executables, font files) can eliminate 20-40% of the collection before review begins. However, filtering decisions should be documented and defensible — overly aggressive filtering can lead to allegations of inadequate search.
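
A minimal sketch of such a culling pass, assuming a simple record schema (`path`, `ext`, `date`) and an illustrative exclusion list (neither is a standard): the point is that every exclusion is logged with its reason, keeping the filtering documented and defensible.

```python
from datetime import date

# Illustrative file-type exclusions; real matters negotiate their own list.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".sys", ".ttf", ".fon"}

def cull(records, start: date, end: date):
    """Apply a date window and file-type exclusions. Returns the kept
    records plus a log of every exclusion and the reason for it."""
    kept, excluded = [], []
    for rec in records:
        if rec["ext"].lower() in EXCLUDED_EXTENSIONS:
            excluded.append((rec, "file-type exclusion"))
        elif not (start <= rec["date"] <= end):
            excluded.append((rec, "outside date range"))
        else:
            kept.append(rec)
    return kept, excluded
```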

Processing Benchmark

A well-processed dataset typically reduces the review population by 40-70% compared to the raw collection through a combination of de-duplication, date filtering, and file-type exclusions. For a 100GB collection, this can mean the difference between reviewing 500,000 documents and reviewing 175,000 documents.

5. Document Review

Document review is where the legal team examines each document for responsiveness (is it relevant to the discovery request?), privilege (is it protected by attorney-client privilege or work product doctrine?), and confidentiality (does it contain trade secrets, PII, or other sensitive information requiring protection?). Review is traditionally the most expensive phase of eDiscovery, typically accounting for 70-80% of total eDiscovery costs.

Traditional managed review involves teams of contract attorneys reviewing documents one at a time, coding each document for responsiveness, privilege, and other categories. Review rates vary, but a typical contract reviewer processes 50-75 documents per hour at a cost of $25-45 per hour, resulting in an all-in cost of $0.50-$1.50 per document including supervision and quality control.

Technology-assisted review (TAR), also called predictive coding, uses machine learning to prioritize and classify documents based on a set of training documents coded by senior attorneys. TAR 2.0 (continuous active learning) has been widely accepted by courts since Rio Tinto v. Vale (2015) and can reduce the number of documents requiring human review by 60-80%. However, TAR requires careful protocol development, seed set selection, and validation — and the training process itself can take days or weeks.

AI-powered review represents the next generation. Unlike TAR, which requires extensive training on each new matter, modern AI platforms can classify documents using natural language understanding without matter-specific training data. This dramatically reduces the time from data ingestion to review-ready classification. The cost model also shifts: rather than paying per-reviewer-hour, teams pay per-document or per-GB, typically at $0.05-$0.15 per document — a 90%+ reduction compared to managed review.

Metric                     Managed Review                TAR 2.0                 AI-Powered Review
Cost per Document          $0.50–$1.50                   $0.25–$0.75             $0.05–$0.15
Setup Time                 1–2 weeks                     1–3 weeks (training)    Hours
Review Speed (100K docs)   4–8 weeks                     2–4 weeks               Days
Consistency                Variable (reviewer fatigue)   Good (model-based)      High (deterministic)
Court Acceptance           Established                   Widely accepted         Growing (defensible with validation)

Regardless of the review method, quality control is non-negotiable. Sample-based QC (reviewing a random sample of coded documents to measure error rates), inter-reviewer agreement analysis, and senior attorney spot-checks should be built into every review workflow. Courts expect that producing parties can demonstrate the reliability and consistency of their review process. For detailed QC procedures, see our Production QC Checklist.
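
Sample-based QC can be sketched as below, with a hypothetical `recheck` callable standing in for the senior reviewer's second-pass decision on each sampled document. A fixed random seed keeps the sample reproducible, which helps when documenting the QC methodology.

```python
import random

def qc_sample_error_rate(coded_docs, recheck, sample_size, seed=0):
    """Draw a reproducible random QC sample and measure the observed
    coding error rate against a second-pass decision function."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn
    sample = rng.sample(coded_docs, min(sample_size, len(coded_docs)))
    errors = sum(1 for doc in sample if recheck(doc) != doc["code"])
    return errors / len(sample), sample
```

An observed error rate above the threshold set in the review protocol would trigger retraining or re-review of the affected batch.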

6. Analysis

Analysis goes beyond individual document review to identify patterns, relationships, and strategic insights across the document population. While review asks "is this document responsive?", analysis asks "what does this collection of documents tell us about the case?"

Key analysis workflows include:

  • Timeline construction across documents, communications, and events
  • Communication mapping to show who discussed what, with whom, and when
  • Concept clustering and near-duplicate grouping to surface related documents
  • Cross-referencing documents against pleadings, contracts, and deposition testimony
  • Gap analysis to flag missing custodians, date ranges, or data sources

Effective analysis can transform the strategic position of a case. In one commercial litigation matter, document analysis surfaced contractual clauses and deposition inconsistencies that directly contributed to a $15.4M jury verdict. In a construction defect case, automated cross-referencing of engineering reports and contractor communications identified systemic defect patterns across multiple buildings that would have taken months to uncover manually.

7. Production

Production is the final stage: assembling the reviewed documents into a package that meets the format and content requirements agreed upon in the ESI protocol or ordered by the court. A production typically includes the documents themselves (in native, image, or both formats), a load file containing metadata, Bates numbering, and a privilege log listing all documents withheld on privilege grounds.

The ESI protocol governs the production format. It should be negotiated early in the case and should specify: file formats (native, TIFF, PDF), metadata fields to be produced, Bates numbering conventions, redaction requirements, confidentiality designations, and delivery method. Getting the ESI protocol right upfront prevents costly disputes and re-productions later. See our ESI protocol guide for negotiation strategies.

Bates numbering provides a unique identifier for every page in the production. Numbers must be sequential with no gaps or duplicates, and family groups (parent emails and their attachments) should be numbered consecutively. See our QC checklist for detailed verification steps.
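
The sequential-numbering check can be automated. This Python sketch assumes a simple prefix-plus-digits label format (real productions may add delimiters or volume prefixes) and returns a list of problems found, so an empty list means the sequence passes:

```python
import re

# Assumed label format, e.g. "ABC000123"; adjust the pattern per ESI protocol.
BATES_PATTERN = re.compile(r"^(?P<prefix>[A-Z]+)(?P<number>\d+)$")

def verify_bates_sequence(labels):
    """Check that Bates labels share one prefix and run sequentially
    with no gaps or duplicates. Returns a list of problems (empty = pass)."""
    problems, parsed = [], []
    for label in labels:
        m = BATES_PATTERN.match(label)
        if not m:
            problems.append(f"malformed label: {label}")
            continue
        parsed.append((m["prefix"], int(m["number"])))
    prefixes = {p for p, _ in parsed}
    if len(prefixes) > 1:
        problems.append(f"mixed prefixes: {sorted(prefixes)}")
    numbers = [n for _, n in parsed]
    if len(numbers) != len(set(numbers)):
        problems.append("duplicate Bates numbers")
    for prev, cur in zip(numbers, numbers[1:]):
        if cur != prev + 1:
            problems.append(f"gap or out-of-order at {prev} -> {cur}")
    return problems
```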

Privilege logs must list every document withheld on privilege grounds, with enough detail to support the claimed privilege without revealing the privileged content. Courts have little patience for generic descriptions ("email re: legal matter") and will order production of documents where the privilege log fails to establish the elements of the privilege. For a comprehensive treatment, see our privilege log guide.

Redactions must be applied at the data layer (not visual-only overlays) and flattened so they cannot be removed. Visual-only redactions — where a black box is placed over text but the underlying content remains selectable — are among the most common and most damaging production errors. See our redaction guide for the proper approach.

Production Benchmark

A well-run production of 30,000 documents with Bates numbering, privilege log, and redactions should be deliverable in 3-5 days using modern tooling, versus 3-4 weeks with traditional methods. See the Tax Credit Investigation case study for real-world benchmarks.

See how DecoverAI handles production end-to-end
From document upload to court-ready output — Bates numbered, redacted, privilege-logged — in under an hour.
Book a Demo →

Cost Benchmarks for eDiscovery

Understanding eDiscovery costs is essential for budgeting, vendor negotiations, and making informed decisions about technology investments. Costs vary significantly based on data volume, complexity, and the approach used. The benchmarks below reflect 2026 market rates across the three primary cost models.

Phase                           Traditional (Law Firm)   ALSP / Managed Service   AI-Powered Platform
Collection & Processing         $500–$2,000 / GB         $150–$500 / GB           $50–$150 / GB
Document Review                 $0.50–$1.50 / doc        $0.25–$0.75 / doc        $0.05–$0.15 / doc
Production                      $100–$300 / GB           $50–$150 / GB            Included
Hosting                         $25–$75 / GB / month     $15–$40 / GB / month     $60 / GB / month (all-in)
Total (10GB matter, 50K docs)   $50K–$100K               $20K–$50K                $3K–$8K

The economics of eDiscovery are changing rapidly. AI-powered platforms have compressed costs by 10-20x compared to traditional approaches, while simultaneously improving speed and consistency. For small and mid-size matters (under 50GB), the cost difference is particularly stark: what used to require a $50,000 budget can now be accomplished for under $5,000.

The most important cost lever is reducing the volume that requires human review. Every document that can be accurately classified by AI is a document that does not require a contract reviewer at $25-45/hour. Processing-stage culling (de-duplication, date filtering) and AI-powered first-pass classification are the two highest-ROI investments in any eDiscovery workflow.

AI in eDiscovery: What Works and What Doesn't

AI has transformed eDiscovery, but the technology landscape is still maturing. Understanding what AI can and cannot do reliably is critical for both efficiency and defensibility.

What AI does well today:

  • First-pass responsiveness classification at scale, applied consistently across the population
  • De-duplication, email threading, and clustering of related documents
  • Flagging documents likely to contain PII or potentially privileged content for human review
  • Prioritizing the review queue so likely responsive documents surface first

Where human judgment is still essential:

  • Final privilege determinations and privilege log descriptions
  • Strategic relevance calls tied to case theory
  • Redaction decisions on sensitive or partially privileged content
  • Validation of AI output and final sign-off before production

The defensibility of AI-assisted review is well-established. Courts have recognized that technology-assisted review can be more accurate than exhaustive manual review, and no court has required a party to use manual review where technology-assisted review was available and properly validated. The key to defensibility is transparency and validation: document your methodology, measure your accuracy, and be prepared to explain your process. For a deeper treatment, see our guide on AI review defensibility.

Choosing an eDiscovery Platform

The eDiscovery platform market ranges from legacy enterprise tools to modern AI-powered platforms. The right choice depends on your matter volume, technical capabilities, budget, and workflow requirements. Here are the factors that matter most:

  • End-to-end coverage: processing, review, analysis, and production in one workflow
  • AI capabilities and the validation tooling needed to keep them defensible
  • Pricing model (per-GB, per-document, or per-user) and how it scales with matter volume
  • Security and compliance posture: encryption, access controls, audit trails
  • Data portability: how easily collections, coding decisions, and work product can be exported

For teams evaluating a platform switch, our platform migration guide covers data portability, format compatibility, and the specific steps to migrate without losing data or work product.

Master eDiscovery Checklist

Use this checklist as a starting framework for every new matter. Not every item will apply to every case, but reviewing the full list ensures nothing critical is missed.

Pre-Litigation / Legal Hold
  • Identify trigger event and date preservation duty arose
  • Issue written legal hold notices to all relevant custodians
  • Coordinate with IT to suspend auto-deletion policies
  • Document all preservation steps and custodian acknowledgments
  • Schedule periodic hold reminders (quarterly)
Identification & Scoping
  • Identify all custodians with potentially relevant ESI
  • Map data sources for each custodian (email, files, cloud, mobile, messaging)
  • Identify non-custodial data sources (shared drives, databases, enterprise apps)
  • Document identification decisions and rationale
  • Prepare for Rule 26(f) meet-and-confer on ESI issues
Collection
  • Use forensically defensible collection methods
  • Calculate and record hash values at point of collection
  • Maintain chain of custody documentation
  • Preserve all metadata (dates, authors, recipients)
  • Log any collection errors or inaccessible files
Processing
  • Expand container files (PST, ZIP, OST, NSF)
  • Run global de-duplication by hash value
  • Apply defensible date range and file-type filters
  • Run OCR on image-only documents
  • Verify processing completion rates and error logs
Review
  • Define coding categories (responsive, non-responsive, privilege, confidential)
  • Establish review protocol and coding manual
  • Implement QC sampling and inter-reviewer agreement checks
  • Conduct privilege review with senior attorney oversight
  • Document review methodology for defensibility
Production
  • Verify ESI protocol compliance (format, fields, naming conventions)
  • Confirm Bates numbering is sequential with no gaps or duplicates
  • Verify all redactions are data-layer and flattened
  • Cross-reference privilege log against withheld documents
  • Validate load file metadata fields and file path references
  • Spot-check image quality and native file integrity
  • Final senior attorney sign-off before release
Ready to modernize your eDiscovery workflow?

See how DecoverAI can cut your review costs by 90% and deliver productions in days, not weeks.

Book a Demo →