Early Case Assessment (ECA) is the structured process by which a litigation team gathers, analyzes, and evaluates information about a dispute at the earliest possible stage in order to make informed strategic and economic decisions. The Sedona Conference Glossary defines ECA as "the process of assessing the merits of a case early in the litigation lifecycle to determine its viability," noting that the process "may or may not include the collection, analysis, and review of data." In practice, modern ECA almost always involves data — because the merits of a case in 2026 are almost always locked inside email threads, Slack channels, financial systems, and shared drives.
The purpose of ECA is to reduce uncertainty. At the moment a complaint arrives or a regulatory inquiry lands, counsel typically knows very little about the actual facts: who the relevant custodians are, what evidence exists, how strong each side's position will turn out to be, and how much the matter will cost to defend or prosecute. Every one of those unknowns has a price tag. The longer they remain unknown, the more difficult it becomes to make rational decisions about settlement, motion practice, custodian negotiation, and review scope.
A disciplined ECA process replaces guesswork with evidence. By the end of an effective assessment, the legal team should be able to answer four questions with confidence: What happened? Where is the evidence? How much will it cost to develop? What is the realistic range of outcomes? When those questions can be answered in days rather than months, the entire economics of the matter shift. Settlement conversations happen earlier and with better information. Discovery negotiations are grounded in actual data volumes. Litigation budgets stop being aspirational and start being defensible.
The cost of not doing ECA is severe. Matters that could have been settled for six figures often grow into seven- and eight-figure ordeals because the parties did not understand the evidentiary landscape until they were already deep into review. Document review costs alone routinely consume seventy percent or more of total ediscovery spend — and the single largest driver of review cost is the volume of irrelevant material that gets processed and reviewed because no one took the time to scope the matter properly up front.
The Electronic Discovery Reference Model (EDRM) is the canonical framework for organizing the ediscovery lifecycle. It begins with Information Governance and proceeds through Identification, Preservation, Collection, Processing, Review, Analysis, Production, and Presentation. Early Case Assessment is not a discrete EDRM stage; rather, it lives across the early stages — primarily Identification, Preservation, Collection, and the front end of Processing — and informs the strategic posture of everything that follows.
During Identification, ECA focuses on mapping the universe of potentially relevant data. Who are the likely custodians? What systems do they use? What date ranges matter? What categories of communications and documents are likely to bear on the issues? The output of this stage is not a final list but a working hypothesis about scope — one that will be refined as the team learns more.
In Preservation, ECA intersects with the legal hold process. Decisions about whom to put on hold, which systems to image, and how broadly to suspend deletion policies all depend on the preliminary scope developed during identification. ECA helps avoid the two failure modes that plague preservation: under-preservation (which creates spoliation risk) and over-preservation (which creates massive downstream cost). For more on getting this right, see our piece on litigation holds that actually work.
Collection is where ECA gets concrete. Rather than collecting everything from every potentially relevant custodian, a sophisticated ECA process collects targeted samples first — perhaps the email of two or three key custodians over a focused date range — and uses what it learns from that sample to refine the broader collection plan. The front end of Processing then deduplicates, indexes, and enriches the sample so the team can start running searches, building timelines, and testing theories before the full collection is even complete.
Effective ECA depends on assembling the right inputs at the right time. The first input is a list of custodians — the individuals whose data is most likely to be relevant. This list should be developed through interviews with the client, review of organizational charts, and analysis of the underlying claim. It is almost always wrong on the first pass. Custodians get added as the team learns who else was copied on key threads; custodians get removed as it becomes clear they had no involvement in the relevant events. A good ECA process treats the custodian list as a living document.
The second input is a map of data sources. Email is the obvious starting point, but in 2026 it is rarely sufficient. Slack and Microsoft Teams now hold a substantial portion of the day-to-day communications that used to live in email. SharePoint, Google Drive, Dropbox, and Box hold the working files. Salesforce, Workday, NetSuite, and dozens of other SaaS applications hold the structured business records. Mobile devices hold text messages and ephemeral chat. The data map should identify each of these sources, the volume of data in each, and the technical mechanisms available for collection.
The third input is the relevant date range. Date range is one of the most powerful levers for controlling scope. A matter that initially looks like it could span five years often turns out to hinge on a six-week window of activity surrounding a specific transaction or decision. Narrowing the date range based on early factual development can reduce data volumes — and review costs — by an order of magnitude. The trade-off is that overly aggressive narrowing can miss relevant context, so date ranges should be revisited as facts develop.
The fourth input is a working set of key terms and concepts. These are not the search terms that will eventually be negotiated with opposing counsel; they are the team's internal hypotheses about what language the relevant documents will contain. Names of key players, project codenames, product references, technical jargon, and the vocabulary of the alleged conduct all belong on this list. The list should be tested empirically against the data — and refined when the results show that terms are returning too much noise or missing obvious hits.
Data mapping is the foundational technique. Before any sophisticated analysis is possible, the team needs to know what data exists and where. A good data map records each system, the type of data it holds, the custodians associated with it, the volume and date range of data available, the export mechanisms supported, and any technical or legal constraints on collection. This map becomes the reference document for every subsequent decision about scope, cost, and production format.
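As a sketch of what such a map can look like in practice (all system names, custodians, and figures below are invented for illustration), a simple structured record per source is often enough at the ECA stage:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """One entry in an ECA data map. Field names are illustrative."""
    system: str                   # e.g. "Exchange Online", "Slack"
    data_type: str                # email, chat, files, structured records
    custodians: list[str]         # custodians associated with this source
    volume_gb: float              # approximate volume available
    date_range: tuple[str, str]   # earliest / latest data available
    export_mechanism: str         # API export, admin console, forensic image
    constraints: str = ""         # legal or technical limits on collection

data_map = [
    DataSource("Exchange Online", "email", ["j.doe", "a.smith"],
               120.0, ("2019-01-01", "2026-01-15"), "Graph API export"),
    DataSource("Slack", "chat", ["j.doe", "a.smith", "r.lee"],
               35.5, ("2021-06-01", "2026-01-15"), "Discovery API",
               constraints="ephemeral channels excluded"),
]

# Even a toy map immediately answers scoping questions:
total_volume = sum(s.volume_gb for s in data_map)
print(f"Sources: {len(data_map)}, total mapped volume: {total_volume} GB")
```

Keeping the map as structured data (rather than a memo) means volumes, custodian coverage, and export constraints can be queried and updated as the assessment evolves.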
Sampling is the workhorse technique of ECA. Rather than processing and reviewing the entire collection, the team takes a statistically meaningful sample — often a few thousand documents from the most likely custodians and date ranges — and reviews it carefully. The sample reveals what the data actually looks like: how much of it is genuinely relevant, what proportion is privileged, what categories of irrelevant material dominate the corpus, and what unexpected patterns are present. Decisions about full-collection scope, custodian additions, search-term refinement, and review cost are all informed by what the sample reveals.
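The arithmetic behind "statistically meaningful" is worth making concrete. A standard proportion-estimate formula shows why a sample of a few hundred documents gives a usable read on relevance rate regardless of corpus size (a simplified sketch; real sampling protocols may also stratify by custodian or date range):

```python
import math

def sample_size(margin_of_error: float, confidence_z: float = 1.96,
                expected_rate: float = 0.5) -> int:
    """Minimum simple-random-sample size to estimate a proportion
    (e.g. the relevance rate) within +/- margin_of_error at the
    confidence level implied by confidence_z (1.96 ~ 95%).
    expected_rate=0.5 is the conservative worst case."""
    n = (confidence_z ** 2) * expected_rate * (1 - expected_rate) \
        / margin_of_error ** 2
    return math.ceil(n)

# A +/-5% read on relevance rate needs ~385 documents, whether the
# corpus holds 50,000 documents or 5 million.
print(sample_size(0.05))  # 385
print(sample_size(0.02))  # 2401
```

This is why sampling is such a cheap lever: a reviewable few hundred documents bounds the relevance rate well enough to forecast review cost for the entire collection.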
Keyword analysis tests candidate search terms against the data and reports hit counts, unique hit counts, and family hit counts for each term. A term that hits ten million documents is too broad; a term that hits zero is either misspelled or a sign that the documents use different vocabulary than the team expected. Iterative tuning of keyword lists — expanding stems, adding proximity operators, adding negative terms to exclude noise — is one of the highest-leverage activities in ECA. The output is a defensible search-term list that can be shared with opposing counsel and used to scope downstream review.
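The hit/unique-hit report can be sketched in a few lines (simple substring matching stands in for a real search engine's stemming and proximity operators; the documents and terms are invented):

```python
def hit_report(docs: dict[str, str], terms: list[str]) -> dict[str, dict]:
    """Per-term document hit counts and unique-hit counts, where a
    unique hit is a document matched by that term and no other."""
    hits = {t: {doc_id for doc_id, text in docs.items()
                if t.lower() in text.lower()}
            for t in terms}
    report = {}
    for term, docset in hits.items():
        # Documents matched by any of the *other* terms
        others = set().union(*(hits[o] for o in terms if o != term))
        report[term] = {"hits": len(docset),
                        "unique_hits": len(docset - others)}
    return report

docs = {
    "doc1": "Project Falcon pricing discussion",
    "doc2": "Lunch schedule for next week",
    "doc3": "Falcon rollout and pricing approvals",
}
print(hit_report(docs, ["falcon", "pricing", "schedule"]))
```

A term with many hits but zero unique hits is a candidate for removal (everything it finds is found anyway); a term with a high unique-hit count is pulling its weight in the list.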
Concept clustering uses unsupervised machine learning to group documents by topical similarity, revealing the major themes in a collection without requiring the team to know in advance what to look for. A cluster labeled "vendor onboarding" or "customer complaints" or "regulatory correspondence" may surface entire bodies of relevant material that would never have been found through keyword search alone. Clustering is particularly valuable in investigations where the team does not yet know what it is looking for.
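The underlying idea (group documents by similarity, then inspect each group) can be illustrated with a deliberately simple token-overlap sketch; production platforms use embedding-based algorithms, and all documents here are invented:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets: 0 = disjoint, 1 = identical."""
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_cluster(docs: dict[str, str],
                   threshold: float = 0.3) -> list[list[str]]:
    """Assign each document to the first cluster whose seed document
    it resembles; otherwise start a new cluster."""
    tokens = {d: set(t.lower().split()) for d, t in docs.items()}
    clusters: list[list[str]] = []
    for doc_id in docs:
        for cluster in clusters:
            if jaccard(tokens[doc_id], tokens[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters

docs = {
    "d1": "vendor onboarding checklist for new vendor",
    "d2": "new vendor onboarding delays",
    "d3": "customer complaint about billing error",
}
# The two onboarding documents group together; the complaint stands alone.
print(greedy_cluster(docs))
```

The labels a real platform attaches to each cluster ("vendor onboarding", "customer complaints") are what make this technique navigable at scale: the team triages themes rather than individual documents.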
Communication network analysis maps who communicated with whom, how often, and about what. By visualizing email and chat traffic as a graph, the team can quickly identify the central players in a dispute, spot unexpected relationships, and find custodians who were not on the original list but who participated heavily in the relevant communications. Communication patterns also reveal timing — spikes of activity around key events — that often points the team toward the most important documents in the collection.
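A minimal sketch of the idea, assuming message metadata in (sender, recipients) form with invented names: count the edges, then rank people by degree to spot central players who may belong on the custodian list.

```python
from collections import Counter

# Each message contributes one directed edge per recipient (toy data).
messages = [
    ("alice", ["bob"]),
    ("alice", ["bob", "carol"]),
    ("bob",   ["alice"]),
    ("dave",  ["carol"]),
    ("bob",   ["alice", "carol"]),
]

edges: Counter = Counter()
for sender, recipients in messages:
    for recipient in recipients:
        edges[(sender, recipient)] += 1

# Degree = total messages sent or received. High-degree individuals are
# central to the communication network even if no one named them up front.
degree: Counter = Counter()
for (sender, recipient), count in edges.items():
    degree[sender] += count
    degree[recipient] += count

print("Heaviest channels:", edges.most_common(2))
print("Most central people:", degree.most_common())
```

Bucketing the same edge counts by week or month yields the activity spikes mentioned above: a sudden surge between two custodians around a key date is often where the most important documents live.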
The point of ECA is not the analysis itself but the decisions the analysis enables. The first output is a case score — a structured assessment of the strengths and weaknesses of the matter based on the evidence developed so far. A good case score addresses liability (how strong is the underlying claim or defense?), damages (what is the realistic exposure?), discovery risk (how much bad evidence is likely to surface?), and procedural posture (what motions are likely to succeed?). It is necessarily preliminary, but it gives the client and the team a common reference point for strategic discussions.
The second output is a settle-versus-fight recommendation. With a defensible case score in hand, counsel can have a candid conversation about the realistic value of early settlement versus continued litigation. Settlement decisions made in the absence of ECA tend to be driven by gut feel, anchoring effects, and litigation fatigue. Settlement decisions informed by ECA are grounded in the actual evidentiary picture — which is the only basis on which a client can make a rational choice about how much risk to accept and how much money to spend.
The third output is a budget forecast. ECA produces concrete data volumes, custodian counts, and complexity estimates that allow the team to build a defensible budget for the remainder of the matter. The budget should break out collection, processing, hosting, review, expert costs, motion practice, and trial preparation, with confidence intervals around each line item. A budget produced after ECA is dramatically more accurate than one produced from a blank-page guess at the start of the matter — and it gives the client the information needed to authorize spending with eyes open.
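A toy illustration of the range-based budget described above, with entirely hypothetical figures: each line item carries a (low, expected, high) estimate rather than a single number, and the totals give the client a defensible range instead of false precision.

```python
# Hypothetical post-ECA budget: (low, expected, high) per phase, in USD.
budget = {
    "collection":      (20_000,  30_000,  45_000),
    "processing":      (15_000,  22_000,  35_000),
    "hosting":         (30_000,  40_000,  60_000),
    "review":          (150_000, 220_000, 350_000),
    "experts":         (40_000,  60_000,  90_000),
    "motion practice": (50_000,  75_000, 110_000),
}

# Sum each scenario column across all line items.
low, expected, high = (sum(col) for col in zip(*budget.values()))
print(f"Budget range: ${low:,} - ${high:,} (expected ${expected:,})")
```

Note how review dominates every scenario, consistent with the spend figures discussed earlier: a change in review scope moves the total far more than any other line item.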
The fourth output is an ESI strategy for the matter going forward: which custodians to negotiate over, which date ranges to push for or against, which search-term lists to propose, and which production formats to insist on. The empirical findings from ECA give counsel leverage in meet-and-confer negotiations and credibility with the court. When you can tell opposing counsel "we ran your proposed terms against our sample and they return 4.2 million hits, eighty-seven percent of which are clearly irrelevant," you have a much stronger position than if you are arguing from intuition alone.
Federal Rule of Civil Procedure 26(b)(1) defines the scope of discovery in federal court and is the legal foundation for nearly every modern ECA conversation. The rule permits discovery of any nonprivileged matter that is relevant to a party's claim or defense and proportional to the needs of the case, considering the importance of the issues at stake, the amount in controversy, the parties' relative access to relevant information, the parties' resources, the importance of the discovery in resolving the issues, and whether the burden or expense of the proposed discovery outweighs its likely benefit.
The 2015 amendments that introduced the proportionality language were not cosmetic. Courts now routinely deny or limit discovery requests that are not proportional, and they increasingly expect both sides to come to discovery negotiations armed with empirical data about volumes, costs, and relevance rates. ECA is the mechanism by which that data is generated. Without it, a party arguing that a discovery request is disproportionate is essentially asking the court to take its word for it — an argument that rarely persuades.
The Sedona Conference's Commentary on Proportionality in Electronic Discovery articulates six principles that reinforce this point, including the principle that the burdens and costs of preserving relevant ESI should be weighed against the potential value of the information, and that technologies and processes used for discovery should be reasonable and proportionate. Sedona Principle 6 separately recognizes that responding parties are best positioned to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own ESI — which is why a defensible ECA process is so important when those decisions are challenged.
For a deeper treatment of how proportionality plays out in practice, see our article on proportionality in ediscovery. The short version: in 2026, a litigation team that cannot speak the language of proportionality — and back it up with data — is at a significant disadvantage in front of any federal judge.
For most of the past two decades, ECA was constrained by the cost and slowness of human review. Even with the help of analytics, getting a meaningful read on a corpus required attorneys or contract reviewers to look at thousands of documents over days or weeks. Large language models have collapsed that timeline. A modern LLM can read, classify, and summarize tens of thousands of documents in the time it takes a human reviewer to finish a single batch — and it can do so in natural language rather than through brittle keyword logic.
Semantic search is the most immediately useful application. Instead of asking the system to find documents containing a specific phrase, the team can ask it to find documents about a concept — "complaints about delayed shipments," "internal discussions of the SEC inquiry," "evidence of pricing coordination" — and the system returns relevant material whether or not it contains the literal words. Semantic search is particularly powerful in ECA because the team often does not yet know what vocabulary the relevant documents will use.
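The mechanics behind semantic search (embed the query and each document as a vector, then rank by similarity) can be sketched as follows. The bag-of-words `embed()` here is a deliberately crude placeholder for the LLM embedding model a real system would call; everything else about the ranking loop carries over:

```python
import math

def embed(text: str) -> dict[str, float]:
    """Placeholder embedding: word counts. A real semantic-search
    system would call an LLM embedding model here."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_search(query: str, docs: dict[str, str], top_k: int = 3):
    """Rank documents by vector similarity to the query rather than
    by literal keyword match."""
    qv = embed(query)
    scored = [(cosine(qv, embed(text)), doc_id)
              for doc_id, text in docs.items()]
    return sorted(scored, reverse=True)[:top_k]

docs = {
    "d1": "complaints about delayed shipments from the warehouse",
    "d2": "quarterly marketing plan review",
}
print(semantic_search("delayed shipments complaints", docs, top_k=1))
```

With real embeddings, the query "complaints about delayed shipments" would also surface a document that says "customers are upset that orders keep arriving late", even though not a single word overlaps; that is the capability keyword search cannot replicate.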
Auto-clustering and auto-summarization let the team get a fast read on the major themes in a collection without writing a single search term. The system groups documents by topical similarity, generates a plain-language summary of each cluster, and surfaces the most representative documents in each group. The team can drill into clusters that look promising, set aside clusters that are clearly irrelevant, and develop a working theory of the matter in hours rather than weeks.
DecoverAI's Evidence Analysis capabilities are designed exactly for this kind of work, and the Chronology Viewer turns the output of ECA into an interactive timeline of key events, communications, and inflection points. Together they let a litigation team move from "complaint just landed" to "we have a defensible view of the matter" in days rather than months. For a practical look at how this changes the economics, see our piece on the hidden cost of document review — the same forces that make traditional review expensive are exactly what AI-assisted ECA is designed to eliminate. To learn more about how DecoverAI approaches the discipline as a whole, visit our Early Case Assessment page.
Mistake one: starting too late. The single most common ECA failure is treating it as something that happens after preservation, after collection, and after processing — by which point most of the costs that ECA was supposed to control have already been incurred. ECA should begin the day the matter lands. Even a rough first-pass assessment based on custodian interviews and sample data is more valuable than a polished assessment delivered three months in.
Mistake two: collecting everything before assessing anything. The instinct to "preserve everything just in case" is understandable, but it is almost always wrong. Over-collection drives up storage and processing costs, expands the universe of material that must be reviewed, and often surfaces collateral risks that would never have come to light if the collection had been properly scoped. The right approach is to preserve broadly (which is cheap and reversible) but collect narrowly (which is expensive and consequential), then expand collection only when ECA findings justify it.
Mistake three: relying on keywords alone. Keyword search remains an important tool, but it is no longer sufficient on its own. Boolean queries miss documents that use synonyms, acronyms, codenames, or non-English terms; they over-match on common words used in unrelated contexts; and they offer no insight into the conceptual structure of the collection. Modern ECA combines keywords with concept clustering, semantic search, and communication analysis to triangulate on the relevant material from multiple angles.
Mistake four: failing to document the process. ECA decisions — which custodians to include, which date ranges to cover, which search terms to use — will eventually be challenged, either by opposing counsel during meet-and-confer or by the court if the matter becomes contentious. A team that cannot explain how it made its scope decisions, and what data informed them, is in a weak position. Document the inputs, the analyses, the iterations, and the rationale for every significant decision. Defensibility is not a separate workstream; it is a byproduct of doing ECA properly.
Mistake five: treating ECA as a one-time event. The best ECA processes are iterative. As collection expands, as new facts emerge from depositions, and as the case theory evolves, the assessment should be revisited and updated. A case score that was accurate in February may be obsolete by May. Teams that build ECA into their ongoing case management — rather than treating it as a one-time deliverable — consistently make better decisions throughout the life of the matter. The result is fewer surprises, lower costs, and better outcomes for clients.