DecoverAI - How to Redact Documents for Production (And Get It Right)

The Redaction That Isn't

Defensible document redaction for production requires permanently removing content at the data layer, stripping metadata, and verifying the output before the package ships. Drawing a black box over text is not redaction — the underlying content remains fully accessible until it is deleted from the document's content stream. DecoverAI automates every step of this process, from data-layer removal and metadata sanitization to pre-production QC, ensuring that no privileged or sensitive content survives in a produced document.

The Federal Production Remediation case that DecoverAI handled contained one of the most alarming examples of this failure. Not only were redactions applied visually rather than at the data layer, but both the redacted and unredacted versions of documents were produced side by side in the same production package. The redacted TIFF images were included as required, but the corresponding native files — containing the full, unredacted text — were packaged alongside them. Every redaction in the production was effectively meaningless.

Redaction failures fall into three distinct categories, and a defensible production process must address all three. Visual-only redactions are the most common: a black box is placed over text in the PDF annotation layer, but the text remains in the content stream. Metadata leaks occur when document metadata — author names, tracked changes, comments, embedded objects — contains the same information that was redacted from the document body. And packaging errors occur when the production package includes unredacted versions of documents alongside their redacted counterparts.

Each of these failure modes has led to sanctions, malpractice claims, and privilege waiver in reported cases. The good news is that all three are entirely preventable with a disciplined redaction workflow. The steps that follow will walk you through a process that addresses every failure mode and produces redactions that are truly defensible.

Step 1: Apply Redactions at the Data Layer

The fundamental principle of defensible redaction is that the redacted content must be permanently removed from the document, not merely hidden from view. A proper data-layer redaction deletes the text from the PDF content stream and replaces it with an opaque mark that is part of the page content itself, so there is nothing underneath to recover. This is fundamentally different from drawing a black rectangle over text, which leaves the text intact and accessible.

In Adobe Acrobat Pro, the process involves two distinct steps, and confusing them is the source of most redaction failures. First, you mark the content for redaction using the Redact tool. This places a red outline around the selected text, indicating that it has been marked but not yet redacted. At this stage, the text is still fully accessible — the marking is a preview, not a redaction. Second, you must apply the redactions, which permanently removes the marked text from the document content stream and replaces it with a black box. Many users complete the first step but not the second, producing documents with red outlines that offer no protection whatsoever.
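One way to catch the "marked but never applied" failure is to look for redaction marks that are still pending in the file. In the PDF format, a pending mark is stored as an annotation with the subtype /Redact. The following is a minimal sketch, assuming an unencrypted PDF; the function name is hypothetical, and the byte-level scan is a rough heuristic rather than a full PDF parser.

```python
import re
import zlib

# Stream bodies sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A pending (unapplied) redaction mark is an annotation of subtype /Redact.
REDACT_MARK_RE = re.compile(rb"/Subtype\s*/Redact\b")


def count_pending_redaction_marks(pdf_bytes: bytes) -> int:
    """Count /Redact annotations in the raw bytes and in inflated streams.

    Hypothetical helper: it does not parse the PDF object structure and
    does not handle encrypted files.
    """
    chunks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # Annotations may live inside Flate-compressed object streams.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data; nothing to inspect here
    return sum(len(REDACT_MARK_RE.findall(chunk)) for chunk in chunks)
```

A non-zero count suggests that someone completed the marking step without applying the redactions, and the document should go back for correction before it is produced.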

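After applying redactions, sanitize the PDF and flatten any remaining annotations and form fields. Applying a redaction removes the marked text, but the file can still carry other hidden content such as comments, hidden layers, attachments, and document metadata. An annotation that was merely drawn over text, rather than applied as a redaction, can be moved or deleted by anyone with a PDF editor, revealing the content underneath. Flattening merges annotations into the page content so they can no longer be edited as separate objects, but flattening a cover-up rectangle does not remove the text beneath it; only an applied redaction does that. Most redaction tools offer a "sanitize" or "remove hidden information" option that should be used after every redaction session.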

For bulk redaction workflows, manual application in Adobe Acrobat is not practical. eDiscovery platforms like DecoverAI apply redactions at the data layer programmatically, ensuring that every redaction removes the underlying text, strips the content from the PDF stream, and flattens the result — all in a single automated step. This eliminates the two-step confusion that causes most manual redaction failures and ensures consistency across thousands of documents.

Step 2: Handle Metadata and Hidden Content

Redacting the visible text of a document is necessary but not sufficient. Documents contain layers of metadata and hidden content that may contain the same sensitive information you redacted from the body. If you redact a person's name from the text of a document but leave it in the PDF metadata, the author field, or a tracked change, you have not actually protected the information.

PDF metadata includes author, creator, creation date, modification date, keywords, and custom properties. These fields are often populated automatically by the application that created the document and may contain information that should be redacted. For example, a PDF created from a Word document may retain the original author's name, the organization name, and other identifying information in its metadata fields. Use a metadata removal tool to strip all non-essential metadata from redacted documents before production.

Word documents present additional challenges. Tracked changes and comments may contain earlier versions of text that has been redacted in the current version. If you redact a paragraph in a Word document but the tracked changes show the original text, the redaction is ineffective. Accept or reject all tracked changes, delete all comments, and remove all document properties before converting to PDF for production. Excel spreadsheets may contain hidden rows, hidden columns, hidden sheets, and named ranges that contain sensitive data not visible in the default view.
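The tracked-changes and comments check can be partly automated, because a .docx file is a ZIP package of XML parts. The sketch below is illustrative only: the function name is hypothetical, it looks only at the main document part and the comments part, and it does not detect every revision type (tracked moves, for example, use other elements).

```python
import io
import re
import zipfile

# WordprocessingML wraps tracked insertions and deletions in w:ins / w:del.
# The trailing [\s>] avoids matching unrelated names such as w:delText.
REVISION_RE = re.compile(rb"<w:(?:ins|del)[\s>]")
COMMENT_RE = re.compile(rb"<w:comment[\s>]")


def docx_hidden_content(docx_bytes: bytes) -> dict:
    """Count tracked changes and comments in a .docx package (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        names = set(package.namelist())
        body = package.read("word/document.xml") if "word/document.xml" in names else b""
        comments = package.read("word/comments.xml") if "word/comments.xml" in names else b""
    return {
        "tracked_changes": len(REVISION_RE.findall(body)),
        "comments": len(COMMENT_RE.findall(comments)),
    }
```

Any non-zero count means the document still carries revision history or comments and should be cleaned before it is converted to PDF.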

Check for embedded objects in all document types. A PowerPoint presentation may contain an embedded Excel spreadsheet with sensitive data. A Word document may contain an embedded image with metadata including GPS coordinates. An email may contain embedded images or attachments that were not separately processed and redacted. Every embedded object in a redacted document must be individually reviewed and, if necessary, redacted or removed.
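For Office files in the ZIP-based formats (.docx, .xlsx, .pptx), embedded objects and images are conventionally stored under folders named "embeddings" and "media" inside the package. A minimal sketch that lists them for individual review follows; the function name is hypothetical, and the folder names are a common convention rather than a guarantee.

```python
import io
import zipfile


def list_embedded_parts(office_bytes: bytes) -> list[str]:
    """List package parts that look like embedded objects or media.

    Hypothetical helper: relies on the conventional "embeddings" and
    "media" folder names used by Office applications.
    """
    with zipfile.ZipFile(io.BytesIO(office_bytes)) as package:
        return sorted(
            name
            for name in package.namelist()
            if "/embeddings/" in name or "/media/" in name
        )
```

Each part returned still needs a human decision: review it, redact it, or remove it.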

Step 3: Verify Redactions Before Production

Verification is the step that separates defensible redactions from redaction failures. No matter how confident you are in your redaction process, every redacted document should be verified before it leaves your control. The verification process should be performed by someone other than the person who applied the redactions, providing a second pair of eyes and reducing the risk of systematic errors going undetected.

The most important verification step is to attempt to select the text behind each redaction. Open the redacted PDF, click on the redacted area, and try to drag-select. If you can highlight text beneath the black box, the redaction has failed and the document must be re-redacted. This simple test catches the majority of redaction failures and should be performed on every redacted document, not just a sample.

Run a full-text search across the redacted document for terms that should have been redacted. If you redacted a person's name, search for that name in the document. If it appears in search results, the redaction was incomplete — the name may appear in metadata, headers, footers, or other locations that were not covered by the visual redaction. Check the document properties panel for metadata that should have been stripped. Author, title, subject, and keywords fields should be empty or contain only non-sensitive information.
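The search described above can be run the same way across the document body and its metadata fields. The sketch below assumes the text and metadata have already been extracted by whatever tool you use; the function names and the shape of the inputs are hypothetical.

```python
def _normalize(text: str) -> str:
    # Collapse whitespace so a name wrapped across lines still matches,
    # and compare case-insensitively.
    return " ".join(text.split()).casefold()


def find_redaction_leaks(extracted_text: str, metadata: dict, redacted_terms: list[str]) -> list[tuple[str, str]]:
    """Return (term, location) pairs where a redacted term still appears."""
    locations = {"body text": _normalize(extracted_text)}
    for field, value in metadata.items():
        locations[f"metadata:{field}"] = _normalize(str(value))
    return [
        (term, where)
        for term in redacted_terms
        for where, value in locations.items()
        if _normalize(term) in value
    ]
```

An empty result does not prove the redaction is sound, since it only covers the text and fields that were extracted, but any hit is a definite failure.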

For high-stakes redactions, inspect the raw file content directly. Open the document in a hex editor or a text editor such as Notepad++ and search for the redacted terms. This bypasses the PDF rendering engine and examines the actual bytes in the file. Keep in mind that PDF content streams are usually compressed, so a plain byte search can miss text that is still present; decompress the streams or run a text-extraction tool before treating a clean search as proof. If the redacted text appears in the raw or decompressed content, the redaction was not applied at the data layer. As a practical matter, over-sample redacted documents during QC: check a higher percentage of redacted documents than non-redacted documents, because the consequences of a redaction failure are typically more severe than other production errors.
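Because PDF content streams are commonly Flate-compressed, a byte search of the file as saved can come back clean even when the text is still there. The following sketch searches both the raw bytes and the inflated streams. It is an illustration under stated assumptions: the function name is hypothetical, the file is unencrypted, and the text is stored as plain single-byte strings (fonts with custom encodings would need a real text extractor).

```python
import re
import zlib

# Stream bodies sit between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def raw_term_hits(pdf_bytes: bytes, terms: list[str]) -> dict[str, list[str]]:
    """For each term, report whether it appears in the raw bytes ("raw"),
    in inflated stream data ("inflated"), or both."""
    inflated = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates stray bytes after the zlib data.
            inflated.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (for example an image filter); skip it

    hits = {}
    for term in terms:
        needle = term.encode("utf-8")
        found = []
        if needle in pdf_bytes:
            found.append("raw")
        if any(needle in chunk for chunk in inflated):
            found.append("inflated")
        hits[term] = found
    return hits
```

A term found only under "inflated" is exactly the case a plain text-editor search would have missed.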

Step 4: Ensure Production Packaging Excludes Unredacted Versions

The final step in a defensible redaction workflow addresses the production package itself. Even if every redaction is technically perfect, the production can still fail if unredacted versions of the same documents are included alongside the redacted versions. This packaging error is more common than most practitioners realize, and it completely defeats the purpose of redaction.

The most common scenario is a production that includes both image files (TIFFs or PDFs with redactions) and native files (the original documents without redactions). If your production protocol calls for image production of redacted documents, ensure that no native file is produced for any document that has been redacted. The load file should reference only the redacted image version, and the native file should not be included in the production directory structure.

Review your load file carefully to confirm that the file paths for redacted documents point only to the redacted versions. Check both the image path fields and the native path fields. If a native path is populated for a redacted document, the unredacted native will be included in the production and the receiving party will have access to the unredacted content. This check should be automated as part of your production QC process.
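The native-path check lends itself to a short script. The sketch below assumes a comma-delimited load file with a header row; the column names BegBates, HasRedactions, and NativePath are hypothetical and should be replaced with the field names in your own production protocol (many load files use other delimiters, which would need a different reader configuration).

```python
import csv
import io

TRUE_VALUES = {"Y", "YES", "TRUE", "1"}


def redacted_docs_with_natives(
    load_file_text: str,
    id_field: str = "BegBates",          # hypothetical column name
    redacted_field: str = "HasRedactions",  # hypothetical column name
    native_field: str = "NativePath",    # hypothetical column name
) -> list[str]:
    """Return IDs of redacted documents whose native path is populated."""
    offenders = []
    for row in csv.DictReader(io.StringIO(load_file_text)):
        is_redacted = (row.get(redacted_field) or "").strip().upper() in TRUE_VALUES
        native_path = (row.get(native_field) or "").strip()
        if is_redacted and native_path:
            offenders.append(row[id_field])
    return offenders
```

Any ID returned is a document whose unredacted native would ship with the production, so the list should be empty before release.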

Before finalizing the production, manually inspect the production package folders to verify that only the intended files are present. Check the natives directory for any files corresponding to redacted documents. Check the images directory to confirm that the redacted versions are present and correct. Finally, reconcile the package against the load file: every file the load file references should exist in the package, and every file in the package should be referenced by the load file. A raw file count alone can mislead, because a single document may be produced as several image files. Any discrepancy should be investigated before the production is released.
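The folder inspection can be backed by a simple reconciliation between what is on disk and what the load file references. This is a minimal sketch; the function names are hypothetical, and it assumes paths in the load file are relative to the production root.

```python
from pathlib import Path


def _norm(path: str) -> str:
    # Load files often use backslashes; compare with forward slashes.
    return path.replace("\\", "/").strip("/")


def list_package_files(production_root: str) -> list[str]:
    """List every file under the production root, relative to that root."""
    root = Path(production_root)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def package_discrepancies(files_on_disk: list[str], referenced_paths: list[str]) -> dict:
    """Compare files in the package with paths referenced by the load file."""
    on_disk = {_norm(p) for p in files_on_disk}
    referenced = {_norm(p) for p in referenced_paths if p.strip()}
    return {
        "missing": sorted(referenced - on_disk),      # referenced but absent
        "unexpected": sorted(on_disk - referenced),   # present but unreferenced
    }
```

In practice the load file itself and any cover documents will show up as "unexpected" and can be ignored; a stray file in the natives directory should not be.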

How DecoverAI Ensures Redaction Integrity

DecoverAI applies data-layer redactions by default, eliminating the most common source of redaction failures. When a reviewer marks text for redaction in the platform, the underlying text content is permanently removed from the document content stream. There is no two-step process, no risk of marking without applying, and no possibility of visual-only redactions reaching a production. Every redaction is automatically flattened and every document is automatically stripped of sensitive metadata.

The platform's AI-powered privilege classifiers provide an additional layer of protection by identifying potentially privileged content that may have been missed during manual review. These classifiers flag documents that contain language patterns consistent with attorney-client communications, work product, or other privileged categories, ensuring that privileged content is not produced without redaction. The AI operates as a safety net, not a replacement for human judgment — every flagged document is routed to a senior attorney for review and decision.

In the Federal Production Remediation matter, DecoverAI remediated over 360,000 documents that had been produced with redaction failures by a previous vendor. The remediation included replacing every visual-only redaction with a proper data-layer redaction, stripping metadata that contained redacted content, removing unredacted native files from the production package, and regenerating the load files to reference only the corrected versions. The entire remediation was completed under federal court deadlines.

DecoverAI's production packaging engine automatically excludes unredacted native files for any document that contains redactions, eliminating the packaging error that occurred in the Federal Production Remediation case. The platform generates a pre-production QC report that identifies every redacted document, confirms that redactions were applied at the data layer, and verifies that the production package contains only the redacted versions. This automated verification runs before every production is released, providing a final safety check that catches any errors that may have been introduced during the review and production process.
