Federal Court Approves The Use Of “Predictive Coding” Technology-Assisted Document Review

2012 continues to deliver seminal decisions from New York courts on issues of first impression relating to electronic discovery. The latest landmark decision is from Southern District of New York Magistrate Judge Andrew J. Peck in Da Silva Moore v. Publicis Groupe, _ F. Supp. 2d _, No. 11-civ-1279 (ALC) (AJP), 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012). In the Da Silva Moore decision, for the very first time, a court has considered, and approved, the use of advanced document review software in place of more common approaches such as keyword searches or linear review by human reviewers. Predictive coding and technology-assisted review have been hot topics in the e-discovery field for the past few years due to their promise of dramatically lowering discovery costs, but litigants have been cautious in adopting the new review techniques because no court had offered guidance on them. Now that a court has expressly indicated that the use of predictive coding is appropriate, the decision opens the door to increased use of the technique, provided that suitable workflow and quality controls are put into place.

What Is “Predictive Coding”?

“Predictive coding” is a computerized process that uses “sophisticated algorithms to enable the computer to determine relevance, based on interaction with (i.e., training by) a human reviewer.” Id. at *2 (internal citations omitted). Typically, the software is trained by a senior attorney or partner who reviews and codes a relatively small “seed set” of documents for responsiveness. Id. “The computer identifies properties of those documents that it [then] uses to code other documents” until the “system’s predictions and the reviewer’s coding sufficiently coincide,” at which point “the system has learned enough to make confident predictions for the remaining documents.” Id. Typically, this allows a set of hundreds of thousands of documents (or more) to be coded for responsiveness and potential production even though only a few thousand have actually been examined by attorneys. In a common implementation of the technique, documents coded “nonresponsive” by the software – typically the bulk of any document collection – may never be examined again (other than quality-control sampling), while those coded by the software as “responsive” may be reviewed by attorneys for a final responsiveness determination as well as for privilege.
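For readers who want a concrete picture of the training loop described above, the following is a minimal sketch in Python. It is illustrative only and does not represent the software used in Da Silva Moore or any commercial product. The classifier (a small naive Bayes model), the function names, the human_review callback standing in for the attorney reviewer, and the 90% agreement threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a document into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled):
    """Fit a tiny naive Bayes model from (text, is_responsive) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def predict(model, text):
    """Return True if the model scores the document as responsive."""
    counts, docs, vocab = model
    scores = {}
    for label in (True, False):
        total = sum(counts[label].values())
        # Smoothed log prior for the class.
        score = math.log((docs[label] + 1) / (docs[True] + docs[False] + 2))
        for word in tokenize(text):
            # Laplace smoothing; the extra +1 reserves a slot for unseen words.
            score += math.log((counts[label][word] + 1) / (total + len(vocab) + 1))
        scores[label] = score
    return scores[True] > scores[False]


def train_until_stable(batches, human_review, target_agreement=0.9):
    """Train batch by batch until predictions and reviewer coding coincide.

    `human_review` is a hypothetical callback standing in for the attorney
    who codes each document; `target_agreement` is an assumed threshold.
    """
    labeled = []
    model = None
    agreement = 0.0
    for batch in batches:
        coded = [(doc, human_review(doc)) for doc in batch]
        if model is not None:
            # Compare the model's predictions with the reviewer's coding.
            hits = sum(predict(model, doc) == label for doc, label in coded)
            agreement = hits / len(coded)
            if agreement >= target_agreement:
                break
        # Not yet in agreement: fold the new coding into the training set.
        labeled.extend(coded)
        model = train(labeled)
    return model, agreement
```

In this sketch the first batch plays the role of the seed set, and each later batch tests whether the system's predictions sufficiently coincide with the reviewer's coding before the model is applied to the remaining documents. A real protocol would rely on far larger samples and on the quality-control steps the parties negotiate.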

Predictive Coding Issues Implicated In Da Silva Moore

In Da Silva Moore, a putative employment discrimination class action, the parties had initially agreed to defendant MSL’s use of predictive coding software but disagreed over the scope and implementation of the software. Id. at *1 n.1. Identifying an initial set of custodians and the locations of their electronically stored information (ESI) had yielded a huge data set of over three million emails requiring filtering and review. Id. at *3.

The dispute arose over plaintiffs’ concern with the lack of transparency, specifically as to what documents MSL would code as nonresponsive versus responsive and thus the accuracy of the predictive coding software’s training. Id. at *11. Plaintiffs also expressed concern over the size of the dataset and at what point a determination could be made that the software was properly and fully trained. Id. Finally, the plaintiffs disagreed with MSL’s proposal that “after the computer was fully trained and the results generated, [it would] only review and produce the top 40,000.” Id. at *3.

Predictive Coding Protocol Approved

Magistrate Peck deemed predictive coding appropriate in Da Silva Moore because the following factors had been established:

(1) the parties’ agreement; (2) the vast amount of ESI to be reviewed … ; (3) the superiority of computer-assisted review to the available alternatives … ; (4) the need for cost effectiveness and proportionality under [Federal Rule of Civil Procedure (the “Rules”)] 26(b)(2)(C), and (5) the transparent process proposed by MSL.

Id. at *11. He opined that in this situation – where three million documents had been drawn back for review – “[l]inear manual review is simply too expensive.” Id. at *9. He also questioned the use of “keywords” alone, which he believed often drew back “large numbers of irrelevant documents,” but suggested keywords were useful when incorporated with a computer-assisted review tool. Id. at *10-11.

In the decision, Magistrate Peck endorsed a protocol for sampling approximately 7,000 emails from the collection and allowing computer software to make responsiveness determinations based on attorney coding of these seed sets. Id. at *11-12. Magistrate Peck agreed with the plaintiffs that a hard-number cut-off was an inappropriate use of the predictive coding software. Id. at *3.

[W]here the line will be drawn as to review and production is going to depend on what the statistics show for the results, since proportionality requires consideration of results as well as costs. And if stopping at 40,000 is going to leave a tremendous number of likely highly responsive documents unproduced, MSL’s proposed cutoff doesn’t work.

Id. (internal quotation marks and citations omitted). The Court noted that judicial approval of a review and production stopping point was unlikely “until the computer-assisted review software has been trained and the results are quality control verified.” Id. at *12. Magistrate Peck also suggested that using the tool in stages might be appropriate and might further control costs. “[S]taging of discovery by starting with the most likely to be relevant sources (including custodians), without prejudice to the requesting party seeking more after conclusion of that first stage review, is a way to control discovery costs.” Id. at *12.
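One way to see why a stopping point should depend on “what the statistics show” is to estimate how many responsive documents a fixed cutoff would leave behind. The sketch below is a hedged illustration, not the method set out in the Da Silva Moore protocol: it assumes that a random quality-control sample is drawn from the documents below the cutoff and that the share of responsive documents found in that sample can be extrapolated to the rest. The function name, its parameters, and all of the figures in the usage note are hypothetical.

```python
def estimate_recall(responsive_above, docs_below, sample_size, sample_responsive):
    """Estimate missed responsive documents and recall for a given cutoff.

    responsive_above:  responsive documents found above the cutoff
    docs_below:        number of documents below the cutoff (not reviewed)
    sample_size:       size of a random sample drawn from below the cutoff
    sample_responsive: responsive documents found in that sample

    All inputs are hypothetical; this is a point estimate with no
    confidence interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Share of the unreviewed documents that the sample suggests are responsive.
    miss_rate = sample_responsive / sample_size
    estimated_missed = miss_rate * docs_below
    total_responsive = responsive_above + estimated_missed
    if total_responsive == 0:
        raise ValueError("no responsive documents found or estimated")
    # Recall: fraction of all responsive documents captured above the cutoff.
    recall = responsive_above / total_responsive
    return estimated_missed, recall
```

As a purely hypothetical example, calling estimate_recall(30000, 2960000, 2000, 20) models a 40,000-document cutoff in a three-million-document collection where 30,000 of the reviewed documents proved responsive and 20 of 2,000 sampled documents below the cutoff were responsive. The estimate is roughly 29,600 missed documents and a recall of about 50%, the kind of result that would suggest a hard-number cutoff leaves too much unproduced.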

Magistrate Peck also held that the defendants, if using predictive coding software, must provide their seed set – “including the seed documents marked as nonresponsive to the plaintiff’s counsel” – for their review and input on the training of the software. Id. at *3. The detailed protocol to be used by the parties was annexed to the decision and includes many iterative steps and opportunities to fine-tune the training of the software. The Da Silva Moore decision thus reflects a cooperative and interactive approach to using these new technologies.

Potential Applicability To Other E-Discovery Projects

It is worth noting that Magistrate Peck has in recent times been a vocal supporter of the use of predictive coding software. In October 2011, he published a widely read article discussing predictive coding and its benefits. See Andrew Peck, "Search Forward," L. Tech. News, Oct. 2011 (“Search Forward”). In that article, Magistrate Peck expressed the opinion that “computer-assisted coding should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ determination of cases in our e-discovery world.” Da Silva Moore, at *1 (citing Search Forward & Rule 1). In the past year, he has also articulated the same view as a speaker at various e-discovery conferences. Magistrate Peck suggested that a judicial opinion approving the use of predictive coding as “a proper and acceptable means of conducting searches” under the Rules might be a long time in coming. Id. Yet, issued only four months after his article, Magistrate Peck’s decision in Da Silva Moore does just that. In it, he suggests that “[c]omputer-assisted review appears to be better than the available alternatives and thus should be used in appropriate cases.” Id. at *11. Still, Magistrate Peck cautioned that “computer-assisted review is not perfect” – but noted that the Rules “do not require perfection.” Id. “[T]he idea is not to make this perfect, it’s not going to be perfect. The idea is to make it significantly better than the alternatives without nearly as much cost.” Id. at *6. He recognized that predictive coding “is not a magic … solution appropriate for all cases” but endorsed its use “where appropriate.” Id. at *8.

The Da Silva Moore decision appears to reflect Judge Peck’s desire, previously set out in his article, to provide litigants with precedential authority for the use of technology-assisted review. In its conclusion, the Court wrote:

What the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the “first” or “guinea pig” for judicial acceptance of computer-assisted review. As with keywords or any other technological solution to ediscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality. Computer-assisted review now can be considered judicially approved for use in appropriate cases.

Id. at *12.

Other courts might differ in their approach.


Predictive coding may be an appropriate approach for certain document review projects and may, if used properly, dramatically reduce the cost of document review. The Da Silva Moore decision suggests that the implementation of an iterative protocol with appropriate quality assurance may be considered an important aspect of utilizing these new approaches. Therefore, if a litigant is considering the use of these new technologies, consultation with e-discovery counsel to establish a predictive coding review protocol is advisable. These new technologies may also be useful for the review of incoming document productions, internal investigation materials, risk assessments, or any other project that involves the review of large document sets, especially when there are challenging time or cost constraints. 
