Unclear on the Concept, Revisited, by Craig Ball, Ball In Your Court
This is the second in a series revisiting Ball in Your Court columns and posts from the primordial past of e-discovery, updating and critiquing in places and, I hope, restarting a few conversations. As always, your comments are gratefully solicited.
Unclear on the Concept
[Originally published in Law Technology News, May 2005]
A colleague buttonholed me at the American Bar Association’s recent TechShow and asked if I’d visit with a company selling concept search software to electronic discovery vendors. Concept searching allows electronic documents to be found based on the ideas they contain instead of particular words. A concept search for “exploding gas tank” should also flag documents that address fuel-fed fires, defective filler tubes and the Ford Pinto. An effective concept search engine “learns” from the data it analyzes and applies its own language intelligence, allowing it, for example, to recognize misspelled words and explore synonyms.
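To make the idea concrete, here is a toy sketch in Python of the two behaviors just described: catching near-spellings and pulling in synonyms. It is not how any vendor's product works. The synonym map, the function name expand_term and the similarity cutoff are all assumptions made up for illustration, and a real concept engine would derive its associations from the data rather than from a hand-built list.

```python
import difflib

# Hypothetical, hand-built synonym map (an assumption for illustration only).
# A real concept search engine would learn such associations from the documents.
SYNONYMS = {
    "gas tank": ["fuel tank"],
    "exploding": ["explosion", "fire"],
}

def expand_term(term, vocabulary, synonyms=SYNONYMS, cutoff=0.8):
    """Return the term plus near-spellings found in the vocabulary and any listed synonyms."""
    expanded = {term}
    # Near-spellings: words in the collection's vocabulary that closely resemble the term.
    expanded.update(difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff))
    # Synonyms: looked up in the (hypothetical) map above.
    expanded.update(synonyms.get(term, []))
    return expanded

# Words assumed to appear in the document collection.
vocabulary = ["manager", "memo", "tank"]
print(expand_term("manger", vocabulary))
print(expand_term("gas tank", vocabulary))
```

Even this crude approach links “manger” to “manager” when the latter appears in the collection, which is the sort of behavior the paragraph above describes.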
I said, “Sure,” and was delivered into the hands of an earnest salesperson who explained that she was having trouble persuading courts and litigators that the company’s concept search engine worked. How could they reach them and establish credibility? She extolled the virtues of their better mousetrap, including its ability to catch common errors, like typing “manger” when you mean “manager.”
But when we tested the product against its own 100,000-document demo dataset, it didn’t catch misspelled terms or search for synonyms. It couldn’t tell “manger” from “manager.” Phrases were hopeless. Worse, it didn’t reveal its befuddlement. The program neither solicited clarification of the query nor offered any feedback revealing that it was clueless on the concept. . . .