Digging in the Dark

Smart Business, April 2002
by Thomas Claburn

There are companies out there that could tell you how to make a lot of money with data mining. Then they'd have to kill you.

OK, we're exaggerating. But only a little.

"Is this an actual phenomenon? Yes," says Frank Gillett, a Forrester Research senior analyst, adding that companies that "find something with [data mining] tend to want to keep really quiet."

The obvious reason is that any technology has greater value if your competitors don't have it. "Because if you get into an arms race where your competitors and you are both doing data mining, then the value is diminished," Gillett says. "One of the results of data mining can be a predictive model. And that model is good as long as the original conditions in which it was created don't change. But if people are changing behavior to adjust to what you're doing, then the model decays."

That's precisely why Tom Rhoton, former director of product marketing at WhizBang Labs, couldn't disclose the name of one customer, a Global 100 telecommunications hardware provider. Before relaunching its Web site, that company used WhizBang's product to automatically eliminate discrepancies and redundancies in 800,000 Web pages, a task that would have been impossible without WhizBang's technology. The product scours the Web and extracts the specific data you're looking for from myriad unstructured sources.

Another WhizBang customer that insists on anonymity is a national online business directory. Thanks to WhizBang's unique data-extraction capabilities, the directory can help customers in, say, Philadelphia find out when certain government offices or utilities are open, what neighborhood restaurants are serving the freshest fish, or where the closest theater is located—and what movies are showing there. "Information that granular on any kind of a scale at all would be next to impossible for you to maintain and update without an approach like ours because it would be just ridiculously labor intensive," says Rhoton.

WhizBang Labs counts Dun & Bradstreet, FlipDog.com, and the U.S. Department of Labor among the clients that it's allowed to name. FlipDog.com was actually created to showcase WhizBang's technology, which is used on the site to compile up-to-date job listings culled from corporate postings across the Web. FlipDog worked so well that No. 1 job site Monster.com bought it rather than try to compete.

Business information provider Dun & Bradstreet hired WhizBang to create a data product that it could resell. WhizBang scoured the Web for information about the most e-commerce–enabled businesses and put it into a database. "The day we signed this agreement with Dun & Bradstreet," Rhoton says, "they made an announcement to Wall Street that they would sell this data set for between $7 million and $10 million the very first year." (Neither Dun & Bradstreet nor current executives at WhizBang would confirm this number; Rhoton is no longer with the company.)

Rhoton couldn't disclose how much D&B paid, but says WhizBang's fees depend on how much data you want, how many sites you want to search, and how often you need to do it—typically ranging from $30,000 to $1 million or more.

Secret Weapons: Data Mining Tools

Dozens of companies can help you sort through data to find the information you need. Which one fits depends, of course, on what you're looking for.

Autonomy
Organizes unstructured data from e-mail, the Web, and other sources, making it easier to classify, store, and analyze.

ClearForest
Reads vast amounts of text, extracts the relevant information, and provides visual, interactive executive summaries.

Ingenix
Provides health-care information solutions, including actuarial consulting and data analytics.

Lingo Motors
Uses technology dubbed Ordinary Language, which lets search engines understand queries structured the way people talk.

Semio
Categorizes information to make searches fast and easy.

WhizBang Labs
Creates information-extraction technologies that automatically find, classify, extract, and compile data from a variety of sources.