Digging digital gold
- By William Matthews
- Feb 06, 2000
From the Internal Revenue Service, which collects money, to the Defense
Department, which spends a lot of it, government agencies are turning to
an advanced form of computer analysis called data mining to uncover fraud,
keep better track of supplies and improve budget forecasting.
Adopting the same techniques that private-sector marketers have developed
to track consumer spending and predict what customers will buy, agencies
are using computers to sift through vast amounts of data to uncover hidden
patterns that might indicate where fraud or inefficiency is occurring.
Eventually, data mining experts say, the technique may be used for purposes
such as improving aircraft safety, producing better drugs and securing the
Mining for Fraud
The Defense Finance and Accounting Service, which pays billions of dollars
worth of military bills each year, is a leader in the data mining field.
DFAS is testing data mining as a way to discover billing errors and fraud.
In a test that began in November and continues until July, DFAS' vendor
pay branch is using data mining to search 2.5 million financial transactions
for patterns that may indicate inaccurate charges. Computers use data mining software
to screen each transaction for 80 different elements, from what was bought
and at what price to how it compares with previous purchases.
Although the test isn't completed, the effort so far has pointed out
several hundred bills that may warrant further investigation, said vendor
pay branch chief David Riney.
An earlier DFAS data mining test focused on government purchase cards,
which government employees use to buy airline tickets, rent cars, and pay
for hotel bills and meals. In some agencies, employees use the cards to
make 80 percent to 90 percent of office supply purchases that are less than
The problem is how to pick fraudulent transactions out of the millions
of transactions that DOD processes each year. "In the past, we have relied
on tipsters" to point out fraud, Riney said.
Using SPSS Inc.'s Clementine software, the agency
searched 125,000 transactions made on 40,000 purchase card accounts. In
addition to examining the obvious — payment amount, the date and time the
purchase was made and the type of vendor — computers delve into cardholder
information, account transaction limits, billing cycles and purchase histories.
As the computer searches the transactions, data patterns that might
indicate improper use emerge — such as purchases made on weekends and holidays,
entertainment expenses, unusually frequent purchases, multiple purchases
from a single vendor and other transactions that do not correspond to the
agency's past purchasing patterns.
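
To make the idea concrete, two of the warning signs described above (weekend purchases and repeated purchases from a single vendor) can be written as simple rules. The Python sketch below is only an illustration, not the method used by DFAS or by the Clementine software; the record fields, the sample data and the `max_per_vendor` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Purchase:
    account: str   # hypothetical card account identifier
    vendor: str    # hypothetical vendor name
    day: date      # date the purchase was made
    amount: float  # payment amount in dollars


def flag_purchases(purchases, max_per_vendor=2):
    """Return {index: [reasons]} for purchases that match a warning rule."""
    # Count how many times each account bought from each vendor.
    per_vendor = Counter((p.account, p.vendor) for p in purchases)
    flags = {}
    for i, p in enumerate(purchases):
        reasons = []
        if p.day.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
            reasons.append("weekend")
        if per_vendor[(p.account, p.vendor)] > max_per_vendor:
            reasons.append("repeat vendor")
        if reasons:
            flags[i] = reasons
    return flags


# Made-up sample data for illustration only.
sample = [
    Purchase("A1", "Paper Supply Co", date(2000, 1, 3), 42.10),  # a Monday
    Purchase("A1", "Golf Shop", date(2000, 1, 8), 310.00),       # a Saturday
    Purchase("B2", "Toner Depot", date(2000, 1, 4), 15.00),
    Purchase("B2", "Toner Depot", date(2000, 1, 5), 15.00),
    Purchase("B2", "Toner Depot", date(2000, 1, 6), 15.00),
]

for index, reasons in flag_purchases(sample).items():
    print(sample[index].vendor, sample[index].day, reasons)
```

Hand-written rules like these flag only what someone already thought to look for. The appeal of data mining, as the article describes it, is that patterns emerge from the data itself, so a sketch of this kind is better read as a way of checking candidate patterns than as a substitute for the analysis.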
In its data mining test, DFAS turned up a cluster of 345 cardholders
who had made suspicious purchases, some of whom are still under investigation.
But the process needs some fine-tuning, Riney said. For example, purchases
of golf equipment seemed suspicious until investigators learned that a military
recreation manager had authority to buy the equipment. And expenses the
computer said were charged to a "casino" turned out to be an ordinary hotel
Nevertheless, the data mining results have been promising enough that
Riney predicted data mining will become a regular part of DFAS efforts to
Indeed, numerous agencies have begun to pick through databases for information
that can improve agency operations. "Running the business of government
is where we see the growth in data mining," said Mark Battaglia, executive
vice president for marketing at SPSS Inc., the nation's largest data mining
The Army has used data mining to try to identify sources of delay in
its "order and ship" process of delivering supplies to overseas bases. NASA
has considered using predictive data mining to search aircraft maintenance
and mishap data for factors that might predict accidents.
Also, the Federal Aviation Administration has hired Mitre Corp. to find
ways it can mine aircraft accident data for clues about the accidents' causes and
how those clues could help prevent future crashes. Already, Mitre has found
that planes equipped with instrument displays that can be read without requiring
a pilot to look away from the windshield were damaged less in runway accidents
than planes without them.
But the government is cautious about committing much money to data mining.
"One of the problems is how do you prove that you kept the plane from falling
out of the sky," said Trish Carbone, a technology manager at Mitre.
Data mining also can be used to improve computer security, said Kristin
Nauta, data mining program manager at SAS in Cary, N.C. Mining network logs
could uncover patterns of intrusion that system operators could not detect
in other ways. Data mining could also point out holes in computer security
systems that let intruders enter, she said.
For the IRS, data mining is a way to improve customer service, said
IRS data mining specialist Ester Brook-Jones. By analyzing incoming requests
for help and information, the IRS hopes to be able to schedule its work
force better to provide faster, more accurate answers to taxpayers' questions.
For the past year, the Department of Veterans Affairs has been using
data mining to predict demographic changes among its 3.6 million patients
and project collections from insurance companies. The technology enables
the VA to send Congress more accurate budget requests, said Robert Hinson,
the VA's director of communications and special studies services.
Agencies such as the VA, which spends about $19 billion annually to
provide medical care to veterans, are under increasing pressure to show
that they are operating efficiently. For many, data mining is becoming the
tool of choice to highlight good performance or dig out waste.
"Data mining is excellent at detecting patterns where things might not
be working right," Battaglia said, whether it is multiple Social Security
checks going to different names at the same address, or an unusual pattern
of Medicare billings by a doctor.
The potential for savings through data mining is enormous, said Herb
Edelstein, president of Two Crows Corp., a Potomac, Md., data mining company.
Consider government pensioners. Including Social Security recipients,
retired military personnel and retired government workers, about 10 million
Americans receive government pension payments. It is not uncommon when they
die for their pension payments to continue. Even if the system were 99.9
percent accurate at stopping checks for those who die, 10,000 payments would
still be mailed to deceased recipients, Edelstein said.
By using data mining to analyze data the government already has on pension
recipients — age, health and other factors — it is possible to determine
those who are most likely to have died. Pension administrators then know
which recipients to check to ensure they are still alive. If the average
pensioner receives $10,000 per year, eliminating payments to retirees who
have died could save the government $100 million per year.
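
Edelstein's arithmetic can be checked in a few lines of Python. The figures are the ones quoted above; the variable names are illustrative, and the calculation assumes, as the example implicitly does, that each erroneous payment would otherwise continue for a full year at the average amount.

```python
# Figures quoted in the article.
recipients = 10_000_000        # Americans receiving government pension payments
errors_per_thousand = 1        # a 99.9 percent accurate system misses 1 in 1,000
average_annual_payment = 10_000  # dollars per pensioner per year

# Integer arithmetic avoids floating-point rounding in the error rate.
erroneous_payments = recipients * errors_per_thousand // 1000
potential_savings = erroneous_payments * average_annual_payment

print(f"Payments still sent to deceased recipients: {erroneous_payments:,}")
print(f"Potential annual savings: ${potential_savings:,}")
```

The result matches the article's figures: 10,000 payments and $100 million per year.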
Although some agencies are veteran data miners — such as the National
Institutes of Health, which drills into databases to learn how well medical
treatments work — other agencies have just begun to sift through the mounds
of stored data.
The Navy, for example, recently established a data warehouse to manage
the distribution of torpedoes throughout the Pacific submarine fleet, said
Rear Adm. Charles Munns, the U.S. Pacific Fleet's deputy chief of staff
for command and control and requirements and resources.
"We have to make sure the right torpedo gets to the right ship, and
now we know at the command level where all those torpedoes are," he said.
The Navy also uses data mining to better manage logistics. Using an
Oracle Corp. database system to keep track of parts and spares enabled the
command to forgo its annual 12-person visits to submarines for a hands-on
Steve Petchon, a partner at Andersen Consulting who works with federal
agencies, said military logistics operations are a logical application for
data mining. In the private sector, companies such as UPS and Federal Express
rely on data mining to keep track of the goods they ship. But to succeed
at data mining, Petchon said, the military will have to replace its stovepiped
data systems so that data can be collected in a repository where multiple
users can get to it easily.
Other possible uses for data mining by the federal government might
include predicting which job candidates are most likely to succeed if hired,
and which government workers seeking security clearances are least likely
to commit security violations, Edelstein said.
But proposing those sorts of uses for data mining triggers alarms among
privacy advocates, who warn that data mining poses a serious threat to privacy.
It is not fair to rely on statistics to predict who will or won't be
a successful employee, said Andrew Shen, a policy analyst at the Center
for Democracy and Technology. "This sort of technology implies a cookie
cutter mold that has a lot of flaws. What happens if there's a mistake in
To Shen and others, the increasing ability of individuals, government
agencies and businesses to tap databases to compile extensive collections
of personal information raises the peril of compromising personal privacy.
"Until now, we have been able to have privacy through obscurity," said
Beth Givens, director of the Privacy Rights Clearinghouse in San Diego.
Personal data has long been collected and maintained as public records in
file cabinets in courthouses, tax offices, city halls and federal office
buildings. It wasn't easy to find and it was harder to aggregate.
That information is now easily accessible via the Internet or in databases.
Records of divorces, bankruptcies and property ownership and applications
for professional licenses are being stored in digital format. By compiling
them, it is often possible to construct a detailed profile of an individual.
Combine that with the data amassed by credit card companies, catalogue sales
outlets, phone companies, banks, Internet companies and even supermarkets,
and a vast dossier of personal details can be compiled about most people.
That is valuable information to marketers peddling goods and services
ranging from insurance to vacation properties. Such data also may be valuable
to law enforcement agencies whose previous information collecting was limited
by the need for warrants, Givens said.
"If we feel uncomfortable about that, we should have a public discussion
about the information being collected and who gets access to it," said
Givens, whose clearinghouse crusades for protecting privacy.
The trend is in the other direction, however.
The use of data mining to delve deeply into databases is expected to
increase at a rate of 50 percent to 60 percent per year.
In government, the push to cut costs by reducing staffs will only encourage
more data mining because the technology enables fewer people to manage data
better, Battaglia said.