Dirty data is no longer a little secret
Kentucky’s large-scale approach should make information more reliable and easier to share
Dirty data is an unpleasant problem governmentwide, the result of years of well-intentioned but piecemeal automation efforts. Yet few officials are willing to commit to the unglamorous job of cleaning it up.
Now technology executives in Kentucky are putting the scrub brush to their data, hoping that their efforts will lead to short- and long-term benefits for the state’s agencies and residents.
At the most basic level, redundant and conflicting information leads to multiple records with, for example, one agency listing constituents by last name then first name and another department reversing the order. Factor in shortened versions (Rob for Robert, for example), abbreviations and typos, and there are ample opportunities for inaccuracies that waste employees’ time and lead to poor customer service.
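To make the problem concrete, here is a minimal sketch of how name variants like these can be reduced to a common matching key before duplicate records are compared. It is an illustration under stated assumptions, not Kentucky’s or Keane’s method: the function `match_key`, the tiny `NICKNAMES` table and the per-source `last_name_first` flag are all hypothetical.

```python
import re
import unicodedata

# Hypothetical nickname table for illustration only; a real project would
# need a much larger, curated list.
NICKNAMES = {"rob": "robert", "bob": "robert", "bill": "william", "liz": "elizabeth"}


def match_key(raw_name: str, last_name_first: bool = True) -> tuple[str, str]:
    """Reduce a name to a (last, first) key for duplicate detection.

    last_name_first describes how a source system orders names that have no
    comma; this sketch assumes the caller knows that for each system.
    """
    # Fold accents to ASCII, lowercase, and keep only letters, commas and spaces.
    text = unicodedata.normalize("NFKD", raw_name).encode("ascii", "ignore").decode()
    text = re.sub(r"[^a-z,\s]", " ", text.lower())

    if "," in text:
        # "Rutledge, Mark A." -> last name before the comma, given names after it.
        last_part, _, given_part = text.partition(",")
        last, given = last_part.strip(), given_part.split()
    else:
        tokens = text.split()
        if not tokens:
            return ("", "")
        if last_name_first:
            last, given = tokens[0], tokens[1:]
        else:
            last, given = tokens[-1], tokens[:-1]

    # Drop single-letter initials so "Mark A." and "A. Mark" both yield "mark".
    given = [g for g in given if len(g) > 1]
    first = given[0] if given else ""
    # Map common shortened forms (Rob -> Robert) onto one canonical first name.
    return (last, NICKNAMES.get(first, first))


if __name__ == "__main__":
    variants = ["Rutledge, Mark", "Rutledge Mark A.", "Rutledge A. Mark"]
    print({match_key(v) for v in variants})  # {('rutledge', 'mark')}
    print(match_key("Rob Smith", last_name_first=False))  # ('smith', 'robert')
```

A key like this only flags candidate duplicates; compound surnames written without a comma, for example, would still be split wrongly, so human review or a richer matching rule would be needed in practice.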
In addition to those day-to-day problems, inconsistent data can also undermine longer-term plans to promote cross-departmental information sharing and build new service-oriented architectures that rely on mixing and matching data and applications.
The state’s answer is the Kentucky Enterprise Data Architecture (KEDA). Despite its technical-sounding name, it is designed to benefit department heads and program managers by reducing the cost of deploying new applications.
Kentucky officials hope to reuse information across multiple departments and applications, a strategy that could make more money available for information technology projects, said Mark Rutledge, commissioner of the Commonwealth Office of Technology.
If all goes as planned, KEDA will induce business managers to say, “This data project is saving me money, and by the way, it’s shortened the development time so I’ll realize the benefits of a new project sooner than we otherwise would,” Rutledge said.
Information assets
Kentucky officials expect more reliable data will help them save time and money. KEDA should enable nine Cabinet agencies and other organizations in the executive branch to share information more effectively.
In the past, departments stored data in a variety of formats and systems, which suited the needs of individual organizations but made cross-agency sharing difficult, Rutledge said.
By contrast, KEDA recognizes data as a state asset, said Neil Downing, director of enterprise information management at IT consultant Keane. Kentucky hired the company to help with the project.
The state’s efforts mirror similar attempts at federal agencies to make data sharing easier and more reliable. For example, at the end of August, the federal Office of the Program Manager for the Information Sharing Environment issued the first version of an enterprise architecture framework designed to help federal agencies share and search terrorism information across jurisdictional boundaries.
KEDA’s first phase, now under way, entails an inventory of all data repositories, which a task force of IT staffers, departmental representatives and Keane employees is conducting. That effort will identify duplicate data caused by inconsistent formatting.
“Within multiple tax codes in the Department of Revenue, I may be Rutledge comma Mark, Rutledge Mark A. or Rutledge A. Mark, depending on the strategy that was used to develop that application,” Rutledge said. “There’s not a common data architecture.”
Besides leading to redundancies, inconsistent formats make it difficult for policy-makers to gather complete data for decision-making. For example, as Kentucky works to revise its energy policies, program managers must painstakingly find all the databases at multiple agencies that contain information about the coal industry. Some of that data might reside in Microsoft Access databases, while other records are stored in other relational databases.
“We have to make sure that [the multiple data sources] are not duplicated, that they are validated and then map all of them to get a composite view,” Rutledge said. “It’s challenging and it’s time-consuming.”
Common formats
Kentucky’s plan for organizing data, known as a common data framework, will point staff members to sources of record: the data sources that offer the most accurate and up-to-date information.
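The idea of a source of record can be illustrated with a short sketch that builds a composite view of one constituent from several agency systems, taking each field from the highest-ranked source that actually has a value. This is a hypothetical example, not part of KEDA’s published design: the `SOURCE_PRIORITY` rankings, field names and the `composite_view` function are assumptions made up for illustration, and the source names merely echo departments mentioned in this article.

```python
from typing import Optional

# Hypothetical source-of-record rankings per field (highest priority first).
# Which agency is authoritative for which field is an assumption here.
SOURCE_PRIORITY = {
    "address": ["revenue", "transportation", "personnel"],
    "phone": ["personnel", "revenue"],
}


def composite_view(
    records_by_source: dict[str, dict[str, Optional[str]]],
) -> dict[str, Optional[str]]:
    """Merge one entity's records from several systems into a single view.

    For each field, use the value from the highest-ranked source of record
    that holds a non-empty value; leave the field as None if none does.
    """
    merged: dict[str, Optional[str]] = {}
    for field, ranking in SOURCE_PRIORITY.items():
        merged[field] = None
        for source in ranking:
            value = records_by_source.get(source, {}).get(field)
            if value:  # skip missing or empty values and fall back to the next source
                merged[field] = value
                break
    return merged


if __name__ == "__main__":
    records = {
        "revenue": {"address": "1 Main St", "phone": ""},
        "personnel": {"address": "9 Old Rd", "phone": "555-0100"},
    }
    print(composite_view(records))  # {'address': '1 Main St', 'phone': '555-0100'}
```

Ranking sources per field, rather than per system, reflects the point that different agencies may each hold the most reliable copy of different pieces of information.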
“When the framework is in place, you can draw information from that data and not have to spend all that time and energy trying to map it, massage it or rework it so that it’s usable for the user,” Rutledge said. “Wherever data is,” he added, “we want to be able to use it. That’s the key objective.”
Kentucky enlisted Keane to help the IT department and the interagency task force develop data formats for use statewide.
The interagency task force is also working to create guidelines for granting access rights and permissions, while another task force is devising an identity management system. Additionally, Kentucky’s Department of Revenue, the Office of the Governor, and the cabinets for Personnel, Transportation and Economic Development are working on how they will share their respective information.
“The role for IT is to ask, ‘What is it that you need and what are some of the risks or concerns that you have so that we can meet the needs and minimize the risks?’” Rutledge said.
After the initial phase is completed in about six months, the KEDA team will develop the architectural details of the framework, including formatting standards. It will also develop long-term application plans, including a service-oriented architecture that will build on the data foundation.
Stumbling blocks
Success in each phase of the effort will hinge on the willingness of people outside the IT department to see the value of what looks like an arcane technology problem. “If you just go to them and say, ‘I’ve got this wonderful star schema [for warehousing data],’ people will literally run for the hills,” said John Daly, senior vice president and chief innovation officer at Keane.
A better approach is to emphasize the business benefits of KEDA for department leaders, he added.
“You come in and say, ‘Your business process is clearly broken because it is pointing at these four systems, and it’s costing you X a year. But if we built it this way, you can see how much easier that would be for you,’” Daly said. “That’s how you build consensus and get people involved.”
To promote those kinds of discussions, Keane is organizing workshops with individual departments to determine their business issues. That information will help officials tailor the final framework design.
“To provide long-term value through this framework, it’s going to align with their business goals,” Daly said. “We also tell them, ‘Oh, by the way, because we are architecting with a Web services tie, we can quickly move to other information-delivery channels, such as a BlackBerry.’”
Rutledge said he’s pleased with the level of support KEDA is receiving from department managers, but he doesn’t want to become complacent. “For any project, the newness is exciting,” he said. “But once you create that energy, you have to sustain it.”
One way to do that is to pick projects likely to see quick results. “It re-energizes people if we can say, ‘We accomplished this, and our next target is here,’” Rutledge said. “That’s how you keep the energy going forward.”
Kentucky officials want to add business intelligence capabilities to KEDA, which could be a way to demonstrate quick results. Such technology has typically fed historical data to complex analytical systems so experts could identify trends and create performance reports. Newer technologies are more democratic. They emphasize funneling information to people to help them do their jobs more efficiently. Delivery tools include Web portals with summaries of key data important to each job function.
“A data architecture framework will mean that, along with real-time business intelligence, we can gain the information we need to make strategic decisions,” Rutledge said.
Joch is a business and technology writer based in New England. He can be reached at email@example.com.