Keeping data flowing
In the quest to build leaner, more nimble agency systems, attention turns to data
- By John Moore
- Apr 18, 2005
Art and obscenity rank among those things that defy easy definition: they have a know-it-when-I-see-it quality that makes them hard to pigeonhole.
Data may be similarly enigmatic, but government managers are nevertheless attempting to define it. Some agencies have embarked on data architecture projects, which aim to organize data so that it can be more readily shared within and among organizations. Until recently, such sharing has not been a priority for federal agencies that have accumulated data in isolation. The government now faces a daunting data organization effort. A key assignment: Develop common definitions for the types of data agencies possess.
The Environmental Protection Agency and the Interior Department are among the federal entities taking the latest crack at data architecture. More may follow, particularly in light of the Office of Management and Budget's most recent addition to the federal enterprise architecture.
Last October, OMB unveiled the fifth and final element of this architecture, a data reference model intended to promote governmentwide information sharing. The model promises to provide the guidance agencies need to get moving on data architecture.
"I think what [the data reference model] will do is serve as a tool for all of the agencies who are creating data architectures and data models," said Kimberly Nelson, the EPA's chief information officer.
As for data architecture's benefits, easier data sharing generally gets top billing. But the ability to boost government programs' performance may be its keenest edge, according to government and industry executives. Data architectures improve data quality and eliminate costly redundancies, making programs more effective. That advantage is critical for selling top managers on data architecture efforts.
Once an agency gets the green light, the data architecture job shifts to execution. In that regard, guidelines for running a data architecture effort have begun to emerge. The data reference model provides assistance, but data architects also identified lessons culled from project work. For example, experts say including business and technology managers on projects should be a cardinal rule.
Fred Collins, senior enterprise architect in IBM's Global Government Services Division, said attempts to place data architecture under a technology-only umbrella are mistaken. "I think that is a recipe for failure," he said, citing the need to reach out to the business community and its leaders. Collins has worked with Interior on its data architecture.
Transcending the stovepipes
Over the years, government information systems typically have been built, maintained and defended as stand-alone entities, which is likely why officials often describe systems as stovepipes, islands of automation and silos.
"Up to now, most people have been protective of data silos and competing with each other for who has the most complete data silo," said Brand Niemann, co-chairman of the CIO Council's Semantic Interoperability Community of Practice.
"Every business vertical established its own vertical-specific application with a specific data model," Collins added.
Consequently, application developers labeled data elements, the fundamental units of data, in their own ways. A data element for an employee's last name, for example, could be labeled in different systems as last name, name or some other variation. Data elements usually were defined for internal use only, said Michael Holt, director of software engineering at integrator STG.
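The mismatch can be pictured as a simple mapping problem. In this sketch, the system and field names are invented examples: three hypothetical systems label the same logical element differently, and a shared lookup table reconciles them to one agreed-upon name.

```python
# The same logical data element, labeled three different ways in three
# hypothetical systems (names invented for illustration).
system_fields = {
    "payroll": "last_name",
    "benefits": "lname",
    "security": "surname",
}

# A shared mapping from each system's label to one canonical element name.
canonical = {label: "employee_last_name" for label in system_fields.values()}

for system, label in system_fields.items():
    print(f"{system}: {label} -> {canonical[label]}")
```

Without some such agreed-upon mapping, every cross-system exchange must rediscover these equivalences by hand.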
That's not to say that government agencies have never attempted to share data. Patrick McNabb, director of STG's enterprise architecture practice, said agencies have been wrestling with data issues and standardization for at least two decades. He cited the Defense Department's pursuit of data standardization.
"The object is to make sure the information that everyone is talking about is the same and defined correctly and shared," he said. "That goal has been around for a while."
That goal has seen increased visibility in recent years. Among the prime movers: the demand for greater data sharing and reduced data operations costs following the Sept. 11, 2001, terrorist attacks. But as the need to share data models from application to application intensified, agency officials realized they had no common taxonomy to make sharing possible, Collins said.
In the government, some initial attempts to use a common data classification scheme involved the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 11179 standard. The standard offers guidance for naming data elements and provides for a registry of data elements and their associated metadata, the high-level data used to describe those elements.
Registries provide information on the types of data an organization has, where it is housed, its source and the format in which it is available. Registries act as catalogs for information stores.
ISO/IEC 11179 proponents argue that consistently documented data is easier to locate, retrieve and share. The standard's charter states, "Precise and unambiguous data element definitions are one of the most critical aspects of ensuring data sharability."
The EPA's Environmental Data Registry, the Federal Aviation Administration's Data Registry and the Australian Institute of Health and Welfare's Knowledgebase are examples of metadata registries built according to ISO/IEC 11179.
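A registry entry can be sketched as a small record. The attribute names below are simplified stand-ins for the kinds of information ISO/IEC 11179-style registries capture, not the standard's exact fields, and the sample entry is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class RegistryEntry:
    element_name: str   # agreed-upon name of the data element
    definition: str     # precise, unambiguous definition
    source: str         # organization or system that owns the data
    location: str       # where the data is housed
    data_format: str    # format in which the data is available

# A hypothetical entry of the kind an environmental registry might hold.
entry = RegistryEntry(
    element_name="facility_id",
    definition="Unique identifier assigned to a regulated facility",
    source="EPA",
    location="Environmental Data Registry",
    data_format="text",
)
print(entry.element_name, "-", entry.definition)
```

Consistently filled-in entries like this one are what make the catalog searchable: a would-be data consumer can locate an element by name, confirm its meaning from the definition, and learn where and in what format to retrieve it.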
Enter the data reference model
Against this backdrop comes the data reference model. According to OMB, "The [data reference model's] primary purpose is to promote the common identification, use and appropriate sharing of data/information across the federal government."
The model builds on ISO/IEC 11179, adapting its approach for describing the structure of data. The data object is the basic element, which is further described by a data property and a data representation. For example, the model document states that "vaccine" would be the data object; the name, weight or potency of the vaccine would be the property; and a format such as text or integer would be the representation.
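The three-part description can be sketched in code using the document's vaccine example. The class and field names here are illustrative, not taken from the data reference model itself.

```python
from dataclasses import dataclass, field

@dataclass
class DataProperty:
    name: str            # e.g., "potency"
    representation: str  # e.g., "text" or "integer"

@dataclass
class DataObject:
    name: str
    properties: list = field(default_factory=list)

# The model document's example: "vaccine" as the data object, with its
# name, weight and potency as properties, each with a representation.
vaccine = DataObject(
    name="vaccine",
    properties=[
        DataProperty("name", "text"),
        DataProperty("weight", "integer"),
        DataProperty("potency", "integer"),
    ],
)

for prop in vaccine.properties:
    print(f"{vaccine.name}.{prop.name}: {prop.representation}")
```

Describing every agency data holding this way is what lets two organizations recognize that they collect the same object under different labels.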
The model's emphasis on data structure will spark much learning among federal officials, Nelson said. "As we model [data structures] in our respective agencies, we will have a better understanding of what opportunities exist for sharing that information," she said.
The data structuring exercise lets agency officials see what data they are collecting and determine what groups inside and outside the agency might be interested in it, Nelson added.
In addition to data structure, the data reference model also focuses on categorization and the exchange format, Nelson said. Categorization places data in a business context and addresses how an agency uses data to support a particular line of business. The exchange format, she said, covers how pieces of information are grouped and shared.
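An exchange format of the kind Nelson describes groups related data elements into a package that another agency can consume. A minimal sketch, using XML as the packaging and element names invented for illustration (not drawn from any federal schema):

```python
import xml.etree.ElementTree as ET

# A group of related data elements about one hypothetical facility.
record = {"facility_id": "F-1001", "state": "NY", "permit_status": "active"}

# Package the group as a small XML exchange document.
root = ET.Element("FacilityRecord")
for name, value in record.items():
    child = ET.SubElement(root, name)
    child.text = value

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The point of agreeing on such a format in advance is that the receiving system knows, before any message arrives, which elements to expect and what each one means.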
According to some observers, the data reference model is the federal enterprise architecture's central model and the hardest to make happen.
"I still think of data as the basic object for which all these other models are arrayed," said Dan Twomey, recently appointed chairman of the Industry Advisory Council's Enterprise Architecture Shared Interest Group. "It's the coin of the realm. Applications only work if they've got data to work on."
Data architecture, modeling and analysis lie "at the heart of the enterprise architecture," said Michael Tiemann, manager of the enterprise architecture practice at AT&T Government Solutions. Architecture can't be effectively implemented without a significant effort put into data architecture, he added.
To that end, the data reference model provides a structure through which agencies can put their data houses in order. In theory, the model will impose a consistent, governmentwide method for organizing data. How it will work in practice is another story, some executives say.
"It remains to be seen how valuable the reference model is," said Michael Beckley, co-founder and executive vice president of product strategy at Appian. "The model doesn't solve the basic, hard challenge to get people to agree on specific [Extensible Markup Language] schemas and where to register them. [The model] is a design pattern, but people still need to build in that pattern."
Other observers say the data reference model remains conceptual and difficult for agency managers to understand.
Greater definition could be on the way, however. OMB has tapped Michael Daconta, metadata program manager at the Homeland Security Department, to advance the model.
Daconta heads a working group that will revise and complete the data reference model's five volumes. Last month, the group agreed on a strategy for producing the volumes. The first will provide an overview of and introduction to the data reference model, the second will offer a management strategy, and the other three will correspond to the model's data description, exchange and context layers.
The management strategy volume will include a section on governance, Daconta said. Industry executives say governance is an important issue, noting that incentives are needed to foster interagency data sharing and collaboration. In the past, OMB has linked funding to the achievement of various technology directives.
"OMB is still reviewing the governance process," said an agency official who requested anonymity because of OMB policies. "It is the goal of the CIO Council and [OMB's] E-Government and Information Technology Office to ensure proper and efficient uptake by the agencies. No particular types of incentives have been decided at this point."
Daconta, however, said the need to improve information sharing will encourage agencies to adopt the data model.
In the absence of a financial stick, agencies can still find motivation to pursue data architecture. Benefits include the often-cited improvement in data sharing. In addition, agencies eyeing service-oriented architectures will find the migration easier if they first obtain a solid understanding of their data, government and industry executives say. An architecture's data discovery aspect helps an organization determine what services, in the form of reusable software components, can be developed and made available to others.
But to get managers' attention, data architecture's potential to improve mission effectiveness may be the prime selling point.
"I've had the best success building the case [for data architecture] on the impact to the organization," said Michael Brackett, a former Washington state IT executive and now a data architecture consultant. "Selling it as something nice to do is not going to fly."
Moore is a freelance writer based in Syracuse, N.Y.