Fed banking on new data sources

Siloed data could provide a near real-time impression of the health of the economy, says Federal Reserve Board CDO Micheline Casey.


The quietest whispers from the Federal Reserve Board can move markets worldwide, so the data the U.S. central bank uses to model its economic projections has to set the standard for reliability and credibility. The Fed also releases its own set of economic indicators, based on data it collects from financial institutions and from established financial information suppliers such as Bloomberg and Reuters.

But the Fed is also beginning to reckon with the explosion of commercial, financial, consumer and other data that has taken place over the last decade. While there are risks, practical problems and institutional obstacles to incorporating this range of data into the Fed's economic modeling, there are opportunities as well, according to Micheline Casey, the Fed's chief data officer.

Casey has been on the job for almost two years, and she has been building a data organization inside the Fed with a budget of about $12 million per year and a staff of about 40 that is still growing. She's starting to think about how the oceans of data generated by e-commerce firms, real estate transaction sites, crowd-sourced gas price tools, and even sensors on roads and mass transit networks that are part of "smart city" systems can help provide Fed policymakers and economists with reliable information.

For example, data from Amazon.com and Walmart could supply up-to-date consumer price information; real estate bidding and closing information from Trulia and Zillow could signal slack or pent-up demand in the housing market; data on 3D printing could augur shifts in manufacturing; and data on the use of emerging sharing-economy companies such as Uber and Airbnb could point to changes in car ownership levels or hotel occupancy.

Collectively, these and other economic signals, buried in disparate silos of proprietary data, could provide a "FitBand for the economy," giving a near real-time impression of the macroeconomic health of the U.S. and the world, Casey said.

"We're always looking to improve our forecasting and understand what's really going on in the economy, and what will happen tomorrow," Casey said in a March 31 keynote at the annual Enterprise Data World conference. "What we've been trying to do over the past several years, as the explosion of data has become much more prevalent, is to move forward and stop driving by looking in the rear-view mirror and start identifying what would be new datasets that could help us predict the future of the economy in near-real time."

Structuring the data

The Fed is in the very early stages of looking at what kinds of data could support a more up-to-date look at the economy.

To put the Fed's needs in context, it is a policymaking organization, not an operational or transactional one, and doesn't need the volume or velocity of data that Google, Amazon or a stock exchange might require. But the Fed does have "highly complex data needs that span structured, unstructured, and semi-structured" data, Casey said. "Just having lots more data isn't necessarily helpful," she said.

Casey is looking at two kinds of sources: high-frequency data, which could speed up the publication of economic measurements and reduce the need to revise them, and more granular, geographically targeted data, which could help policymakers see how the economy is performing in particular cities and regions.

But managing this data poses some challenges. First, many of the sources of this e-commerce-generated data have existed for only a decade or so, which makes historical comparisons problematic. Second, there is a selection bias in who uses online products and services. And from a data stewardship standpoint, there's no guarantee that data collected today will still be maintained in the same format five years, or even five quarters, from now.

To meet those challenges, the Office of the Chief Data Officer has introduced new data management roles, including a dedicated team of data governance analysts and data architects.

"We're taking a holistic, enterprise view of how we do the work we do. But particularly as we deal with some of these newer data sets, we can't just throw them into production," Casey said. "We have to figure out how we manage these, what sorts of stewardship would be needed for newly emerging data sets. We're not sure, because some of these data products are so very new, and they're not that stable yet."

There are also technical and infrastructure obstacles to managing new data inventories. "What is big data today will be small data in five years," Casey said. "This is as small as the data will ever be again, so we have to start adjusting now."