The Agile Data Warehouse: Keeping Users Happy

Though they share a single word, agile data warehousing (DW) is nothing like agile software development.

Agile programming disciplines tend to champion a code-first, document-later ethic; some agile approaches even eschew traditional documentation altogether. Agile techniques also emphasize frequent testing: at least one agile discipline, test-driven development (TDD), explicitly prescribes a test-first approach.

In all of their variants, agile approaches emphasize frequent (and typically interactive) involvement with line-of-business customers. It isn't unusual for agile teams to solicit customer feedback daily, weekly, or every two weeks. This lets them add new features as customers demand them, or change existing features based on user feedback.

There are a number of reasons why a straight-up agile approach doesn't translate well to the data warehousing world, experts say.

First, there's an important paradigmatic distinction between programming, with its procedural (or line-by-line) orientation, and data management (DM), which typically operates on entire sets of data at once.

There are practical concerns, too. "You have to look at it kind of differently, because it can take you longer to write a test case than it takes us to generate the code for you. Suddenly, you're in a different paradigm. You get to be testing outcomes more than anything, which very much brings you into the data world," says Michael Whitehead, founder and CEO of WhereScape Inc., an upstart maker of RED, a tool that generates data warehouses.

"When you're building warehouses in an agile fashion, you're bringing together the concepts of software development and data, and a lot of the agile software techniques don't flow across to the data world."

A lot of the agile buzz at last month's TDWI World Conference in San Diego concerned agile business intelligence (BI), which, Whitehead respectfully suggests, isn't at all the same thing as agile data warehousing.

"When people talk agile in the data world, they generally talk agile BI. They generally talk about the reports, that sort of layer becoming agile. That's a no-brainer. If it's a distinct point where you have customer interaction, of course you should put something in front of them. It isn't quite so easy with a data warehouse," he argues.

All the same, Whitehead describes himself as a proponent of agile data warehousing, particularly inasmuch as "agility" connotes the acceleration or automation of tedious, onerous, time-consuming, or otherwise costly tasks.

"The trick is really about how far you can take that [agile in a data warehousing context] back. We think that you should be taking that back to the data layer as well," he explains, conceding that just how far agile should extend into the data layer is the 64-petabyte question.

For example, some agile programming disciplines prescribe periodic "refactoring" of applications, that is, restructuring existing code without changing what it does. In a data warehousing context, this would be analogous to rebuilding (or regenerating) the warehouse itself.

In this context, a tool like RED can accelerate refactoring. In most configurations, RED can identify needed changes, suggest how to make them, and, if the user approves, implement them. Whitehead cautions, however, that refactoring probably isn't something you'd want to fully automate.

"Given the potential volumes that you're dealing with, you don't want us automating that [i.e., refactoring]," he says. "The approach we've taken [with RED] is that we'll identify the change and suggest a way [to implement the proposed change]. You can look at it and either say 'Yes' to it, or decide that you want to take a different approach. We're just not willing to take the plunge to automate that given the knowledge our users have about their own environments."

Agility is, of course, synonymous with nimbleness and deftness; that is, with speed.

It's in this respect that a company such as WhereScape makes its most explicitly agile pitch. Rather than exhaustively planning, scoping, and documenting a new warehouse's requirements, Whitehead argues, use a tool such as RED to jump right in and start prototyping it on the fly. Use RED to generate a new warehouse or data mart to address seasonal, tactical, or one-off needs. (You can also use RED to cross off the kind of "nice-to-have" items that rarely, if ever, get crossed off.) Use RED to periodically refactor or improve the performance of your warehouse or data mart assets. Use RED, Whitehead argues, or your information consumers are going to use something else.

This, he concludes, is the essence of agile. "If you're a data guy, you need to make sure that you are doing whatever you can to deliver quickly and deliver value and make changes so that your stuff is relevant," he asserts. "If you can't do that, people are going to fill that vacuum."