The Agile Data Warehouse: Keeping Users Happy
- By Stephen Swoyer
- Sep 15, 2010
Though the two share a word, agile data warehousing (DW) is nothing like agile software development.
Agile programming disciplines tend to champion a code-first, document-later ethic, and some agile approaches eschew traditional documentation altogether. They also emphasize frequent testing: at least one agile discipline, test-driven development (TDD), explicitly prescribes a test-first approach, in which tests are written before the code they exercise.
In all of their variants, agile approaches emphasize the importance of frequent (and typically interactive) involvement with line-of-business customers. It isn't unusual for agile teams to solicit feedback from customers on a periodic (daily, weekly, or bi-weekly) basis. This lets them incorporate new features as customers demand them -- or change features based on feedback from users.
There are a number of reasons why a straight-up agile approach doesn't translate very well into the data warehousing world, experts say.
For one thing, there's a paradigmatic distinction between programming -- with its procedural (or line-by-line) orientation -- and data management (DM), which typically lives and thinks in a set-based world.
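To make the distinction concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not drawn from WhereScape or RED; the `sales` table and both function names are hypothetical. The procedural version walks the rows one at a time and accumulates totals in application code, while the set-based version states the desired result as a single `GROUP BY` query and leaves the row handling to the database.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory table of hypothetical sales rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0), ("east", 75.0), ("north", 40.0)],
    )
    return conn


def totals_procedural(conn):
    """Procedural style: fetch every row and accumulate totals line by line."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_set_based(conn):
    """Set-based style: describe the whole result; the database computes it."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())


conn = make_db()
print(totals_procedural(conn))
print(totals_set_based(conn))
```

Both functions produce the same per-region totals (the printed key order may differ); what changes is whether the logic is expressed row by row or as one statement over the whole set.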
There are practical concerns, too. "You have to look at it kind of differently, because it can take you longer to write a test case than it takes us to generate the code for you. Suddenly, you're in a different paradigm. You get to be testing outcomes more than anything, which very much brings you into the data world," says Michael Whitehead, founder and CEO of WhereScape Inc., an upstart maker of a data warehouse-generation tool called RED.
"When you're building warehouses in an agile fashion, you're bringing together the concepts of software development and data, and a lot of the agile software techniques don't flow across to the data world."
A lot of the agile buzz at last month's TDWI World Conference in San Diego concerned agile business intelligence (BI), which, Whitehead respectfully suggests, isn't at all the same thing as agile data warehousing.
"When people talk agile in the data world, they generally talk agile BI. They generally talk about the reports, that sort of layer becoming agile. That's a no-brainer. If it's a distinct point where you have customer interaction, of course you should put something in front of them. It isn't quite so easy with a data warehouse," he argues.
All the same, Whitehead describes himself as a proponent of agile data warehousing, particularly inasmuch as "agility" connotes the acceleration or automation of tedious, onerous, time-consuming, or otherwise costly tasks.
"The trick is really about how far you can take that [agile in a data warehousing context] back. We think that you should be taking that back to the data layer as well," he explains, conceding that just how far agile should be extended into the data layer is the 64-petabyte question.
For example, some agile programming disciplines prescribe periodic "refactoring": restructuring an application's code without changing its external behavior. In a data warehousing context, this would be analogous to rebuilding (or regenerating) the warehouse itself.
In this context, a tool like RED can accelerate refactoring. In most configurations, RED can suggest specific changes (and, if the user approves, implement them). Whitehead cautions, however, that refactoring probably isn't something you'd want to fully automate.
"Given the potential volumes that you're dealing with, you don't want us automating that [i.e., refactoring]," he says. "The approach we've taken [with RED] is that we'll identify the change and suggest a way [to implement the proposed change]. You can look at it and either say 'Yes' to it, or decide that you want to take a different approach. We're just not willing to take the plunge to automate that given the knowledge our users have about their own environments."
Agility is, of course, synonymous with nimbleness, deftness -- that is, speed.
It's in this respect that a company such as WhereScape makes its most explicitly agile pitch. Rather than exhaustively planning, scoping, and documenting a new warehouse's requirements, Whitehead argues, use a tool such as RED to jump right in and start prototyping it on the fly. Use RED to generate a new warehouse or data mart to address seasonal, tactical, or one-off needs. (You can also use RED to cross off the kind of "nice-to-have" features that rarely, if ever, get crossed off.) Use RED to periodically refactor or improve the performance of your warehouse or data mart assets. Use RED, Whitehead argues, or your information consumers are going to use something else.
This, he concludes, is the essence of agile. "If you're a data guy, you need to make sure that you are doing whatever you can to deliver quickly and deliver value and make changes so that your stuff is relevant," he asserts. "If you can't do that, people are going to fill that vacuum."