The Agile Data Warehouse: Keeping Users Happy

Though they share a single word, agile data warehousing (DW) is nothing like agile software development.

Agile programming disciplines tend to champion a code-first, document-later ethic. Some agile approaches even eschew traditional documentation altogether. Agile programming techniques tend to place an emphasis on frequent testing: at least one agile discipline, test-driven development (TDD), explicitly prescribes a test-first approach.

In all of their variants, agile approaches emphasize the importance of frequent (and typically interactive) involvement with line-of-business customers. It isn't unusual for agile teams to solicit feedback from customers on a periodic (daily, weekly, or bi-weekly) basis. This lets them incorporate new features as customers demand them -- or change features based on feedback from users.

There are a number of reasons why a straight-up agile approach doesn't translate very well into the data warehousing world, experts say.

There's the important paradigmatic distinction between programming -- with its procedural (or line-by-line) orientation -- and data management (DM), which typically lives and thinks in a set-based world.
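To make that distinction concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table name, column names, and sample rows are invented for illustration and do not come from any product discussed here. The first function works through the data row by row, the way procedural code tends to; the second states the result it wants as a single set-based query and leaves the iteration to the database engine.

```python
import sqlite3


def build_sales_db():
    """Create a throwaway in-memory table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100), ("west", 250), ("east", 50), ("west", 25)],
    )
    return conn


def totals_procedural(conn):
    """Procedural style: visit each row and accumulate by hand."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_set_based(conn):
    """Set-based style: describe the result; the engine does the looping."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    return dict(rows.fetchall())
```

Both functions return the same totals for the same table. What differs is where the logic lives, and that in turn changes what there is to unit-test.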

There are practical logistical concerns, too. "You have to look at it kind of differently, because it can take you longer to write a test case than it takes us to generate the code for you. Suddenly, you're in a different paradigm. You get to be testing outcomes more than anything, which very much brings you into the data world," says Michael Whitehead, founder and CEO of WhereScape Inc., an upstart maker of a data warehouse-generating tool called RED.

"When you're building warehouses in an agile fashion, you're bringing together the concepts of software development and data, and a lot of the agile software techniques don't flow across to the data world."
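What "testing outcomes" might look like in practice can be sketched as follows. This is an illustrative assumption, not a description of how RED or any other tool works: the staging_orders and fact_orders tables, their columns, and the three checks are all hypothetical. Rather than unit-testing the load code itself, the checks reconcile what landed in the warehouse against what was staged.

```python
import sqlite3

# Each check is a query that evaluates to 1 (pass) or 0 (fail).
# Table and column names are hypothetical.
OUTCOME_CHECKS = {
    "row_count_matches": (
        "SELECT (SELECT COUNT(*) FROM staging_orders) = "
        "(SELECT COUNT(*) FROM fact_orders)"
    ),
    "amount_total_matches": (
        "SELECT (SELECT COALESCE(SUM(amount), 0) FROM staging_orders) = "
        "(SELECT COALESCE(SUM(amount), 0) FROM fact_orders)"
    ),
    "no_null_customer_keys": (
        "SELECT COUNT(*) = 0 FROM fact_orders WHERE customer_key IS NULL"
    ),
}


def check_load_outcomes(conn):
    """Return the names of the outcome checks that failed."""
    return [
        name
        for name, sql in OUTCOME_CHECKS.items()
        if conn.execute(sql).fetchone()[0] != 1
    ]


def build_demo(drop_a_row=False):
    """Build staging and fact tables; optionally simulate a lossy load."""
    conn = sqlite3.connect(":memory:")
    ddl = "(order_id INTEGER, customer_key INTEGER, amount INTEGER)"
    conn.execute("CREATE TABLE staging_orders " + ddl)
    conn.execute("CREATE TABLE fact_orders " + ddl)
    rows = [(1, 10, 100), (2, 11, 250), (3, 10, 50)]
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
    loaded = rows[:-1] if drop_a_row else rows
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", loaded)
    return conn
```

A clean load returns an empty list. A load that silently drops a row fails the row-count and amount checks, whichever code produced it.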

A lot of the agile buzz at last month's TDWI World Conference in San Diego concerned agile business intelligence (BI), which, Whitehead respectfully suggests, isn't at all the same thing as agile data warehousing.

"When people talk agile in the data world, they generally talk agile BI. They generally talk about the reports, that sort of layer becoming agile. That's a no-brainer. If it's a distinct point where you have customer interaction, of course you should put something in front of them. It isn't quite so easy with a data warehouse," he argues.

All the same, Whitehead describes himself as a proponent of agile data warehousing, particularly inasmuch as "agility" connotes the acceleration or automation of tedious, onerous, time-consuming, or otherwise costly tasks.

"The trick is really about how far you can take that [agile in a data warehousing context] back. We think that you should be taking that back to the data layer as well," he explains, conceding that just how far agile should be extended into the data layer is the 64-petabyte question.

For example, some agile programming disciplines prescribe periodic "refactoring" -- or recoding -- of applications. In a data warehousing context, this would be analogous to rebuilding (or regenerating) the warehouse itself.

Refactoring, in this context, can be accelerated by a tool like RED. In most configurations, RED can suggest prescriptive changes (and, if necessary, implement them) to speed up the process. Whitehead cautions, however, that refactoring probably isn't something you'd want to automate outright.

"Given the potential volumes that you're dealing with, you don't want us automating that [i.e., refactoring]," he says. "The approach we've taken [with RED] is that we'll identify the change and suggest a way [to implement the proposed change]. You can look at it and either say 'Yes' to it, or decide that you want to take a different approach. We're just not willing to take the plunge to automate that given the knowledge our users have about their own environments."
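The suggest-then-approve pattern Whitehead describes can be sketched generically. The code below is not RED's interface or behavior. It is a hypothetical illustration against SQLite, with invented function names and an invented table, in which a proposed schema change is generated automatically but executed only when a person signs off on it.

```python
import sqlite3


def suggest_changes(conn, table, desired_columns):
    """Propose ALTER statements for columns the table lacks (hypothetical)."""
    # PRAGMA table_info returns one row per column; index 1 is the name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {col_type}"
        for name, col_type in desired_columns.items()
        if name not in existing
    ]


def apply_if_approved(conn, suggestions, approve):
    """Run only the statements the reviewer callback says 'Yes' to."""
    applied = []
    for statement in suggestions:
        if approve(statement):
            conn.execute(statement)
            applied.append(statement)
    return applied


def build_demo():
    """Create a small in-memory dimension table (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER, name TEXT)")
    return conn
```

Here the approve callback stands in for the human decision. Passing a function that always returns False leaves the table untouched, which is the point of the design: the tool identifies the change, and the person who knows the environment decides whether it runs.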

Agility is, of course, synonymous with nimbleness, deftness -- that is, speed.

It's in this respect that a company such as WhereScape makes its most explicitly agile pitch. Rather than exhaustively planning, scoping, and documenting a new warehouse's requirements, Whitehead argues, use a tool such as RED to jump right in and start prototyping it on the fly. Use RED to generate a new warehouse or data mart to address seasonal, tactical, or one-off needs. (You can also use RED to cross off the kind of "nice-to-have" features that rarely, if ever, get crossed off.) Use RED to periodically refactor or improve the performance of your warehouse or data mart assets. Use RED, Whitehead argues, or your information consumers are going to use something else.

This, he concludes, is the essence of agile. "If you're a data guy, you need to make sure that you are doing whatever you can to deliver quickly and deliver value and make changes so that your stuff is relevant," he asserts. "If you can't do that, people are going to fill that vacuum."
