A cost of consolidation?

Despite the push to consolidate data centers in Washington, D.C., and at state capitols nationwide, some would-be centralizers might want to recalculate the risks involved after the recent meltdown of state services in Virginia.

The failure of a single storage system Aug. 25 at a data center near Richmond took down 485 of the state’s 4,800 data servers, knocking out services at three state agencies for more than a week and affecting operations at two dozen others.

State Chief Information Officer Sam Nixon told the Washington Post that the crash was caused by the dual failure of a pair of redundant, 3-year-old memory cards, one of which was supposed to back up the other.

"The thing that is never supposed to happen happened," Nixon said.

Virginia officials said EMC, the company that designed and supplied the storage system, told them that this type of failure had never occurred before in 1 billion hours of system use.

The debacle turns a spotlight on the continuity-of-operations risks associated with consolidating multiple IT operations on fewer, larger data processing and storage systems, as Virginia has done through a $2.4 billion contract it awarded to Northrop Grumman in 2003 and renegotiated this past spring.

The incident also calls into question the design and reliability of supposedly fault-tolerant systems when real-world problems trigger them into action. Information Age reported that a situation similar to the Virginia outage occurred earlier this year when e-mail hosting provider Intermedia lost service to many of its customers after a problem on its EMC storage-area network.

Intermedia officials said a backup storage controller took over when the primary one failed because of a system bug, but the backup device had insufficient capacity to shoulder the entire workload. The company said it has taken corrective action to ensure that there is enough spare capacity on the storage-area network to continue operation in case of future failures.

Virginia CIO Nixon told local TV station NBC12 that he remains committed to centralized IT services despite the recent snafu. But the news will likely have government IT officials elsewhere taking a second look at their system designs and procedures to make sure they can continue operations in case of unexpected problems.

Reader comments

Thu, Sep 23, 2010 Allen

The story is a bit more interesting because four days of data were lost. Lesson 1: expect the unexpected, be it the Titanic or computer equipment. Lesson 2: when things break, ensure all the pieces can be put back together. Here an official source is needed for the Virginia incident. Lesson 3: use your backup system from time to time. Our auditor was amazed, yes amazed, that we use last night's backup to refresh the test system every day. Hence we know, not think, that the restore works. May all this give voice to good people crying "test and train now." If money is an issue, they lack understanding of the problem, IMO. Kind regards.
