The IT Road Less Traveled

by Thought Leaders at SwishData



Avoiding Secrecy, Unnecessary Data Center Consolidation in Government

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

I recently attended an industry and government joint session where the discussion focused on what different agencies are doing in terms of data center consolidation. One of the topics of debate was “what should agencies consolidate?” Should consolidation involve every single computer classified as a ‘data center,’ or is it better to make case-by-case decisions?

The policing approach
One government agency’s policy was that any data on any server must be consolidated, even if it is just a PC sitting under someone’s desk. The image of IT workers going from office to office, searching for single-server ‘data centers’ to consolidate, made me think of police raiding houses looking for a bad guy.

The collaborative approach
There is a different approach. Referencing his own agency’s policy, an official from another agency said that following a rigid set of requirements might not always yield the best results. For instance, if data is critical to the mission of the agency, it falls under agency compliance rules and has to be consolidated. But if it is not, a group can assume responsibility for administering its own data, as long as it reaches agreement with IT about how to do so.

Often the people who agree to administer their own data have proprietary, highly specific applications that are not mainstream and are difficult for anyone else to take full responsibility for. There have been situations where finger-pointing occurred because the original program office had to stay involved in administering applications the IT team didn’t understand.

Inflexibility strains IT budgets
When consolidating, the agency would have to transfer the budget allocated to those expenses to the IT department. Under ideal circumstances, the decision to move some or all applications to IT would be made in cooperation with program offices. But if offices decide to fight for their budgets and win (and they often do), they may find alternate ways to manage their data, leaving IT maintaining useless servers without the funds to do so.

Another problem agencies face is the perception that IT departments are slow-moving, rigid and difficult to deal with. That negative customer-service perception has IT admins concerned that an environment encouraging rogue activity could emerge. Admins already struggle with users and groups installing applications on their desktops, not to mention spending non-IT funds on public cloud resources. If IT takes away their servers arbitrarily, internal groups might simply move to the cloud or to unauthorized applications, creating even more security and administrative nightmares.

A win-win for program offices and IT departments
Because of these issues, some government agencies’ IT departments are choosing to collaborate closely with program offices when deciding what to consolidate. This open and cooperative policy makes internal groups happier while giving IT better visibility into potential risks and needs across the enterprise. It works to the agencies’ advantage because groups are encouraged to work together instead of consolidating data for consolidation’s sake.

Want to hear more from SwishData? Visit my Data Performance Blog, and follow me on Facebook and Twitter.

Posted by Jean-Paul Bergeaux on Jul 24, 2012 at 12:18 PM


Hadoop Positioned to Displace Many SQL Database Implementations

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

There are many instances today where the use of traditional databases causes inefficiency and increased workloads. If some of those traditional systems were Hadoop databases instead, users would be a lot happier.

That’s because Hadoop, often discussed as a way to open up a new frontier of database computation, excels where traditional databases can’t be used. Hadoop’s specialty is big data, an umbrella term for dealing with the deluge of information created every second in the digital age. Typical examples include the analysis of:

  • Customers’ buying and social media patterns to improve products and marketing plans
  • Billions of insurance claims to determine which ones might be fraudulent
  • Supply chain metrics to spot inefficiencies
  • Millions of medical records to find new solutions to health problems

These are analyses that were rarely implemented, or even attempted, because they simply took too long in SQL-based solutions. Hadoop has characteristics that differ from those of traditional databases: it scales well using clustered servers, is designed to store information in large files and is very fast at non-real-time batch processing.
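To make that batch-processing model concrete, here is a minimal sketch of a Hadoop Streaming job in Python, using the fraud-review example from the list above. The tab-separated input layout, file names and script name are hypothetical illustrations, not anything from an actual deployment.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch: count insurance claims per provider as a
# batch job so that unusually high counts can be flagged for fraud review.
# The tab-separated input layout (provider_id, claim_amount, ...) is a
# hypothetical example.
import sys


def mapper():
    # Emit "provider_id<TAB>1" for every claim record read from stdin.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print("%s\t1" % fields[0])


def reducer():
    # Hadoop delivers mapper output sorted by key, so summing consecutive
    # runs of the same key gives the per-provider total.
    current_key, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))


if __name__ == "__main__":
    # Run as "claim_counts.py map" for the mapper and "claim_counts.py reduce"
    # for the reducer when submitting the streaming job.
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

A job like this would typically be submitted with Hadoop’s stock streaming jar, pointing the -mapper and -reducer options at the two modes of the script and at placeholder input and output paths in HDFS.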

Another example of Hadoop’s big data prowess, one many corporations have already identified, is financial quarter- and year-end processing. This requires an extract-transform-load (ETL) step into Hadoop, yet even with that added step, processing time usually improves. It’s not uncommon for these jobs to take a week or more to run at a large corporation; in Hadoop, they take only hours. These projects are cash cows right now for companies like Oracle, IBM (DB2) and Microsoft (MS-SQL), but for how long now that Hadoop is here?
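As a rough sketch of what the extract portion of that ETL step might look like, the snippet below pulls one quarter’s transactions out of a relational database and writes them as flat, tab-separated text that a Hadoop batch job can consume. The table and column names are hypothetical, and sqlite3 merely stands in for whatever enterprise RDBMS the finance system actually runs on.

```python
# Sketch of the extract step: dump closed-quarter transactions from a
# relational database into tab-separated text for a Hadoop batch run.
import csv
import sqlite3

conn = sqlite3.connect("finance.db")  # placeholder data source
cursor = conn.cursor()
cursor.execute(
    "SELECT account_id, posted_date, amount FROM transactions "
    "WHERE posted_date BETWEEN ? AND ?",
    ("2012-04-01", "2012-06-30"),
)

with open("q2_transactions.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerows(cursor)  # each row becomes one tab-separated line

conn.close()

# The flat file would then be copied into HDFS for processing, e.g.:
#   hdfs dfs -put q2_transactions.tsv /finance/q2/
```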

Another useful Hadoop implementation is climate, genome and other scientific modeling. Hadoop is clearly the right answer for this type of scientific research. In the past, custom applications were written to use SQL databases as a place to store results, to process the data, or both. Now these custom applications are much easier to design and write because most of the heavy lifting is done inside the Hadoop ecosystem.

Data warehousing is also ripe for Hadoop implementation. Databases that are used only for searching, with no real-time processing, often see improved performance at a lower cost in Hadoop. This doesn’t just apply to Google- and Yahoo!-type applications, but also to the eDiscovery needs of corporations and federal agencies. Any complex job that is too time consuming in a SQL database, regardless of data size, might be worth trying in Hadoop. I believe that as people try more use cases with Hadoop, they will find it cheaper or better-performing than standard SQL databases.
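A search-only workload like eDiscovery can be expressed as nothing more than a filtering mapper, as in the sketch below. The “doc_id&lt;TAB&gt;text” input layout and the keyword list are hypothetical, and a pure filter like this needs no reducer at all.

```python
# Sketch of a search-only warehouse workload as a Hadoop Streaming mapper:
# scan archived documents for hold-list keywords (an eDiscovery-style query)
# and emit the IDs of matching documents.
import sys

KEYWORDS = {"settlement", "contract", "invoice"}  # placeholder hold-list terms


def main():
    for line in sys.stdin:
        doc_id, _, text = line.rstrip("\n").partition("\t")
        hits = set(text.lower().split()) & KEYWORDS
        if hits:
            print("%s\t%s" % (doc_id, ",".join(sorted(hits))))


if __name__ == "__main__":
    main()
```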

Look out, Oracle, Microsoft and IBM.

Want to hear more from SwishData? Visit my Data Performance Blog, and follow me on Facebook and Twitter.

Posted by Jean-Paul Bergeaux on Jul 17, 2012 at 12:18 PM


Hadoop’s Challenge Is Different from Linux’s

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

Last time I wrote about Hadoop, I talked about its challenge to traditional SQL-based databases. I left off by mentioning that some SQL proponents have compared Apache Hadoop to Linux 10 years ago. They assume that Hadoop will find a niche and have little effect on the traditional database core business. That might not be the case, however; several differences make the comparison potentially inaccurate.

To start with, Linux sits much lower in the IT stack, at the operating system layer rather than the application layer. So Linux could indeed have replaced all of its competition if applications and admins had chosen that platform. Hadoop, by contrast, is a type of application with specific use cases, allowing advocates to be laser-focused on where it fits well. This could be a significant threat to typical SQL databases, because the most costly databases fit right into Hadoop’s sweet spot. More importantly, because it’s a type of application and not an OS, Hadoop doesn’t need mainstream commercial off-the-shelf (COTS) vendors to port their products to a new platform before it can gain momentum. That’s where Linux struggled to compete, and why it was eventually relegated to a small, specialized segment of the data center.

Probably the best reason Hadoop is more likely to succeed is that the cost/performance gap between Hadoop and SQL is larger than the gap between Linux and Windows ever was. Hadoop offers 10 times the performance of a monolithic database for 10 percent of the cost. Red Hat Enterprise Linux (RHEL) saved money, but the impact was hardly of this magnitude and was harder to quantify; RHEL required large deployments before the savings added up to big numbers. A single SQL database’s design, cost and administration can be large enough that moving it to Hadoop shows millions in savings immediately. This is why Oracle makes so much money and why Microsoft (SQL) and IBM (DB2) fight so hard in this space.

Going back to the COTS application conversion problem Linux had: Hadoop’s target is the applications enterprise organizations have already built around SQL databases, and current applications don’t have to be rebuilt to convert to Hadoop. As Facebook’s first implementation of Hadoop proved, Hive can translate SQL commands into commands MapReduce and HBase can use, and the results coming out the back end can be stored in SQL, Oracle or both.
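A rough sketch of what that reuse can look like in practice: the same SQL a program office already runs is handed to Hive, which compiles it into MapReduce jobs over data sitting in HDFS, so the calling application does not have to be rebuilt. The table and column names below are hypothetical; "hive -e" is the standard way to execute a HiveQL string from the command line.

```python
# Sketch: reuse an existing SQL query unchanged by handing it to Hive, which
# plans and runs the underlying MapReduce stages behind the scenes.
import subprocess

QUERY = """
SELECT provider_id, COUNT(*) AS claim_count
FROM claims
GROUP BY provider_id
HAVING COUNT(*) > 1000
"""

subprocess.run(["hive", "-e", QUERY], check=True)
```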

The one hurdle Hadoop still faces is that in its purest “commodity shared-nothing” form, it isn’t enterprise ready. There are ways to make Hadoop enterprise ready, and they work, but they are not mainstream and are sometimes outright rejected by the Hadoop community; MapR’s Hadoop distribution is an example of this phenomenon. Hadoop in its basic form does not offer risk-free data storage and computation. There are failure points that need to be addressed, and they can be addressed with enterprise infrastructure. Until that becomes more mainstream, however, Hadoop will be relegated to special use cases.

Next time we talk about Hadoop, I’ll discuss use cases where it does and does not fit for agencies.

Want to hear more from SwishData? Visit my Data Performance Blog, and follow me on Facebook and Twitter.

Posted by Jean-Paul Bergeaux on Jul 10, 2012 at 9:03 AM


Government IT Departments Must Go Private, Not Public, Cloud

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

I feel for federal agency IT departments. They are faced with a mandate to go cloud-first, but moving IT to a public cloud (SaaS) is mostly a business decision, not an IT decision. Let’s be honest: a public cloud offering is not really a technical change from what a large agency’s IT department can do itself. So it’s really about shifting risk to an external entity and converting IT costs from capital budget (CAPEX) to operating budget (OPEX).

IT departments have all the tools and knowledge to assess the technical viability of public cloud options, but they are pressured by policy to move forward whether or not it makes sense. That’s just bad policy. Not only are there larger ramifications to going with a SaaS solution, but most IT departments also aren’t responsible for all of the costs, so they can’t do a true return on investment (ROI) calculation. Just two examples are floor space and electricity. GSA and direct agency contracts are usually with an agency as a whole and often include the cost of power, and IT departments are not privy to the difficult process of offloading those properties, which ABC and Fox News have reported on over the years.
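To illustrate the accounting gap with a deliberately artificial example (every dollar figure below is a made-up placeholder, not agency data), consider how the same move looks from the IT ledger versus the agency-wide ledger:

```python
# Back-of-the-envelope view of why an IT-only ROI calculation comes out
# skewed. All figures are hypothetical placeholders; the point is only that
# power and floor space sit in the agency's budget, not IT's, so IT's ledger
# understates the true in-house cost and shows no benefit from the move.
it_visible_costs = {"hardware_refresh": 400_000, "admin_labor": 600_000}
agency_held_costs = {"power_and_cooling": 250_000, "floor_space": 150_000}
public_cloud_annual = 1_200_000  # placeholder subscription cost

in_house_it_only = sum(it_visible_costs.values())                    # 1.0M
in_house_full = in_house_it_only + sum(agency_held_costs.values())   # 1.4M

# From IT's ledger the cloud looks like a 200K-per-year loss; counted
# agency-wide, the same move looks like a 200K saving.
print("IT-only delta vs. cloud:    ", in_house_it_only - public_cloud_annual)
print("Agency-wide delta vs. cloud:", in_house_full - public_cloud_annual)
```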

Then what do the IT department and the agency do about the projected reduction in staff? Whether IT departments could be trimmed is debatable, but agencies can’t reduce staff as easily as a private company can. They will have to absorb costs within public cloud options that they do not manage the budget for, such as the aforementioned energy and space. Suddenly, their IT-only dollars have to absorb these costs while their budgets from Congress and their agency shrink. In the end, what is the motivation and benefit for these IT departments to go public cloud?

The answer can only be a private cloud. Before recent advances in virtualization, blade servers and IT management technology, consolidating IT departments across an agency was neither feasible nor desirable. The result has been hundreds of small and medium-sized data centers littering government buildings all across the country. They are no longer the most efficient way to manage IT, and the best result of the cloud-first initiative would be consolidating these groups together in private clouds.

With today’s modern virtualized data center tools, a private cloud in which each of these departments ‘owns’ its IT within the consolidated infrastructure has real potential to save government agencies serious money. All of this can be achieved without introducing security and information assurance problems or mind-bending ROI calculations.

This movement is not without challenges, the two largest being whether each department can trust that it will keep its autonomy in such a venture, and how to fund an agency-wide private cloud project. These issues must be managed at the level of the entire agency, not by individual IT departments. If it is ever done effectively, the government stands to save billions of taxpayer dollars.

Want to hear more from SwishData? Visit my Data Performance Blog, and follow me on Facebook and Twitter.

Posted by Jean-Paul Bergeaux on Jun 26, 2012 at 12:18 PM


Business Analytics Solutions Will Spread

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

Conventional thought is that clustered NoSQL databases are going to change business analytics and make the data that large organizations collect more useful. As usual, a few tools have risen to the top of the competitive heap, each with its own specialty.

MongoDB focuses on documents, and Cassandra focuses on high-availability applications, but it’s HBase/MapReduce that seems to get the most press. HBase and MapReduce are part of the Apache Hadoop ecosystem, which also includes Hive, a solution for translating SQL into Hadoop-usable commands.

IT heavy hitters Facebook, Google and Yahoo! helped drive Hadoop forward and use it themselves. Though Hadoop has some limitations, the maturity of the ecosystem, along with companies like Cloudera, Hortonworks and MapR building packaged distributions and support, makes it a good bet to win the NoSQL battles in the enterprise space.

But are some giving Hadoop and others like it too little credit? I think so. For years, IT organizations have used SQL-based databases for use cases that really don’t fit the monolithic-database sweet spot, because they didn’t have much of a choice in the enterprise. Those monolithic databases had momentum and a plethora of available hires with technical competencies in SQL-based administration and development. Oracle’s Larry Ellison revealed that Oracle is now an OEM partner with Cloudera and will sell its Hadoop package. Why would he bolster belief in Hadoop if he didn’t think it was going to make a significant impact on his business?

In specific use cases, Hadoop offers 10 times the performance of a monolithic database for 10 percent of the cost. It’s a no-brainer. As Hadoop or another clustered NoSQL database matures into an enterprise product with the technical talent to support it, Oracle and MS-SQL are going to lose some of their most profitable installations. Not all, but most, of the big-money database installations fit nicely into Hadoop use cases.

Some SQL proponents have compared Apache Hadoop to Linux 10 years ago, with the intent of dampening expectations for Hadoop’s impact. Red Hat Enterprise Linux (RHEL) never did the damage to Microsoft and other server operating systems that many had predicted; today, RHEL has carved out a nice market, but it’s not the dominant player in the enterprise server market. There are some striking similarities. Linux was free to download, but until Red Hat came along with RHEL, it wasn’t ready for the enterprise. Cloudera, Hortonworks and MapR can play the same role for Hadoop that Red Hat did for Linux. But there are major differences between Linux and Hadoop as threats to their traditional IT counterparts.

I’ll get into that in two weeks. Next week I’ll finish the cloud discussion I started a few weeks ago. See you then!

(Until then, you can read more on my thoughts about Hadoop on the Data Performance Blog.)

Posted by Jean-Paul Bergeaux on Jun 19, 2012 at 12:18 PM