By Jean-Paul Bergeaux
Chief Technology Officer, SwishData
Conventional wisdom holds that clustered no-SQL databases will change business analytics and make the data that large organizations collect more useful. As usual, a few tools have risen to the top of the competitive heap, each with its own specialty.
MongoDB focuses on documents and Cassandra on high-availability applications, but HBase/MapReduce seems to get the most press. HBase and MapReduce are part of the Apache Hadoop ecosystem, which also includes Hive, a tool that translates SQL-like queries into jobs Hadoop can run.
IT heavy hitters Facebook, Google and Yahoo! helped drive the growth of Hadoop and use it themselves. Hadoop has some limitations, but the maturity of its ecosystem, along with the packaging and support that companies like Cloudera, Hortonworks and MapR are building around it, makes it a good bet to win the no-SQL battles in the enterprise space.
But are some giving Hadoop and others like it too little credit? I think so. For years, IT organizations have used SQL-based databases for use cases that don't really fit the monolithic database's sweet spot, because in the enterprise they didn't have much of a choice. These monolithic databases had momentum and a plethora of job candidates with skills in SQL-based administration and development. Now Oracle's Larry Ellison has revealed that Oracle is an OEM partner of Cloudera and will sell Cloudera's Hadoop package. Why would he bolster belief in Hadoop if he didn't think it was going to have a significant impact on his business?
In specific use cases, Hadoop offers 10 times the performance of a monolithic database for 10 percent of the cost. It's a no-brainer. As Hadoop or another no-SQL clustered database matures into an enterprise product with the technical talent to support it, Oracle and Microsoft SQL Server are going to lose some of their most profitable installations. Not all of them, but most of the big-money database installations fit nicely into Hadoop use cases.
Some SQL proponents have compared Apache Hadoop to the Linux of 10 years ago, with the intent of dampening expectations about Hadoop's impact. Red Hat Enterprise Linux (RHEL) never did the amount of damage to Microsoft and other server operating systems that many had predicted. Today, RHEL has carved out a nice market, but it's not the dominant player in the enterprise server market. There are some striking similarities. Linux was free to download, but until Red Hat came along with RHEL, it wasn't ready for the enterprise. Cloudera, Hortonworks and MapR can play the same role for Hadoop that Red Hat played for Linux. But there are some major differences between Linux and Hadoop as threats to their traditional IT counterparts.
I’ll get into that in two weeks. Next week I’ll finish the cloud discussion I started a few weeks ago. See you then!
(Until then, you can read more on my thoughts about Hadoop on the Data Performance Blog.)