Hadoop Positioned to Displace Many SQL Database Implementations
By Jean-Paul Bergeaux
Chief Technology Officer, SwishData
In many cases today, traditional databases cause inefficiency and add to workloads. If some of those traditional systems ran on Hadoop instead, users would be a lot happier.
That’s because Hadoop, which is often described as a way to open up a new frontier of database computations, excels where traditional databases can’t be used. Hadoop’s specialty is big data, an umbrella name for dealing with the deluge of information that is being created every second in the digital age. Some typical examples include the analysis of:
- Customers’ buying and social media patterns to improve products and marketing plans
- Billions of insurance claims to determine which ones might be fraudulent
- Supply chain metrics to spot inefficiencies
- Millions of medical records to find new solutions to health problems
These analyses were rarely attempted, let alone implemented, because they took too long in SQL-based solutions. Hadoop differs from those traditional databases in several ways. It scales well across clustered servers, is designed to store information in large files and is very fast at non-real-time batch processing.
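To make the batch model concrete, here is a minimal sketch of the map, shuffle and reduce steps that Hadoop's MapReduce engine distributes across a cluster. It is plain Python running in a single process, not the Hadoop API, and the claim records, field layout and function names are all hypothetical, chosen only to echo the insurance example above.

```python
from collections import defaultdict

# Hypothetical claim records: (claim_id, provider, amount).
# In a real cluster these would be lines in large files, not a list.
CLAIMS = [
    ("c1", "provider_a", 1200.0),
    ("c2", "provider_b", 300.0),
    ("c3", "provider_a", 9800.0),
    ("c4", "provider_c", 450.0),
    ("c5", "provider_a", 7600.0),
]


def map_claim(record):
    """Map step: emit one (provider, amount) pair per input record."""
    _claim_id, provider, amount = record
    yield provider, amount


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_provider(provider, amounts):
    """Reduce step: count and total the claims for one provider."""
    return provider, {"count": len(amounts), "total": sum(amounts)}


def run_job(records):
    """Run map, shuffle and reduce in order over all records."""
    pairs = (pair for record in records for pair in map_claim(record))
    grouped = shuffle(pairs)
    return dict(reduce_provider(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for provider, stats in sorted(run_job(CLAIMS).items()):
        print(provider, stats)
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split across many machines, which is where the scaling comes from. A fraud analyst might then review the providers with unusually high counts or totals.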
Another example of Hadoop’s big data prowess, one many corporations have already identified, is financial quarter-end and year-end processing. This requires an ETL (extract, transform, load) step to move the data into Hadoop. Even with that added step, the job processing time usually improves. In a large corporation, it’s not uncommon for these jobs to take a week or more to run. Using Hadoop, they take only hours. These types of projects are cash cows right now for companies like Oracle, IBM (DB2) and Microsoft (MS-SQL), but for how long now that Hadoop is here?
Hadoop is also useful in climate, genome and other scientific modeling, where it is clearly the right answer. In the past, custom applications were written to use SQL databases to store the results, process the data or both. Now, these custom applications are much easier to design and write because most of the heavy lifting is done inside the Hadoop ecosystem.
Data warehousing is also ripe for Hadoop implementation. Databases that are only used for searching (with no real-time processing) often see improved performance at a lower cost in Hadoop. This doesn’t just apply to Google and Yahoo! type applications, but also to the eDiscovery needs of corporations and federal agencies. Any complex job that is too time consuming in a SQL database, regardless of data size, might be worth trying in Hadoop. I believe that as people try more use cases with Hadoop, they will find it cheaper or better-performing than standard SQL databases.
Look out, Oracle, Microsoft and IBM.
Posted by Jean-Paul Bergeaux on Jul 17, 2012 at 12:18 PM