So You Think You’re Ready for a Data Warehouse Appliance
Many of our clients have asked us about whether it’s time to consider replacing their aging data warehouses with data warehouse appliance technologies. I chock up this emerging interest to the reality that data warehouse life spans are 3 to 4 years and platforms need to be refreshed. Given the recent crop of announcements by vendors like Oracle and Teradata along with the high visibility of newer players like Netezza, Paraccel, and Vertica.
The benefit of a data warehouse appliance is that includes all of the hardware and software in a preconfigured solution that dramatically simplifies running and managing a data warehouse. (Some of the vendors have taken that one step further and actually sell software that is setup to work with specially defined commodity hardware configurations). Given the price/performance differences between the established data warehouse products and the newer data warehouse appliances, it only makes sense that these products be considered as alternatives to simply upgrading the hardware.
The data warehouse appliance market is arguably not new. In the 1980s companies like Britton-Lee and Teradata argued that database processing was different and would perform better with purpose-designed hardware and software. Many have also forgotten these pioneers argued that the power of commodity microprocessors vastly exceeded the price/performance of their mainframe processor competitors.
The current-generation appliance vendors have been invited to the table because of the enormous costs that have evolved in managing the enormous data volumes and operational access associated with today’s data warehouses. Most IT shops have learned that database scalability doesn’t just mean throwing more hardware and storage at the problem. The challenge in managing these larger environments is understand the dynamics of the data content and the associated processing. That’s why partitioning the data across multiple servers or simply removing history doesn’t work – for every shortcut taken to reduce the data quantity, there’s an equal impact to user access and the single version of truth. This approach also makes data manipulation and even system support dramatically more complicated.
It’s no surprise that these venture capital backed firms would focus on delivering a solution that was simpler to configure and manage. The glossy sales message of data warehouse appliance vendors comes down’ to something like: “We’ve reduced the complexity of running a data warehouse.. Just install our appliance like a toaster, and watch it go!” There’s no question that many of these appliance vendors have delivered when it comes to simplifying platform management and configuration; the real challenge is addressing the management and configuration issues that impact a growing data warehouse: scalable load processing, a flexible data architecture, and manageable query processing.
We’ve already run into several early-adopters that think all that is necessary is to simply fork-lift their existing data warehouse structures onto their new appliance. While this approach may work initially, the actual longevity of the appliance – or its price/performance rationale will soon evaporate. These new products can’t work around bad data, poor design habits, and the limitations of duplicate data; their power is providing scalability across enormous data and processing volumes. An appliance removes the complexities of platform administration. But no matter what appliance you purchase, and no matter how much horsepower it has, data architecture and data administration are still required.
In order to leverage the true power of an appliance, you have to expect to focus effort towards integrating data in a structure that leverages the scalability strengths of the product. While the appliances are SQL-based, the way they process loads, organize data, and handle queries can be dramatically different than their incumbent data marts and data warehouses. It’s naïve to think that a new appliance can provide processing scalability without any adjustments. If it was that simple, the incumbent vendors would have already packaged that in their existing products.
In Part 2 of this post, I’ll elaborate on the faulty assumptions of many companies that acquire data warehouse appliances, and warn you against making these same mistakes.
photo by meddygarnet via Flickr (Creative Commons License)