As I wrote in last week’s blog post, a data warehouse appliance simplifies platform and system resource administration. It doesn’t simplify the traditional time-intensive efforts of managing and integrating disparate data and addressing performance and tuning of various applications that contend for the same resources.
Many data warehouse appliance vendors offer sophisticated parallel processing environments, query optimization, and specialized storage structures to improve query processing (e.g., columnar-based engines). It’s naïve to think that taking data from an SMP (Symmetric Multi-Processing) relational database and moving it into a parallel processing environment will scale effectively without any adjustments or changes. Moving onto an appliance can be likened to moving into a new house. When you move into a new, larger house, you quickly learn that it’s not as simple as dumping all of your stuff into the new space. The different dimensions of the new rooms cause you to realize that some of your old furniture or rugs simply don’t fit. You inevitably have to make adjustments if you want to truly enjoy your new home. The same goes for a data warehouse appliance: it likely has numerous features to support growth and scalability, but you have to make adjustments to leverage their benefits.
Companies that expect to simply dump their data from a few legacy data marts onto a new appliance should prepare to make adjustments, or they’re likely to experience some unpleasant surprises. Here are some that we’ve already seen.
Everyone agrees that the biggest cost issue behind building a data warehouse is ETL design and development. Hoping to migrate existing ETL jobs into a new hardware and processing environment without expecting rework is short-sighted. While you can probably force fit your existing job streams, you’ll inevitably misuse the new system, waste system resources, and dramatically reduce the lifespan of the appliance. Each appliance has its own way of handling the intensive resource requirements of data loading – in much the same way that each incumbent database product addresses these same situations. If you’ve justified an appliance through the benefits of consolidating multiple data marts (that contain duplicate data), it only makes sense to consolidate and integrate the ETL processes to prevent processing duplication and waste.
To assume that because you’ve built your ETL architecture leveraging the latest and greatest ETL software technology you won’t have to review the underlying ETL architecture is also misguided. While there’s no question that migrating tool-based ETL jobs to a new platform can be much easier than migrating lower-level code, the issue at hand isn’t the source and destination; it’s the underlying table structures. Not every table will change in definition on a new platform, but the largest (and most used) tables are the most likely candidates for review and redesign. Each appliance handles data distribution and database design differently. Consequently, since the underlying table structures are likely to require adjustment, plan on a redesign of the actual ETL processes too.
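To see why distribution design forces table (and therefore ETL) redesign, here is a minimal sketch of the hash distribution scheme most parallel appliances use. The four-node cluster, the `node_for` helper, and the sample sales rows are all hypothetical illustrations, not any vendor’s actual mechanism: the point is that rows sharing a distribution-key value always land on the same node, so joins on that key stay node-local, while a poorly chosen key skews data onto a few nodes.

```python
import hashlib

NUM_NODES = 4  # hypothetical appliance with 4 worker nodes


def node_for(value):
    """Hash a distribution-key value to a node number (0..NUM_NODES-1)."""
    digest = hashlib.md5(str(value).encode()).hexdigest()
    return int(digest, 16) % NUM_NODES


# Distributing a sales table on customer_id co-locates each customer's
# rows on one node, so a join to a customer dimension distributed the
# same way needs no data movement between nodes.
rows = [{"customer_id": c, "amount": a}
        for c, a in [(101, 5.0), (202, 9.5), (101, 3.2), (303, 7.1)]]

placement = {}
for row in rows:
    placement.setdefault(node_for(row["customer_id"]), []).append(row)
```

Distribute the same table on a low-cardinality column (say, region, with three values) and every query funnels through three nodes no matter how large the cluster grows; that is the kind of design decision a straight fork-lift migration never revisits.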
I’m also surprised by the casual attitude regarding technical training. After all, it’s just a SQL database, right? But application developers and data warehouse development staff need to understand the differences of the appliance product (after all, it’s a different database version or product). While most of this knowledge can be gained through reading the manuals, when was the last time the DBAs or database developers actually had a full set of manuals, much less the time required to read them? The investment in training isn’t significant; it’s usually just a few days of classes. If you’re going to provide your developers with a product that claims to be bigger, better, and faster than its competitors, doesn’t it make sense to prepare them adequately to use it?
There’s also an assumption that, since most data warehouse appliance vendors are software-only, there are no hardware implications. On the contrary, you should expect to change your existing hardware. The way memory and storage are configured on a data warehouse appliance can differ from a general-purpose server, but it’s still rare that the hardware costs are factored into the development plan. And believing that older servers can be re-purposed has turned out to be a myth. If you’re attempting to support more storage, more processing, and more users, how can using older equipment (with the related higher maintenance costs) make financial sense?
You could certainly fork-lift your data, leave all the ETL jobs alone, and not change any processing. Then again, you could save a fortune on a new data warehouse appliance and simply do nothing. After all, no one argues with the savings associated with doing nothing—except, of course, the users that need the data to run your business.
photo by Bien Stephenson via Flickr (Creative Commons License)
Many of our clients have asked us whether it’s time to consider replacing their aging data warehouses with data warehouse appliance technologies. I chalk up this emerging interest to the reality that data warehouse life spans are 3 to 4 years and platforms need to be refreshed, especially given the recent crop of announcements by vendors like Oracle and Teradata, along with the high visibility of newer players like Netezza, Paraccel, and Vertica.
The benefit of a data warehouse appliance is that it includes all of the hardware and software in a preconfigured solution that dramatically simplifies running and managing a data warehouse. (Some of the vendors have taken that one step further and actually sell software that is set up to work with specially defined commodity hardware configurations.) Given the price/performance differences between the established data warehouse products and the newer data warehouse appliances, it only makes sense that these products be considered as alternatives to simply upgrading the hardware.
The data warehouse appliance market is arguably not new. In the 1980s, companies like Britton-Lee and Teradata argued that database processing was different and would perform better with purpose-designed hardware and software. Many have also forgotten that these pioneers argued that the price/performance of commodity microprocessors vastly exceeded that of their mainframe competitors.
The current-generation appliance vendors have been invited to the table because of the enormous costs that have evolved in managing the enormous data volumes and operational access associated with today’s data warehouses. Most IT shops have learned that database scalability doesn’t just mean throwing more hardware and storage at the problem. The challenge in managing these larger environments is understanding the dynamics of the data content and the associated processing. That’s why partitioning the data across multiple servers or simply removing history doesn’t work: for every shortcut taken to reduce the data quantity, there’s an equal impact on user access and the single version of truth. This approach also makes data manipulation and even system support dramatically more complicated.
It’s no surprise that these venture capital backed firms would focus on delivering a solution that is simpler to configure and manage. The glossy sales message of data warehouse appliance vendors comes down to something like: “We’ve reduced the complexity of running a data warehouse. Just install our appliance like a toaster, and watch it go!” There’s no question that many of these appliance vendors have delivered when it comes to simplifying platform management and configuration; the real challenge is addressing the management and configuration issues that impact a growing data warehouse: scalable load processing, a flexible data architecture, and manageable query processing.
We’ve already run into several early adopters that think all that is necessary is to simply fork-lift their existing data warehouse structures onto their new appliance. While this approach may work initially, the longevity of the appliance (or its price/performance rationale) will soon evaporate. These new products can’t work around bad data, poor design habits, and the limitations of duplicate data; their power is in providing scalability across enormous data and processing volumes. An appliance removes the complexities of platform administration. But no matter what appliance you purchase, and no matter how much horsepower it has, data architecture and data administration are still required.
In order to leverage the true power of an appliance, you have to expect to focus effort on integrating data in a structure that leverages the scalability strengths of the product. While the appliances are SQL-based, the way they process loads, organize data, and handle queries can be dramatically different from that of the incumbent data marts and data warehouses. It’s naïve to think that a new appliance can provide processing scalability without any adjustments. If it were that simple, the incumbent vendors would have already packaged that into their existing products.
In Part 2 of this post, I’ll elaborate on the faulty assumptions of many companies that acquire data warehouse appliances, and warn you against making these same mistakes.
photo by meddygarnet via Flickr (Creative Commons License)
I’ve noticed lately that data warehouse vendors are dusting off the arguments and pitches of days gone by. Don’t buy specialized hardware for your database needs! You’ll never be able to re-use the gear! One rep recently told a client, “With your data warehouse on our hardware, you can re-purpose the hardware at any time!”
The truth is, while data warehouse failures were rampant a few years ago, those failures are now the exception and not the rule. Data warehouses, once installed, tend to last a while. The good ones actually add more data over time and become more entrenched among user organizations. The great ones become strategic, and business people claim not to be able to do their jobs without them. A data warehouse platform is rarely for a single use, but for a multitude of needs. Data warehouses rarely just go away.
However, don’t confuse an entrenched data warehouse with an entrenched data integration solution. I’ll teach a class at The Data Warehousing Institute conferences called “Architectural Options for Data Integration.” The class covers technologies like Enterprise Application Integration (EAI); Enterprise Information Integration (EII); Extract, Transformation, and Loading (ETL, and its sister, ELT); and Master Data Management (MDM). I present use cases for these different solutions as well as lists of the key vendors that offer them.
Attendees I talk to admit coming to the class with the intent of justifying the data warehouse as a multi-purpose integration system. They leave the class understanding the often-stark differences of these various solutions. And I hope they return to work with a different view of their future-state integration architectures, whether they re-purpose their hardware or not.
Note: Evan will be teaching Beyond the Data Warehouse: Architectural Options for Data Integration at the TDWI World Conference in San Diego on Thursday, August 6.
The recent acquisition of Sun by Oracle has raised a lot of speculative discussion about the latter vendor’s strategic pursuits. The move may or may not result in a power triumvirate of HP-IBM-Oracle. But Oracle expanding its portfolio to include hardware could be a game-changer.
Oracle has a dubious record with hardware plays. The nCube investment (circa 1988) and the network computer idea (circa 1996) both presented an interesting vision but didn’t deliver tactically. The nCube video-on-demand effort (circa 1994) ended with the product being decommissioned (circa 2001).
While many are focused on the state of Sun’s numerous DBMS partnerships, I’m more interested in the fate of Storage Technology (StorageTek), which Sun acquired (circa 2005). Do a little research and you’ll see that EMC stores the lion’s share of DBMS data across enterprise data centers. If Oracle keeps the StorageTek products, it might shave some revenue from EMC and gain an even larger wallet share with IT organizations. Oracle’s intentions are equally unclear around the Exadata product, which had previously relied on the HP partnership that’s certainly strained. With the acquisition of Sun, Oracle is better able to go head-to-head with the likes of HP’s Neoview and Teradata.
Clearly the company has the option of producing a database appliance on its own. Personally, I’m waiting to see how much fear, uncertainty, and doubt Oracle stirs up in the data warehouse appliance market. Oracle hasn’t differentiated its DBMS in years. The differentiation has always been about the company’s size, the number of Fortune 500 customers, its broad array of application offerings, and the fact that its products work on every conceivable hardware platform. This focus on non-database products has fanned the flames of the market’s perception that databases are a mere commodity.
I can only imagine what’s going on in Oracle’s slideware development organization right now. Here are some of the messaging scenarios that are likely to be on the table:
Scenario 1: “Through our acquisition of Sun, we can now deliver a more fully-functional database appliance.”
In reality, the whole point of an appliance is to reduce complexity and configuration effort. Prepackaging Oracle on a hardware platform already occurs with companies like Sun, HP, and Dell. This isn’t simpler or better.
Scenario 2: “Oracle can now be your de-facto desktop and development tool provider.”
This one could actually be true. Oracle can leverage Sun’s vast software capabilities in two significant ways. With Sun’s desktop office suite, StarOffice, Oracle could provide a compelling alternative to the Microsoft Office monopoly. Any executive would find it difficult to ignore an Oracle office option, particularly where they’ve made significant investments in Oracle as the corporate database standard. Plus, Oracle can monetize open source software by dramatically improving support revenue from these customers. Microsoft does not deliver customer service and support the way Oracle does, and enterprise clients expect more sophisticated and consistent support than the channel usually delivers.
Scenario 3: “Our Java-based toolset covers the spectrum of development needs without forcing your reliance on a specific vendor. Whether it’s middleware, server development, or reporting, we have the tools to support a multi-tier network enabled environment. You can now come to a single company for a single set of tools regardless of your platform type, desktop, server, or operating system.”
For IT organizations that still rely on custom development, this may dramatically reduce the number of suppliers they need. Over the past few years the number of middleware and application tool vendors has diminished—with Oracle being the buyer of many of them. Most IT organizations prefer fewer vendors. Whether open source or proprietary, the combined Oracle-Sun toolset offers Oracle a significant revenue stream in the support arena.
I’m fascinated that little or no attention has been paid to Sun’s software assets. These, combined with Oracle’s DBMS, middleware, and application toolsets, offer an unexpected alternative in the ongoing IBM and Microsoft battles for enterprise development. Moreover, with Sun’s Java leadership and the popularity of Java in consumer electronics, Oracle can now enter the world of consumer software, a la Apple. The opportunity for Oracle to support media companies that sell directly to the end consumer is wide open.
If it’s not careful, Oracle’s future may be in milking the legacy product cow instead of exploiting its newfound software assets. The real question is, is Oracle a company of innovators or bean counters?