Archive | May 2009

The Rise of the Columnar Database


I’m continually surprised that more vendors haven’t hurled themselves onto the columnar database bandwagon. The more this space matures, the more evident it becomes that analytics is a perfect match for column-based database architectures.

One of the most frustrating phenomena in IT is adherence to a theoretical view. In the 1970s the entire relational database industry implemented what was really an academic precept. For those pragmatists who haven’t dusted off their textbooks recently, I’ll recall the writings of Codd and Date. They introduced the concept of organizing data in tuples, grouping primary values with their descriptive details (a.k.a. attributes). Vendors interpreted this to mean that data should be physically stored in this fashion, architecting their products to store data in tables populated with rows consisting of columns. If you wanted to access a value, you had to retrieve the entire row.

With all due respect, this approach has been cumbersome since Day 1. The fact is, storing data the way the business looks doesn’t lend itself to the way people ask questions. When I create an outbound marketing list, I need a name, a phone number, and an address. I don’t need information on household, demographic segment, or the name of a customer’s dog.

While I do need to store all the customer data, I don’t want to be bogged down by processing all that data in order to answer my question. Herein lies the quandary: do I structure the data based on all the information we have, or based on the information I might access?

Vendors have tried to bridge the gap. We’ve seen partitioning, star indexes, query pre-processing, bitmap and index joins, and even hashing in an attempt to support more specific data retrieval. Such solutions still require examining the contents of the entire row.

Although my background is in engineering, I know enough about Occam’s razor to know that it applies here: the simplest solution is the best one. Vendors like Kickfire, Vertica, Paraccel, and Sybase (whose pioneering IQ product launched over a dozen years ago) went back to the drawing board and fixed the problem, architecting their products to structure and store the data the way people ask questions: in columns.

For you SQL jockeys, most of the heavy lifting in database processing is in the WHERE clause. Columnar databases are faster because their processing isn’t inhibited by unnecessary row content. Because many database tables can have upwards of 100 columns, and because most business questions only request a handful of them, this just makes business sense. And in these days of multi-billion-row tables and petabyte-sized systems, columnar databases make more sense than ever.
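To make the row-versus-column point concrete, here is a minimal sketch in Python. It is not how any of the products named above work internally; the customer table, its column names, and the "values read" counter are all made up for illustration. It simply shows that the marketing-list question (name, phone, address) touches fewer values when the data is laid out by column.

```python
# Hypothetical customer table; every name and value here is illustrative only.
rows = [
    {"name": "Ann", "phone": "555-0101", "address": "1 Elm St", "segment": "A", "dog": "Rex"},
    {"name": "Bob", "phone": "555-0102", "address": "2 Oak Ave", "segment": "B", "dog": "Fido"},
    {"name": "Cy", "phone": "555-0103", "address": "3 Pine Rd", "segment": "A", "dog": "Spot"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def row_scan(rows, wanted):
    """Row layout: count every field of every row as read, then keep the wanted ones."""
    values_read = sum(len(row) for row in rows)
    result = [tuple(row[c] for c in wanted) for row in rows]
    return result, values_read


def column_scan(columns, wanted):
    """Column layout: count only the requested columns as read."""
    values_read = sum(len(columns[c]) for c in wanted)
    result = list(zip(*(columns[c] for c in wanted)))
    return result, values_read


wanted = ["name", "phone", "address"]  # the outbound marketing list
columns = to_columns(rows)
row_result, row_cost = row_scan(rows, wanted)
col_result, col_cost = column_scan(columns, wanted)
```

Both scans return the same three-field list, but the row scan counts all 15 stored values while the column scan counts 9. With 100 columns and billions of rows, that gap is the whole argument.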

As the data warehouse market continues to consolidate through acquisitions, look for column-based startups—including several open-source solutions—to fill the void. If you ask me, there’s plenty of room.

MDM and M&A


A lot of our new clients have asked us to build MDM business cases to support their merger and acquisition strategies. Specifically, they’re looking to support the following four activities:

  • Recent corporate mergers
  • Acquisitions
  • Reorganizations
  • Spin-offs

Collectively, these activities can roll up into a category called corporate restructuring. Contrary to popular belief, restructuring isn’t just a financial challenge. It includes realignment of marketing activities (for instance, reconciling promotions and re-aligning diverse product sets), sales (reorganizing territories and compensation plans), and operational issues (company locations, product inventories).

Most companies approach restructuring as a one-time-only activity in which an army of analysts tries to reconcile financial structures from organizational hierarchies, to budgets, to the accounts themselves. The fact is these activities aren’t just part of high-profile M&A events. They occur every year as companies go through their annual budget processes. During a corporate restructuring the process usually takes longer than the acquisition itself.

Three principal MDM features lend themselves to this restructuring work: matching, grouping, and linking. MDM excels at matching “like” items from disparate sources, tracking and managing hierarchies and groupings, and linking disparate data sources to enable ongoing data integration. The point is that the act of merging organizations also means consolidating details across the companies. Most people consider this a one-time-only activity. The fact is, it must be an ongoing process.

When one company buys another, it’s typical to allow the acquired company to continue to operate using the same systems and methods it always has. The acquiring company simply needs to know how to integrate the information into its existing business. Consider Berkshire Hathaway. They acquire companies frequently, but don’t change how those companies run their businesses. They simply know how to reconcile and roll up the details.

Ideally, corporate restructuring means establishing a process to allow organizations to continue their operations using their existing systems. IT systems reconciliation simply cannot get in the way of running business operations. All too often, the answer is, “Replace their systems with ours.” This statement means that the new organization should reengineer its business. This simply takes too long.

MDM provides a company the capability to link the data content from disparate systems within and across companies. I’m not talking about linking Linux with Windows; I’m talking about matching and linking business content across dozens or even hundreds of systems. This way invoices continue going out, salespeople continue getting commissions, and customers can still get product support in a seamless way.
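For the technically inclined, matching and linking can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any MDM product does it: the two systems ("billing" and "support"), their record IDs, and the crude name-normalization rule in match_key are all hypothetical. Real matching is far more sophisticated, but the shape is the same: derive a common key, then keep a cross-reference so each system can carry on with its own IDs.

```python
import re
from collections import defaultdict

# Legal suffixes to ignore when comparing company names (illustrative list).
SUFFIXES = {"inc", "corp", "llc", "co"}


def match_key(name):
    """Crude matching rule: lowercase, strip punctuation, drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in SUFFIXES]
    return " ".join(words)


def link(*systems):
    """Build a cross-reference from match key to (system, record id) pairs."""
    xref = defaultdict(list)
    for system_name, records in systems:
        for record_id, name in records:
            xref[match_key(name)].append((system_name, record_id))
    return dict(xref)


# Hypothetical customer records from two systems that were never integrated.
billing = [("B-1", "Acme, Inc."), ("B-2", "Globex Corp")]
support = [("S-9", "ACME Inc"), ("S-7", "Initech LLC")]

xref = link(("billing", billing), ("support", support))
```

Here xref["acme"] ties billing record B-1 to support record S-9 without either system changing a thing, which is the point: the linkage lives beside the operational systems, not inside them.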

Next time you’re discussing corporate restructuring and someone says the word “re-platform,” ask the question, “If we can link and move the data to continue to support core business processes, then we wouldn’t have to disrupt our operational systems, right?” Matching and linking the data across core systems can save a lot in terms of software and labor costs. But improving it where it lies? Priceless.
