The Low-Down on Analytical Data Integration
I’ve been hearing a bit lately about the difference between “analytical data integration” and “operational data integration.” I don’t agree with the distinction any more than I agree with analytical versus operational MDM. In this blog post, I’ll characterize analytical data integration. Warning: It won’t be pretty. In my next post, I’ll take on operational data integration (ditto).
The analytics folks build their own specialized ETL jobs to pull data from operational systems and business applications, often ignoring data cleansing and transforming the data to suit their own particular needs. Most of the time, this is custom work. Each time there’s a new report or data mart, new ETL development occurs.
It’s important to realize that data integration is not just about moving data between databases: it’s about moving and merging multiple data sources independent of their format or function. We’re talking more than just relational databases here: we’re talking applications, flat files, objects, APIs, data services (SOA), hierarchical structures, and dozens of others.
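To make the "independent of format" point concrete, here is a minimal sketch in Python that merges a flat file and a JSON payload into one set of customer records. Everything in it is an assumption for illustration: the billing export, the CRM payload, the field names, and the helper functions are hypothetical, and a real integration would also have to deal with cleansing, key matching, and conflicting values.

```python
import csv
import io
import json

# Hypothetical flat-file export from a billing system.
BILLING_CSV = """customer_id,balance
101,250.00
102,0.00
"""

# Hypothetical JSON payload from a CRM API or data service.
CRM_JSON = '[{"id": 101, "name": "Acme Corp"}, {"id": 103, "name": "Globex"}]'


def read_billing(text):
    """Parse the flat file into {customer_id: {"balance": float}}."""
    rows = csv.DictReader(io.StringIO(text))
    return {int(r["customer_id"]): {"balance": float(r["balance"])} for r in rows}


def read_crm(text):
    """Parse the JSON payload into {customer_id: {"name": str}}."""
    return {rec["id"]: {"name": rec["name"]} for rec in json.loads(text)}


def merge_sources(*sources):
    """Combine per-source records that share the same customer key."""
    merged = {}
    for source in sources:
        for key, fields in source.items():
            # Later sources add to (or overwrite) fields from earlier ones.
            merged.setdefault(key, {}).update(fields)
    return merged


customers = merge_sources(read_billing(BILLING_CSV), read_crm(CRM_JSON))
```

Each reader hides its source's format behind the same keyed-dictionary shape, so the merge step never needs to know whether a record started life as a CSV row or an API response. Customers that appear in only one source simply end up with fewer fields.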
It’s widely acknowledged that ETL work consumes about 40 percent of the overall cost of an analytical program. Stovepipe data maintenance activities are rampant and wasteful. In practice, a lot of ETL work involves a depressing amount of duplicate effort. It’s rare that a business application doesn’t already have at least one piece of ETL written against it. The urge to operationally integrate data can be seen as a remedy for this. But is it really?