I’ve been hearing a bit lately about the difference between “analytical data integration” and “operational data integration.” I don’t agree with the distinction any more than I agree with analytical versus operational MDM. In this blog post, I’ll characterize analytical data integration. Warning: It won’t be pretty. In my next post, I’ll take on operational data integration (ditto).
The analytics folks build their own specialized ETL jobs to pull data from operational systems and business applications. They often skip data cleansing and transform the data to suit their own particular needs. Most of the time, this is a custom activity. Each time there’s a new report or data mart, new ETL development occurs.
It’s important to realize that data integration is not just about moving data between databases: it’s about moving and merging multiple data sources independent of their format or function. We’re talking more than just relational databases here: we’re talking applications, flat files, objects, APIs, data services (SOA), hierarchical structures, and dozens of others.
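To make the "independent of format" point concrete, here is a minimal sketch in Python. It assumes two made-up sources, a flat-file export and an API-style JSON payload, and every name in it (`CSV_EXPORT`, `JSON_PAYLOAD`, `from_csv`, `from_json`, `merge_sources`, the field names) is hypothetical, chosen only for illustration. The idea is that each source gets a small adapter that emits the same shape, and the merge step never needs to know where a record came from.

```python
import csv
import io
import json

# Hypothetical sample inputs: a flat-file export and an API-style JSON payload.
CSV_EXPORT = "customer_id,name\n1,Acme Corp\n2,Globex\n"
JSON_PAYLOAD = (
    '[{"id": 2, "email": "ops@globex.example"},'
    ' {"id": 3, "email": "it@initech.example"}]'
)


def from_csv(text):
    """Adapter: yield (key, fields) pairs from a CSV flat file."""
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["customer_id"]), {"name": row["name"]}


def from_json(text):
    """Adapter: yield (key, fields) pairs from a JSON document."""
    for item in json.loads(text):
        yield int(item["id"]), {"email": item["email"]}


def merge_sources(*sources):
    """Merge (key, fields) pairs from any number of sources, one record per key."""
    merged = {}
    for source in sources:
        for key, fields in source:
            # Later sources add to (or overwrite) fields from earlier ones.
            merged.setdefault(key, {}).update(fields)
    return merged


customers = merge_sources(from_csv(CSV_EXPORT), from_json(JSON_PAYLOAD))
```

Adding a third source, say a hierarchical structure or a data service, would mean writing one more adapter rather than another end-to-end job.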
It’s widely acknowledged that this work consumes about 40 percent of the overall cost of an analytical program. Stovepipe data maintenance activities are rampant and wasteful. In reality, a lot of ETL work involves a depressing amount of duplicate effort. It’s rare that a business application doesn’t already have at least one piece of ETL written against it. The urge to operationally integrate data can be seen as a remedy for this. But is it really?
Welcome to Baseline’s blog entries, and to my inaugural blog, Inside IT. For those of you who have seen me present and read some of my articles, you’ll be happy (or sad) to know that this blog will echo the same themes, tone, and yes, sense of humor, from those other media. (I promise to control my colorful language and not use too many four-letter words, unless it’s something like “SDLC” or “BPEL.”)
My Baseline blog will be consistent with the rest of my speaking and writing topics, which means that it will align with some of the core assumptions in my other content, including:
- We’re doing all this IT stuff to help the business. We’ve obsessed over the importance of IT having a place at the corporate table, but we sometimes forget we’re here to support business actions and decision making. Companies use technology and data to help run their businesses, not because they want to win awards for the biggest database. We’re so wrapped up in protecting the reputation of IT that sometimes we forget about the business. As Jill would say, we do so at our peril.
- Too many IT organizations forget that data can contribute to innovation. If you take a look at what a retailer does, it doesn’t invent its own POS or inventory management systems; it buys them. What’s valuable is the data. Where IT provides value isn’t in deploying its backbone systems, but in creating the decision-making systems supported by information, which, as it happens, are closer to the business users. Notice a theme here?
- Data integration isn’t rocket science. It’s really not that hard. The complexity isn’t in the processing. It’s in defining the rules for identification and integration. We still find IT shops that want to build their own ETL tools rather than designing the right data integration frameworks. Sometimes the rules that govern integration aren’t as sexy as building new software. Sometimes we don’t need to build a better mousetrap ’cuz there are no mice. We have other problems
The whole premise here, and maybe my new mantra, is: Leverage, re-use, and buy if you have to. Check back here often and we’ll discuss how to do them.