Your Company’s Data Supply Chain

photo by BotheredByBees
At Baseline Consulting we've been talking for several years about the concept of a data supply chain. But IT executives are only now starting to catch on to its importance.
Over the past 15 years there has been a big push to standardize on off-the-shelf software. This allowed IT organizations to buy instead of build. We've migrated from proprietary architectures to Windows and Linux standards. We've gone from custom-built applications to packaged CRM and ERP applications. IT adopted this approach because its value lies in automating business processes and supporting analysis, not in inventing new technologies. The problem is that moving data between all of these "packaged systems" still requires custom code.
There's no question that middleware provides value: it delivers the pre-built data pipes. Unfortunately, these are toolkits requiring developers to write code to connect their packages to the pipes. Most CIOs are blissfully unaware of the amount of custom coding middleware requires. Trust me: IT spends an enormous amount of money on supporting such data migration solutions. Many IT shops still view middleware as sacred ground.
The data warehousing world has enthusiastically adopted ETL tools to reduce custom coding so it can focus on the issues of data accuracy and usability. One fact lost in translation is that ETL integrates data: it's more than just a pipe. The application world has adopted EAI, ESB, and orchestration to move data more quickly. However, there's no integration. Each application is responsible for integrating the data it receives.
So, there's even more custom code. Code to connect an application to the pipes. Code to integrate and clean up the data it receives from the pipes.
Custom code to move data around isn't the answer. Orchestration, message passing, and data movement just create a labyrinth of pipes. There are no economies of scale. The data doesn't get better.
Walmart learned years ago that it was impractical to have a custom (and separate) distribution system for every supplier. They knew the cost benefits of a standard distribution system; this meant they needed to standardize the size of the trailers, the size of the boxes, and the way the boxes were packed and shipped. The benefit of a supply chain is that standardization occurs at the most cost-effective point: the source. Walmart's distribution success was measured by its ability to accept new suppliers and manage more shipments.
Most CIOs don't recognize that they have a data supply chain. Instead of building a custom distribution system for each supplier (each business application), they should be focused on a single data supply chain. Middleware supports the creation of custom distribution solutions, but not the standardization of data. A data supply chain can only be successful if the data is standardized. Otherwise everyone is forced to write custom code to standardize, clean, and integrate the data.
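To make the idea of standardizing at the source a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the StandardCustomer record, the small COUNTRY_CODES lookup, and the two consumer functions. The point is that the supplying application cleans and standardizes its data once, so the downstream applications need no cleanup code of their own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardCustomer:
    """Hypothetical agreed-upon record format (the 'standard box')."""
    customer_id: str
    name: str
    country: str  # two-letter country code


# Illustrative lookup only; a real one would cover every country.
COUNTRY_CODES = {
    "united states": "US", "usa": "US", "us": "US",
    "canada": "CA", "ca": "CA",
}


def standardize_at_source(raw: dict) -> StandardCustomer:
    """Run once by the supplying application before data enters the chain."""
    country_key = raw["country"].strip().lower()
    if country_key not in COUNTRY_CODES:
        raise ValueError(f"unknown country: {raw['country']!r}")
    return StandardCustomer(
        customer_id=str(raw["id"]).strip(),
        # Collapse repeated whitespace and normalize capitalization.
        name=" ".join(raw["name"].split()).title(),
        country=COUNTRY_CODES[country_key],
    )


# Consumers rely on the standard format and carry no cleanup code.
def billing_key(record: StandardCustomer) -> str:
    return f"{record.customer_id}:{record.country}"


def marketing_greeting(record: StandardCustomer) -> str:
    return f"Dear {record.name}"


if __name__ == "__main__":
    record = standardize_at_source(
        {"id": 42, "name": "  jane   DOE ", "country": "USA"}
    )
    print(billing_key(record))
    print(marketing_greeting(record))
```

Adding a third consumer here means writing one more small function against StandardCustomer, not another round of cleansing logic, which is the economy of scale that a labyrinth of pipes never delivers.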
Blurring the Line Between SOA and BI
photo by Siomuzzz
I recently read with interest an article in the Microsoft Architect Journal on so-called Service-Oriented Business Intelligence or, as the article’s authors call it, “SoBI.” The article was well-intentioned but confusing. What it confirmed to me is that plenty of experienced IT professionals are struggling to reconcile Service Oriented Architecture (SOA) concepts with business intelligence.
SOA is certainly a valuable tool in the architecture and development toolbox; however, I think it’s only fair to keep SOA in perspective. It’s an evolutionary technology in IT that has numerous benefits for developer productivity and application connectivity. I’m not sure that injecting SOA into a data warehouse environment or framework will do anything more than freshen a few low-level building blocks that have been neglected in some data warehouse environments. I’m certainly not challenging the value of SOA; I’m just trying to put it in perspective for those folks who are focused on data warehouse and business intelligence activities.
The idea around SOA is to create services (or functions, procedures, etc.) that can be used by other systems. The idea is simple: build once, use many times. This ensures that important (and possibly complicated) application processes can be used by numerous disparate applications. It’s like an application processing supply chain: let the most efficient resource build a service and provide it to everyone else for use. SOA provides a framework for allowing multiple applications access to common, well-defined services. These services can contain code and/or data.
The question for most data warehouse environments isn’t whether SOA can improve (or benefit) the data warehouse; it’s understanding how SOA can benefit a data warehouse.
We’ve got lots of clients leveraging SOA to support their data warehouse. They’ve learned they can leverage SOA techniques and coding to deliver standardized data cleansing and data validation to a range of business applications. They have also upgraded the operational system data extraction code to leverage SOA, which has allowed other application systems (or data marts) to reuse their code.
However, their use of SOA hasn’t been focused on enhancing the data warehouse environment as much as it has been focused on packaging their development efforts for others to use. Most data warehouse developers invest heavily in navigating IT’s labyrinth of operational systems and application data in order to identify, cleanse, and load data into their warehouses. What they’ve learned is that for every new ETL script, there are probably 20 other systems that have custom-developed their own data retrieval code and never documented it. The value that many data warehouse developers find with SOA isn’t that they are improving their data warehouse; they’re just addressing the limitations of the application systems.
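As a rough illustration of the "build once, use many times" reuse described above, the Python sketch below packages cleansing and validation rules as one shared service that both a warehouse load and an operational application call. All of the names (CustomerCleansingService, load_into_warehouse, crm_accepts) and the rules themselves are assumptions made up for this example. In a real SOA deployment the service would sit behind a network interface; a plain class stands in for it here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CleansingResult:
    value: dict
    errors: list = field(default_factory=list)


class CustomerCleansingService:
    """Hypothetical shared service: one copy of the cleansing rules."""

    # Deliberately simple pattern; real email validation is stricter.
    _EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def cleanse(self, record: dict) -> CleansingResult:
        cleaned = {
            "email": record.get("email", "").strip().lower(),
            # Keep digits only, dropping punctuation and spaces.
            "phone": re.sub(r"\D", "", record.get("phone", "")),
        }
        errors = []
        if not self._EMAIL.match(cleaned["email"]):
            errors.append("invalid email")
        if len(cleaned["phone"]) != 10:  # assumed 10-digit rule
            errors.append("phone must have 10 digits")
        return CleansingResult(cleaned, errors)


# Consumer 1: the warehouse load keeps only rows that pass validation.
def load_into_warehouse(service: CustomerCleansingService, rows: list) -> list:
    results = (service.cleanse(row) for row in rows)
    return [result.value for result in results if not result.errors]


# Consumer 2: an operational application reuses the very same rules.
def crm_accepts(service: CustomerCleansingService, record: dict) -> bool:
    return not service.cleanse(record).errors


if __name__ == "__main__":
    service = CustomerCleansingService()
    rows = [
        {"email": " Jane@Example.COM ", "phone": "(555) 123-4567"},
        {"email": "nope", "phone": "123"},
    ]
    print(load_into_warehouse(service, rows))
    print(crm_accepts(service, rows[1]))
```

Neither consumer contains any cleansing logic of its own, so a change to a rule is made in one place and every caller picks it up.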