The Flaw of the Hub-and-Spoke Architecture
By Evan Levy
I recently talked to a client who was fixated on a hub-and-spoke solution to support his company’s analytical applications. This guy had been around the block a few times and had some pretty set paradigms about how BI should work. In the world of software and data, the one thing I’ve learned is that there are no absolutes. And there’s no such thing as a universal architecture.
The premise of a hub-and-spoke architecture is to have a data warehouse function as the clearing house for all the data a company’s applications might need. This can be a reasonable approach if data requirements are well-defined, predictable, and homogeneous across the applications—and if data latency isn’t an issue.
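To make the premise concrete, here’s a minimal sketch of the pattern (all system and table names are hypothetical, not any particular product): every application reads from the warehouse, and the warehouse is only as fresh as its last batch load.

```python
# Minimal sketch of hub-and-spoke provisioning: every consumer reads
# from the warehouse (the hub), never from a source system directly.
# System and table names are hypothetical.

class Warehouse:
    """The hub: a single clearing house for all analytic data."""
    def __init__(self):
        self._tables = {}

    def load(self, table, rows):
        # Batch ETL from a source system lands here, typically nightly.
        self._tables[table] = rows

    def query(self, table):
        # Every spoke (application) reads the same, possibly stale, copy.
        return self._tables.get(table, [])

# Nightly batch: the hub is only as current as its last load.
hub = Warehouse()
hub.load("bookings", [{"flight": "BA117", "fare": 420.00}])

# Two different applications, one shared (and shared-latency) view.
print(hub.query("bookings"))   # reporting app
print(hub.query("bookings"))   # pricing app sees the same stale rows
```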
First-generation data warehouses were originally built as reporting systems. But people quickly recognized the need for data provisioning (e.g., moving data between systems), and data warehouses morphed into storehouses for analytic data. This was out of necessity: developers didn’t have the knowledge or skills to retrieve data from operational systems. The data warehouse was rendered a data provisioning platform not because of architectural elegance but due to resource and skills limitations.
(And let’s not forget that the data contained in all these operational systems was rarely documented, whereas data in the warehouse was often supported by robust metadata.)
If everyone’s needs are homogeneous and well-defined, using the data warehouse for data provisioning is just fine. The flaw of hub-and-spoke is that it doesn’t address issues of timeliness and latency. After all, if it could, why would programmers still be writing custom code for data provisioning?
When an airline wants to adjust the cost of seats, it can’t formulate new pricing based on old data—it needs up-to-the-minute pricing details. Large distribution networks, like retailing and shipping, have learned that hub-and-spoke systems are not the most efficient or cost-effective models.
Nowadays most cutting-edge analytic tools are focused on allowing the business to quickly respond to events and circumstances. And most companies have adopted packaged applications for their core financial and operational functions. Unlike the proprietary systems of the past, these applications are in fact well-documented, and many come with utilities and standard extracts as part of the initial delivery. What’s changed in the last 15 years is that operational applications are now built to share data. And most differentiating business processes require direct source system access.
Many high-value business needs require fine-grained, non-enterprise data. Moving this specialized, business function-centric content through a hub-and-spoke network designed for large-volume, generalized data is not only inefficient but also more costly. Analytic users don’t always need the same data. Moreover, these users now know where the data lives, so time-sensitive information can be made available on demand.
The logistics and shipping industries learned that you can start with a hub-and-spoke design, but once volume reaches critical mass, direct source-to-destination links are more efficient and more profitable. (If this weren’t the case, there would be no such thing as the non-stop flight.) When business requirements are specialized and high-value (e.g., low-latency, limited content), provisioning data directly from the source system is not only justified, it’s probably the most efficient solution.
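As a rough illustration of that last point, here’s a sketch of a specialized, low-latency need reading its source system directly rather than waiting for a warehouse load. The PricingSource class and its API are invented for the example, not a real product.

```python
# Hedged sketch of the alternative: a specialized, low-latency need
# bypasses the hub and reads the operational source directly.

import time

class PricingSource:
    """Stand-in for an operational system exposing a documented extract."""
    def current_fare(self, flight):
        # In a real system this would call the operational API or a
        # vendor-supplied extract utility, not a warehouse table.
        return {"flight": flight, "fare": 435.50, "as_of": time.time()}

# The pricing application pulls exactly the fine-grained rows it needs,
# as of right now, instead of a generalized nightly feed.
source = PricingSource()
print(source.current_fare("BA117"))
```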
Tags: Baseline Consulting, data provisioning, enterprise data warehouse, ERP, Evan Levy, hub-and-spoke architecture, metadata
About Evan Levy
Evan Levy is a management consultant and partner at IntegralData. In addition to his day-to-day job responsibilities, Evan speaks, writes, and blogs about the challenges of managing and using data to support business decision making.

7 responses to “The Flaw of the Hub-and-Spoke Architecture”
Evan – Nicely boiled down. Thanks. One point I’d add is that in many network-effect endeavors (like knowledge storage/organization) there are negative returns to scale after some point, due to the exponential costs of coordination among the data/semantics/applications relevant to each additional domain (marketing, finance, ops, etc). The “Economies of Scale/Scope” argument works, but only within a natural range.
Hey Evan –
Nicely done.
I don’t think the issue is hub-and-spoke architecture per se, but the misapplication of that (proven) architecture.
I think the problem is that DW architects are trying to schlep around waaaay too much data (most of it never used), and are using a tightly-coupled architecture. It’s overly complex, cannot scale, and can’t meet real-time business BI needs.
A better way is to use existing ESB or MOM infrastructure to subscribe to the business events that provide data of interest.
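A minimal in-memory sketch of that subscribe-to-events idea follows; a real deployment would ride on an actual ESB or message broker, and the Bus class here is illustrative rather than any product’s API.

```python
# Toy publish/subscribe bus: applications subscribe only to the
# business events they care about, instead of receiving the
# warehouse's full generalized feed. Illustrative only.

from collections import defaultdict

class Bus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register interest in a single business event type.
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every interested subscriber.
        for handler in self._subscribers[topic]:
            handler(event)

bus = Bus()
bus.subscribe("fare.changed", lambda e: print("pricing app got", e))
bus.publish("fare.changed", {"flight": "BA117", "fare": 435.50})
# Loosely coupled: the publisher never knows who consumes the event.
```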
I keep getting error messages where this architecture is being used: “No spoke data”. Do you think the software is at fault? I can’t tell whether the data is being transmitted but never acknowledged by the receiver, or never received at all.
George,
It sounds as though your current environment hasn’t implemented a reliable delivery mechanism. Most hub-and-spoke architectures don’t make data available (at the hub) unless they’ve implemented a method to ensure that all data sent is delivered.
We often find that send/receive problems are associated with custom point-to-point data migration mechanisms. That’s one of the reasons so many folks have implemented an enterprise service bus (ESB) to replace their custom solutions. An ESB can ensure delivery of all sent data.
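As a rough sketch of that guarantee (purely illustrative, not any particular ESB’s mechanism): a message stays pending until the receiver handles it successfully, so “sent but never received” can’t silently happen.

```python
# Toy reliable-delivery channel: a message is only removed from the
# pending queue once the receiver processes it without error;
# otherwise it is kept for redelivery. Illustrative only.

import queue

class ReliableChannel:
    def __init__(self):
        self._pending = queue.Queue()

    def send(self, msg):
        self._pending.put(msg)

    def deliver(self, receiver):
        while not self._pending.empty():
            msg = self._pending.get()
            try:
                receiver(msg)           # receiver raises if it can't handle it
            except Exception:
                self._pending.put(msg)  # keep for redelivery, don't drop
                break                   # retry on the next deliver() call

channel = ReliableChannel()
channel.send({"spoke": "pricing", "payload": "fare update"})
channel.deliver(lambda m: print("acknowledged:", m))
```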
If you have any other questions, you’re welcome to contact me directly.
E.
When you say “provisioning data directly from the source system is not only justified, it’s probably the most efficient solution,” are you referring to an independent data mart?
Thanks for the question. Data provisioning isn’t limited solely to data marts. My remarks about provisioning data directly from source systems weren’t aimed at BI systems or data marts; they were addressing data provisioning in general.
I think it’s important to consider that systems of all types (operational, analytic, etc.) may need non-enterprise, non-standardized data to support their business functions. A DW positioned as a data provisioning hub may not solve every possible data access need (e.g. a CRM system wanting an updated phone number or recent bill payment details).
This is a very interesting article, and I agree with you. With the technology we have for data storage, businesses must be able to access their data at all times. But since needs are constantly changing, there is that flaw with the hub-and-spoke architecture.