The issue of Total Cost of Ownership (TCO) seems to come and go every few years. The need for it tends to ebb and flow with corporate budget cycles. TCO is perfectly fine for well-understood commodity functions or defined business processes. If I have to replace a server or a printer, or change a business process, TCO is a perfectly rational metric for comparing different alternatives.
When TCO calculations work, they tend to roll up within a single organization or manager. The hardware, the software, the installation, and the maintenance are under the domain of a single organization that covers the direct cost.
The problem with TCO arises when it’s used as a metric for justifying cross-functional or analytical systems. With these systems, the value isn’t delivering commodity processing but rather supporting decision making. TCO focuses on construction and maintenance costs. But for analytical systems, usage occurs across different organizations and varies with business value and need. TCO can in fact be misapplied.
At a simple level, TCO is often limited to the processing hardware, storage, software, and IT resources necessary to configure and manage the platform on an ongoing basis. The staffing portion usually covers only the IT staff focused on system development and maintenance. Unfortunately the largest cost, and one not normally included in TCO calculations, is the business user’s time. While TCO quantifies costs for a data warehouse developer, there is no clear way to calculate costs for the dozens or hundreds of business users who are actually analyzing data and creating reports every day. The reality of analytical systems is that development continues every day on the business side.
Nevertheless it’s common for TCO calculations to be reduced to the cost of processing or storage, rather than reflecting the mounting costs of users working around slow-running queries and inaccurate data. At the end of the day, TCO shouldn’t only be about the cost of hardware and software installation and maintenance. It should be about the cost of continued business usage.
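To make the gap concrete, here is a rough sketch of the arithmetic. Every figure and field name below is a made-up assumption for illustration, not data from any client; the point is only that adding a line for business users’ time can change the total dramatically.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Yearly cost inputs. All values used below are hypothetical."""
    hardware: float
    software: float
    it_staff: float
    business_users: int               # people analyzing data and building reports
    hours_lost_per_user_week: float   # time spent working around slow queries or bad data
    loaded_hourly_rate: float
    working_weeks: int = 48


def it_only_tco(c: AnnualCosts) -> float:
    """The narrow view: hardware, software, and IT staff."""
    return c.hardware + c.software + c.it_staff


def business_usage_cost(c: AnnualCosts) -> float:
    """Cost of the business users' time lost to workarounds."""
    return (c.business_users * c.hours_lost_per_user_week
            * c.working_weeks * c.loaded_hourly_rate)


def full_tco(c: AnnualCosts) -> float:
    """The narrow view plus the cost of continued business usage."""
    return it_only_tco(c) + business_usage_cost(c)


# Hypothetical inputs, chosen only to show the shape of the calculation.
costs = AnnualCosts(hardware=200_000, software=150_000, it_staff=400_000,
                    business_users=200, hours_lost_per_user_week=3,
                    loaded_hourly_rate=60)

print(f"IT-only TCO:         {it_only_tco(costs):>12,.0f}")
print(f"Business usage cost: {business_usage_cost(costs):>12,.0f}")
print(f"Full TCO:            {full_tco(costs):>12,.0f}")
```

With these assumed inputs, the business usage line is more than twice the IT-only total, which is why leaving it out distorts comparisons between analytical platforms.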
photo by -Luz- via Flickr (Creative Commons license)
Back when I was applying to college, I’d read over college catalogs. Inevitably, each university would mention the number of books it had in its library. When I finally went to college, I realized that this metric was fairly meaningless. A dozen volumes on Grecian pottery did me no good when I was in search of a book on polymers for my mechanical engineering class.
Clients will often ask us to scope a “data inventory” project, inevitably focused on identifying and describing all the data elements contained across their different application systems. Recently a new CIO asked us to head up a “tiger team” to inventory his company’s data. He was surprised at the quantity of information needs that had been sent his way. As expected, he inquired about systems of record and data dictionaries. As you can imagine, he received multiple and conflicting answers which only exacerbated his confusion.
As a point of reference, well-known ERP systems can have in excess of 50,000 discrete data elements in their databases (never mind that some aren’t in English). As I’ve written in the past, many of these data elements have no use outside of the application itself.
Having terabyte upon terabyte of information is equally irrelevant if that data is unrelated to current business issues. The problem with a data inventory activity is that identifying and counting data elements in different systems and applications won’t necessarily solve any problems. Why? Because data across applications and packages is inconsistent: there are different names, definitions, and values, and there is no practical means of determining which data they actually have in common. This is like going to the hardware store and looking for a specific screw, but all the different screws are in one big barrel. You end up having to pick through each screw, one at a time. When you find the screw, you just throw all the other screws back into the barrel.
The point of a data inventory isn’t to pick through data because it exists, but to inventory the data people actually need. If you’re going to undertake a data inventory, your output should be structured so that the next person doesn’t have to repeat your work. Identify the data that is moving across various systems, as this indicates key information that’s being shared. Categorize this data by subject area. You’ll inevitably find that there are inconsistent versions of the data, enabling you to identify data disparities. You can then begin to develop a catalog of key corporate data that will form the basis of your data dictionary.
Inventorying the data that moves between systems accomplishes two things: it identifies the most valuable data elements in use, and it will also help identify data that’s not high-value, as it’s not being shared or used. This approach also provides a way to tackle initial data quality efforts by identifying the most “active” data used by the business. It ultimately helps the data management team understand where to focus its efforts, and prioritize accordingly.
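As a rough sketch of this approach, suppose the interfaces between systems have already been written down as simple records. The system names, element names, and subject areas below are all hypothetical; in practice they would come from interface specifications or extract job definitions.

```python
from collections import defaultdict

# Hypothetical interface records:
# (source system, target system, data element, subject area)
flows = [
    ("CRM", "Billing", "customer_id", "Customer"),
    ("CRM", "Warehouse", "customer_id", "Customer"),
    ("Billing", "Warehouse", "invoice_amount", "Finance"),
    ("ERP", "Warehouse", "cust_no", "Customer"),
]

# Hypothetical list of every element each system stores, shared or not.
stored = {
    "CRM": {"customer_id", "screen_layout_code"},
    "Billing": {"invoice_amount", "print_queue_flag"},
    "ERP": {"cust_no", "batch_seq"},
}


def shared_elements(flows):
    """Count how many interfaces carry each element, grouped by subject area."""
    counts = defaultdict(lambda: defaultdict(int))
    for source, _target, element, subject in flows:
        counts[subject][(source, element)] += 1
    return {subject: dict(elements) for subject, elements in counts.items()}


def unshared_elements(stored, flows):
    """List stored elements that never leave their own system."""
    moving = {(source, element) for source, _target, element, _subject in flows}
    return sorted(
        (system, element)
        for system, elements in stored.items()
        for element in elements
        if (system, element) not in moving
    )


catalog = shared_elements(flows)
idle = unshared_elements(stored, flows)

for subject, elements in catalog.items():
    print(subject, elements)
print("Not shared:", idle)
```

In this made-up example the Customer subject area surfaces two differently named identifiers (`customer_id` and `cust_no`), which is exactly the kind of disparity worth resolving first, while the unshared list shows elements that can safely be left out of the catalog.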
So next time someone suggests a data inventory without context or objectives, consider sending them to college to study Grecian urns.
There are far too many data warehouse development teams solely focused on loading data. They’ve completely lost sight of their success metrics.
Why have they fallen into this rut? Because they’re doing what they’ve always done. One of the challenges in data warehousing is that as time progresses the people on the data warehouse development team are often not the same people who launched the team. This loss of experience has eroded the original vision and degraded the team’s effectiveness.
One client of mine actually bonused their data warehouse development team based on system usage and capacity. Was there a lot of data in the data warehouse? Yep. Were there multiple sandboxes, each with its own copy of data? Yep. Was this useful three years ago? Yep. Does any of this matter now? Nope. The original purpose of the data warehouse—indeed, the entire BI program—has been forgotten.
In the beginning the team understood how to help the business. They were measured on business impact. Success was based on new revenues and lower costs outside of IT. The team understood the evolution of the applications and data to support BI was critical in continually delivering business value. There was an awareness of what was next. Success was based on responding to the new business need. Sometimes this meant reusing data with new reports, sometimes it meant new data, sometimes it was just adjusting a report. The focus was on aggressively responding to business change and the resulting business need.
How does your BI team support decision making? Does it still deliver value to business users? Maybe your company is like some of the companies that I’ve seen: the success of the data warehouse and the growth of its budget propelled it into being managed like an operational system. People have refocused their priorities to managing loads, monitoring jobs, and throwing the least-expensive, commodity skills at the program. So a few years after BI is introduced, the entire program has become populated with IT order-takers, watching and managing extracts, load jobs, and utilization levels.
Then an executive asks: “Why is this data warehouse costing us so much?”
You’ve built applications, you’ve delivered business value, and you’ve managed your budget. Good for you. But now you have to do more. IT’s definition of data warehouse success is you cutting your budget. Why? Because IT’s definition of success isn’t business value creation, it’s budget conformance.
Because BI isn’t focused on business operation automation, as many operational systems are, it can’t thrive in a maintenance-driven mode. In order to continue to support the business, BI must continually deliver new information and new functionality. Beware the IT organization that wants to migrate the data warehouse to an operational support model measured on budgets, not business value. This can jeopardize more than just your next platform upgrade; it can imperil the BI program itself. A tunnel-vision focus on Service Level Agreements, manpower estimates, and project plan maintenance isn’t doing you any favors. These activities can’t be done devoid of business drivers.
When there are new business needs, business users may try to enlist IT resources to support them. But they no longer see partners who will help realize their visions and deliver valuable analytics. They see a few low-cost, less experienced technicians monitoring system uptime and staring at the blinking lights.
photo by jurvetson via Flickr (Creative Commons License)
In the motion picture industry, studios separate responsibilities for creating content from responsibilities for distributing content. The people who make the movies option the scripts, hire the talent, and film the scenes. The distributors of the films, on the other hand, figure out how to package and deploy the films. They need to know which theaters require 35 millimeter versus 70 millimeter formats, or even IMAX. They also deal with DVD packaging, including different international DVD formats. The industry understands the importance of having a supply chain that differentiates between the roles of content creation, content packaging, and distribution.
In IT we’re very quick to point to our operational systems as creators and owners of data. But maybe the solution is that IT establishes a functional team that’s responsible for data packaging and distribution, just like the movie industry.
Traditionally data formats and standards have fallen into the realm of the architecture team. Unfortunately this is typically a paper-only activity without teeth. A data distribution team wouldn’t focus on paperwork. They would be focused on data logistics, receiving content from the various source systems and packaging the data for consumption by other systems. This isn’t about implementing a specific platform to store or move data. It’s about active management of corporate data content.
One of the biggest development challenges is the hunting expedition that developers go on to find and acquire the data they need. Most aren’t aware of all their choices, let alone the optimal systems of record.
Currently every application, data mart, data warehouse, and reporting system that needs data from another system follows its own set of procedures to obtain that data. Each system requests different data formats, different delivery schedules, and different content. Everything is custom, there are few if any standards, and there are no economies of scale.
A data distribution team would also unburden the various application teams from building and maintaining the never-ending volume of custom extract requests. The only way to stop the madness is to separate content creation from data packaging and distribution. This means establishing a data supply chain that separates data creators from data distributors and from data consumers. Who knew IT infrastructure was just like the movies?
One of many discussions I heard over Thanksgiving turkey was, “How could the government have let the financial crisis happen?” To which the most frequent response was that regulators were asleep at the wheel. True or not, one could legitimately ask why we have problems with our business intelligence reports. The data is bad and the report is meaningless—who’s asleep at the wheel?
Everyone’s talking about the single version of the truth, but how often are our reports reviewed for accuracy? Several of our financial services clients demand that their BI reports be audited back to the source systems and that the numbers be reconciled.
Unfortunately, this isn’t common practice across industries. When we work with new clients we ask about data reconciliation, but most of our new clients don’t have the methods or processes in place. It makes me wonder how engaged business users are in establishing audit and reconciliation rules for their BI capabilities.
No, data perfection isn’t practical. But we should be able to guard against lost data and protect our users from formulas and equations that change. All too often these issues are thrown into the “post development” bucket or relegated to User Acceptance. By then reports aren’t always corrected and data isn’t always fixed.
A robust development process should ensure that data accuracy is established and measured throughout development. This means that design reviews are necessary before, during, and after development. Design reviews ensure that the data is continually being processed accurately. Many believe that it’s ten or more times more expensive to fix broken code (or data) after development than it is during development. And, as we’ve all seen, often the data doesn’t get fixed at all.
When you’re building a report or delivering data, ask two questions: 1) do the numbers reflect business expectations, and 2) do they reconcile back to their system of origin? Design review processes should be instituted (or, in many cases, re-instituted) to ensure functional accuracy long before the user ever sees the data on her desktop.
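The second question lends itself to an automated check. What follows is a minimal sketch, assuming control totals can be pulled from both the source system and the report; the measure names and amounts are hypothetical.

```python
from decimal import Decimal


def reconcile(source_totals, report_totals, tolerance=Decimal("0")):
    """Compare report totals against source-system totals.

    Returns a dict of problem measures. The value is the difference
    (report minus source), or None if the report lacks the measure.
    """
    problems = {}
    for measure, source_value in source_totals.items():
        report_value = report_totals.get(measure)
        if report_value is None:
            problems[measure] = None
        elif abs(report_value - source_value) > tolerance:
            problems[measure] = report_value - source_value
    return problems


# Hypothetical control totals from a source system and a BI report.
source = {"policy_count": Decimal("1200"),
          "written_premium": Decimal("845300.50")}
report = {"policy_count": Decimal("1200"),
          "written_premium": Decimal("845100.50")}

issues = reconcile(source, report)
for measure, difference in issues.items():
    print(measure, "missing" if difference is None else f"off by {difference}")
```

A check like this can run with every load rather than waiting for User Acceptance. The first question, whether the numbers match business expectations, still needs a person from the business in the design review.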
A few years ago, a mission to Mars failed because one team’s software reported thruster data in U.S. customary units while the navigation software expected metric units. Pound-force seconds weren’t converted to newton-seconds.
I thought of this fiasco when reading a blog post recently that insisted that the only reasonable approach for moving data into a data warehouse was to position the data warehouse as the “hub” in a hub-and-spoke architecture. The assumption here is that data is formatted differently on diverse source systems, so the only practical approach is to copy all this data onto the data warehouse, where other systems can retrieve it.
I’ve written about this topic in the past, but I wanted to expand a bit. I think it’s time to challenge this paradigm for the sake of BI expediency.
The problem is that the application systems aren’t responsible for sharing their data. Consequently little or no effort goes into pulling data out of an operational system and making it available to others. This then forces every data consumer to understand the unique data in every system. This is neither efficient nor scalable.
Moreover, the hub-and-spoke architecture itself is also neither efficient nor scalable. The way manufacturing companies address their distribution challenges is by insisting on standardized components. Thirty-plus years ago, every automobile seemed to have a set of parts that were unique to that automobile. Auto manufacturers soon realized that if they established specifications in which parts could be applied across models, they could reproduce parts, giving them scalability not only across different cars, but across different suppliers.
It’s interesting to me that application system owners aren’t measured on these two responsibilities:
- Business operation processing: ensuring that business processes are automated and supported effectively
- Supplying data to other systems
No one would dispute that the integrated nature of most companies requires data to be shared across multiple systems. The data that is shared should be standardized: application systems should extract data and package it in a consistent and uniform fashion so that it can be used across many other systems, including the data warehouse, without the consumer struggling to understand the idiosyncrasies of the system it came from.
Application systems should be obligated to establish standard processes whereby their data is made available on a regular basis (weekly, daily, etc.). Since most extracts are column-record oriented, the individual values should be standardized: they should be formatted and named in the same way.
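Here is a small sketch of what a standardized extract could look like. The source field names, the standard column names, and the date format are all hypothetical; the idea is simply that the application team owns the mapping once, so every consumer receives the same names and formats.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical corporate standard for an invoice extract.
STANDARD_COLUMNS = ["customer_id", "invoice_date", "invoice_amount"]

# Hypothetical mapping owned by the billing application team.
FIELD_MAP = {
    "billing": {"CUST_NO": "customer_id",
                "INV_DT": "invoice_date",
                "AMT": "invoice_amount"},
}
DATE_FORMATS = {"billing": "%m/%d/%Y"}


def standardize_row(system, row):
    """Rename fields and normalize values to the standard format."""
    mapping = FIELD_MAP[system]
    # Keep only mapped fields; application-internal fields are dropped.
    out = {mapping[name]: value.strip()
           for name, value in row.items() if name in mapping}
    # Dates become ISO 8601, amounts get exactly two decimal places.
    parsed = datetime.strptime(out["invoice_date"], DATE_FORMATS[system])
    out["invoice_date"] = parsed.date().isoformat()
    out["invoice_amount"] = f"{Decimal(out['invoice_amount']):.2f}"
    return out


def write_extract(system, rows):
    """Produce the standard extract as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STANDARD_COLUMNS,
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(standardize_row(system, row))
    return buffer.getvalue()


# A hypothetical raw record as the billing system stores it.
raw = [{"CUST_NO": " 00417 ", "INV_DT": "03/09/2009",
        "AMT": "1250.5", "PRINT_FLAG": "Y"}]
extract = write_extract("billing", raw)
print(extract)
```

The consumer never sees `CUST_NO` or the system’s own date format. Adding another source system means adding one more mapping, not one more custom extract per consumer.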
Can you modify every operational system to have a clean, standard extract file on Day 1? Of course not. But as new systems are built, extracts should be built with standard data. For every operational system, a company can save hundreds or even thousands of hours every week in development and processing time. Think of what your BI team could do with the resulting time—and budget money!
photo by jason b42882
It’s rare these days to find clients who haven’t already decided on a standard BI platform. Most of the new BI tool discussions we get into with clients are with companies who’ve decided that it’s time to broaden their horizons beyond Microsoft.
The dirty little secret in most companies is that the BI reporting team has morphed into a de-facto enterprise reporting team. Why is this?
When it comes to reporting, there’s a difference between the BI team and the rest of IT. The fact is that BI teams are successful not because of the infrastructure technologies, but because of the technologies in front of the users: the actual BI tool. To the end user, data visualization and access are much more important than database management and storage infrastructure. So when a new operational system is introduced, users expect the same functionality, look and feel as their other reports.
An insurance company we’re working with is replacing its operational systems. The company’s management has already decided not to use the vendor’s reports—they’re too limited and brittle. They expect these reports to dovetail into the company’s information portal and work alongside their BI reporting. Companies are refreshing their operational platforms every seven to ten years. It’s now 2009, and the last time they refreshed their operational systems was in reaction to Y2K. It’s once again time to revisit those operational systems.
If you look at the challenges BI tool vendors are facing, there is limited growth in data warehousing. Most companies have standardized their BI tool suite. Absent disruptive technology or new functionality, there’s limited growth opportunity for BI tools in the data warehousing space.
But for every data warehouse or data mart within a company, there are likely dozens of operational systems that users need access to. The opportunity for BI vendors now is delivering operational information to business users. This isn’t about complex analytics or advanced computation. This is the retrieval of operational information from where it lives.
Photo by jakeliefer (via Flickr)