
Standardizing Data Migration

The Motion Picture Industry

In the motion picture industry, studios separate responsibilities for creating content from responsibilities for distributing content. The people who make the movies option the scripts, hire the talent, and film the scenes. The distributors of the films, on the other hand, figure out how to package and deploy the films. They need to know which theaters require 35 millimeter versus 70 millimeter formats, or even IMAX. They also deal with DVD packaging, including different international DVD formats. The industry understands the importance of having a supply chain that differentiates between the roles of content creation, content packaging, and distribution.

In IT we’re very quick to point to our operational systems as creators and owners of data. But maybe the solution is that IT establishes a functional team that’s responsible for data packaging and distribution, just like the movie industry.

Traditionally, data formats and standards have fallen into the realm of the architecture team. Unfortunately, this is typically a paper-only activity without teeth. A data distribution team wouldn’t focus on paperwork; it would focus on data logistics: receiving content from the various source systems and packaging the data for consumption by other systems. This isn’t about implementing a specific platform to store or move data. It’s about active management of corporate data content.

One of the biggest development challenges is the hunting expedition that developers go on to find and acquire the data they need. Most aren’t aware of all their choices, let alone the optimal systems of record.

Currently, every application, data mart, data warehouse, or reporting system that needs data from another system follows a different set of procedures to obtain that data. Each system requests different data formats, different delivery schedules, and different content. Everything is custom, there are few if any standards, and there are no economies of scale.

A data distribution team would also unburden the various application teams from building and maintaining the never-ending volume of custom extract requests. The only way to stop the madness is to compartmentalize content creation from data packaging and distribution. This means establishing a data supply chain that separates data creators from data distributors from data consumers. Who knew IT infrastructure was just like the movies?

BI Reports, Data Quality, and the Dreaded Design Review


One of many discussions I heard over Thanksgiving turkey was, “How could the government have let the financial crisis happen?” To which the most frequent response was that regulators were asleep at the wheel. True or not, one could legitimately ask why we have problems with our business intelligence reports. The data is bad and the report is meaningless—who’s asleep at the wheel?

Everyone’s talking about the single version of the truth, but how often are our reports reviewed for accuracy? Several of our financial services clients demand that their BI reports are audited back to the source systems and that numbers are reconciled.

Unfortunately, this isn’t common practice across industries. When we work with new clients we ask about data reconciliation, but most of our new clients don’t have the methods or processes in place. It makes me wonder how engaged business users are in establishing audit and reconciliation rules for their BI capabilities. 

No, data perfection isn’t practical. But we should be able to guard against lost data and protect our users from formulas and equations that change. All too often these issues are thrown into the “post-development” bucket or relegated to User Acceptance Testing. By then reports aren’t always corrected and data isn’t always fixed.

A robust development process ensures that data accuracy is established and measured throughout development. This means that design reviews are necessary before, during, and after development. Design reviews ensure that the data is continually being processed accurately. Many believe that it’s ten or more times more expensive to fix broken code (or data) after development than during development. And, as we’ve all seen, often the data doesn’t get fixed at all.

When you’re building a report or delivering data, ask two questions: 1) do the numbers reflect business expectations, and 2) do they reconcile back to their system of origin? Design review processes should be instituted (or, in many cases, re-instituted) to ensure functional accuracy long before the user ever sees the data on her desktop.
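As a rough sketch of what the second question can look like when automated, a reconciliation step might compare row counts and totals against the system of origin. The function name, tolerance, and figures below are purely illustrative, not a prescription:

```python
# Hedged sketch: reconcile a BI report back to its source system.
# The totals, row counts, and tolerance are illustrative values only.

def reconcile(report_total, source_total, report_rows, source_rows, tolerance=0.01):
    """Return a list of discrepancies between a report and its system of origin."""
    issues = []
    if source_rows != report_rows:
        issues.append(f"row count mismatch: source={source_rows}, report={report_rows}")
    if abs(report_total - source_total) > tolerance:
        issues.append(f"total mismatch: source={source_total}, report={report_total}")
    return issues

# A report that silently dropped records fails reconciliation:
print(reconcile(report_total=98_500.00, source_total=100_000.00,
                report_rows=985, source_rows=1_000))
```

A check this small, run as part of every load, is usually enough to catch lost data long before User Acceptance.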

Improving BI Development Efficiency: Standard Data Extracts


A few years ago, the Mars Climate Orbiter was lost because someone forgot to convert U.S. measurement units to metric: thruster data was delivered in pound-seconds when the navigation software expected newton-seconds.

I thought of this fiasco when reading a blog post recently that insisted that the only reasonable approach for moving data into a data warehouse was to position the data warehouse as the “hub” in a hub-and-spoke architecture. The assumption here is that data is formatted differently on diverse source systems, so the only practical approach is to copy all this data onto the data warehouse, where other systems can retrieve it.

I’ve written about this topic in the past, but I wanted to expand a bit. I think it’s time to challenge this paradigm for the sake of BI expediency.

The problem is that the application systems aren’t responsible for sharing their data. Consequently, little or no effort is devoted to pulling data out of an operational system and making it available to others. This forces every data consumer to understand the unique data in every system. This is neither efficient nor scalable.

Moreover, the hub-and-spoke architecture itself is also neither efficient nor scalable. The way manufacturing companies address their distribution challenges is by insisting on standardized components. Thirty-plus years ago, every automobile seemed to have a set of parts that were unique to that automobile. Auto manufacturers soon realized that if they established specifications in which parts could be applied across models, they could reproduce parts, giving them scalability not only across different cars, but across different suppliers. 

It’s interesting to me that application systems owners aren’t measured on these two responsibilities:

  • Business operation processing: ensuring that business processes are automated and supported effectively
  • Supplying data to other systems

No one would argue that the integrated nature of most companies requires data to be shared across multiple systems. That data generated should be standardized: application systems should extract data and package it in a consistent and uniform fashion so that it can be used across many other systems—including the data warehouse—without the consumer struggling to understand the idiosyncrasies of the system it came from.

Application systems should be obligated to establish standard processes whereby their data is made available on a regular basis (weekly, daily, etc.). Since most extracts are column-record oriented, the individual values should be standardized: they should be formatted and named in the same way.
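To illustrate what "formatted and named in the same way" might mean in practice, a distribution team could publish one corporate convention that every extract applies before publishing. The field names, source layout, and formatting rules below are hypothetical, just one possible standard:

```python
# Hedged sketch: apply a single corporate formatting standard to an extract,
# so consumers never deal with per-system naming, date, or currency quirks.
import csv
import io
from datetime import datetime

def standardize(row):
    """Map a system-specific row onto hypothetical corporate-standard fields:
    snake_case names, ISO-8601 dates, and currency amounts in integer cents."""
    return {
        "customer_id": row["CUSTID"].strip(),
        "order_date": datetime.strptime(row["ORDDT"], "%m/%d/%Y").date().isoformat(),
        "amount_cents": int(round(float(row["AMT"]) * 100)),
    }

# A made-up source extract with a system-specific layout:
source = "CUSTID,ORDDT,AMT\n  C001 ,12/31/2008,19.99\n"
rows = [standardize(r) for r in csv.DictReader(io.StringIO(source))]
print(rows)
# → [{'customer_id': 'C001', 'order_date': '2008-12-31', 'amount_cents': 1999}]
```

The point isn't this particular format; it's that the mapping lives in one place, maintained by the supplying system, instead of being rediscovered by every consumer.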

Can you modify every operational system to have a clean, standard extract file on Day 1? Of course not. But as new systems are built, extracts should be built with standard data. For every operational system, a company can save hundreds or even thousands of hours every week in development and processing time. Think of what your BI team could do with the resulting time—and budget money!

photo by jason b42882

No Data Warehouse Required: BI Reporting Extends Its Reach


It’s rare these days to find clients who haven’t already decided on a standard BI platform. Most of the new BI tool discussions we get into with clients are with companies who’ve decided that it’s time to broaden their horizons beyond Microsoft.

The dirty little secret in most companies is that the BI reporting team has morphed into a de-facto enterprise reporting team. Why is this?

When it comes to reporting, there’s a difference between the BI team and the rest of IT. The fact is that BI teams are successful not because of the infrastructure technologies, but because of the technologies in front of the users: the actual BI tool. To the end user, data visualization and access are much more important than database management and storage infrastructure.  So when a new operational system is introduced, users expect the same functionality, look and feel as their other reports.

An insurance company we’re working with is replacing its operational systems. The company’s management has already decided not to use the vendor’s reports—they’re too limited and brittle. They expect these reports to dovetail into the company’s information portal and work alongside their BI reporting. Companies are refreshing their operational platforms every seven to ten years. It’s now 2009, and the last time they refreshed their operational systems was in reaction to Y2K. It’s once again time to revisit those operational systems.

If you look at the challenges BI tool vendors are facing, there is limited growth in data warehousing. Most companies have standardized their BI tool suite. Absent disruptive technology or new functionality, there’s limited growth opportunity for BI tools in the data warehousing space.

But for every data warehouse or data mart within a company, there are likely dozens of operational systems that users need access to. The opportunity for BI vendors now is delivering operational information to business users. This isn’t about complex analytics or advanced computation. This is the retrieval of operational information from where it lives.

Photo by jakeliefer (via Flickr)

Blurring the Line Between SOA and BI


photo by Siomuzzz

I recently read with interest an article in the Microsoft Architect Journal on so-called Service-Oriented Business Intelligence or, as the article’s authors call it, “SoBI.” The article was well-intentioned but confusing. What it confirmed to me is that plenty of experienced IT professionals are struggling to reconcile Service Oriented Architecture (SOA) concepts with business intelligence.

SOA is certainly a valuable tool in the architecture and development toolbox; however, I think it’s only fair to keep SOA in perspective. It’s an evolutionary technology that offers numerous benefits to developer productivity and application connectivity. I’m not sure that injecting SOA into a data warehouse environment or framework will do anything more than freshen a few low-level building blocks that have been neglected in some data warehouse environments. I’m certainly not challenging the value of SOA; I’m just trying to put it in perspective for those folks who are focused on data warehouse and business intelligence activities.

The idea behind SOA is to create services (functions, procedures, etc.) that can be used by other systems. The idea is simple: build once, use many times. This ensures that important (and possibly complicated) application processes can be used by numerous disparate applications. It’s like an application processing supply chain: let the most efficient resource build a service and provide it to everyone else. SOA provides a framework for allowing multiple applications access to common, well-defined services. These services can contain code and/or data.

The question for most data warehouse environments isn’t whether SOA can improve (or benefit) the data warehouse; it’s understanding how.

We’ve got lots of clients leveraging SOA to support their data warehouses. They’ve learned they can leverage SOA techniques to deliver standardized data cleansing and data validation to a range of business applications. They’ve also upgraded their operational systems’ data extraction code to leverage SOA, which allows other application systems (or data marts) to reuse that code.
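In the build-once, use-many-times spirit described above, the logic behind such a shared validation service can be as simple as one function that every consuming application calls instead of re-implementing its own rules. The record fields and rules here are made up for illustration:

```python
# Hedged sketch: the core of a reusable data-validation "service" that many
# applications (and the data warehouse ETL) could share, rather than each
# coding its own checks. The fields and rules are illustrative assumptions.
import re

def validate_customer(record):
    """Return a list of validation errors for a hypothetical customer record."""
    errors = []
    # U.S. ZIP code: five digits, optionally ZIP+4.
    if not re.fullmatch(r"\d{5}(-\d{4})?", record.get("zip", "")):
        errors.append("zip: expected 5-digit or ZIP+4 format")
    # Deliberately crude email check, just enough for the sketch.
    if "@" not in record.get("email", ""):
        errors.append("email: missing '@'")
    return errors

print(validate_customer({"zip": "53703", "email": "pat@example.com"}))  # → []
print(validate_customer({"zip": "537", "email": "pat.example.com"}))
```

Wrapped behind a service interface, the same rules run identically whether the caller is a data mart load, a CRM screen, or a one-off extract.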

However, their use of SOA hasn’t been focused on enhancing the data warehouse environment as much as on packaging their development efforts for others to use. Most data warehouse developers invest heavily in navigating IT’s labyrinth of operational systems and application data in order to identify, cleanse, and load data into their warehouses. What they’ve learned is that for every new ETL script, there are probably 20 other systems that have custom-developed their own data retrieval code and never documented it. The value that many data warehouse developers find with SOA isn’t that they’re improving their data warehouse; they’re just addressing the limitations of the application systems.

Operational BI From the Trenches

By Evan Levy

Operational BI is getting a lot of attention. The idea is a reasonable one: using recent data to make timely decisions. However, as with any other current buzzword, the world seems to be piling on, and the meaning of operational BI seems to be evolving (or eroding).

BI has been around a while now. The idea is to leverage technology to allow a business person to utilize detailed data to answer timely business questions. The best-known BI tools come from established vendors: IBM, Microsoft, Business Objects, MicroStrategy. Most tools use relational databases and rely on SQL to navigate and manipulate the data. Most data warehouses that provide data to BI tools have been built to support query flexibility and performance and to maintain a large volume of historical data. The trade-off is often a delay in getting data loaded. Most high-value data warehouses rely on regular monthly, weekly, or daily updates. They were never built to support “operational” functionality.

The fuzzy part is what we mean by "operational." Rather than engaging in a semantic debate, I thought I'd share what we see at clients as the three common requirements for truly operational BI:

  1. Load the data fast, usually right after it's created.
  2. Run a query fast. For instance, look up the customer’s billing history while he's waiting on the phone.
  3. Identify a specific business circumstance when it happens. For instance, tell the customer when she's exhausted her cell phone minutes.
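The third requirement amounts to event detection: evaluating a rule the moment new usage data arrives, rather than waiting for a nightly load. A minimal sketch, in which the plan names, limits, and customer figures are all invented:

```python
# Hedged sketch of requirement 3: flag a business circumstance as it happens.
# Plan limits and usage figures below are illustrative assumptions.

PLAN_MINUTES = {"basic": 450, "plus": 900}  # hypothetical plan sizes

def check_usage(customer):
    """Return an alert the moment a customer exhausts their plan minutes,
    or None if no action is needed."""
    limit = PLAN_MINUTES[customer["plan"]]
    if customer["minutes_used"] >= limit:
        return f"ALERT: customer {customer['id']} has exhausted {limit} plan minutes"
    return None

print(check_usage({"id": "C42", "plan": "basic", "minutes_used": 451}))
print(check_usage({"id": "C43", "plan": "plus", "minutes_used": 100}))  # → None
```

The rule itself is trivial; the hard part, as the next paragraph argues, is running it against data that is only seconds old, which is exactly what batch-loaded warehouses weren't built for.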

As you can imagine, any one of these individual capabilities is likely to require specialized development work. When you combine these functions, it becomes pretty clear that traditional data warehouses and business intelligence tools can struggle to support operational BI. When a legitimate need for operational BI arises, most IT departments simply build a separate reporting data mart or reporting platform. Why? Because the timeliness of loading and query processing makes it impractical to add on to an existing platform, unless of course they happen to have a large-scale data warehouse with unused processing capacity just lying around.

The truth is, you may not need to limit your operational BI solution to a relational database, or even to a BI tool! (I made this point on a recent broadcast of DM Radio and it invited a lot of post-show dialog.) The fact is that relational databases and SQL aren't the best (or even the most efficient) technologies for supporting operational BI. Indeed, there are other technologies that can support some of these activities in a simpler and more efficient manner. We'll talk about those in another blog posting, after you've had a chance to consider this one.

Why BI Development is Different

By Evan Levy

When companies initially embark on their BI development initiatives, they often underestimate its complexity. Some begin BI in the first place because their packaged applications don’t deliver the reporting functionality they need. Others embark on BI because the data they need to analyze is located in multiple, disparate application systems. While positioning a data warehouse to integrate and store historical data from packaged applications, like ERP or CRM, is a reasonable and proven approach, many companies try to repurpose the development methods associated with these packages to deliver BI.

But comparing development methods and skill sets for these two divergent types of systems is like comparing apple picking to fruit salad making. The fact is, the methodology for building a data warehouse is very similar to traditional code development using lower-level programming languages. To be successful building a data warehouse, a team should have skills in business requirements gathering, functional requirements definition, specification and design, data modeling, and database design, as well as all the skills associated with loading the data and coding the application. This is clearly a complex mix of technical knowledge to deliver a business solution spanning everything from storage allocation to workload management to systems integration to application programming. The fact is you’re building something from scratch.

The packaged application world is complex in its own right, but it’s also very different, as are the skills and methodologies involved in building these environments. Most IT organizations accustomed to implementing packages use third-party firms to install and configure these systems. Their staff members don’t have the necessary skills to build these solutions, and often require training and multiple years of hands-on use to be proficient in supporting these systems. In addition, most organizations forget that implementing their business applications typically takes a year or longer.

When was the last time you were allowed a full year to implement your data warehouse? And was your team even half the size of the packaged app’s development team?

Underestimating the Project Managers

By Evan Levy

One of the most misunderstood roles on a BI team is the Project Manager. All too often the role is defined as an administrative set of activities focused on writing and maintaining the project plan, tracking the budget, and monitoring task completion. Unfortunately IT management rarely understands the importance of domain knowledge—having BI experience—and leadership skills.

To assign a BI project manager who has no prior BI experience is an accident waiting to happen. Think about a homeowner who decides to build a new house, then retains a construction company whose foreman has never built a house before. You’d want a foreman with fundamental knowledge of demolition, framing, plumbing, wiring, and so on, someone who can verify that the work is being done the right way.

Unfortunately IT managers think they can position certified project managers on BI teams without any knowledge of BI-specific development processes, business decision-making, data content, or technology. We often find ourselves coaching these project managers on the differences in BI development, or introducing concepts like staging areas or federated queries. This is time that could be better spent transferring knowledge and formalizing development processes with a more seasoned project lead.

In order for a project team to be successful, the project manager should have strong leadership skills. The ability to communicate a common goal and ensure focus is both art and science. But BI project managers often behave more like bureaucrats, requesting task completion percentages and reviewing labor hours. They are rarely invested in whether the project is adhering to development standards, if permanent staff is preparing to take ownership of the code, or whether the developers are collaborating.

An effective BI project manager should be a project leader. He or she should understand that the definition of success is not a completed project plan or budget spreadsheet, but rather that the project delivers usable data and fulfills requirements. The BI project manager should instill the belief that success doesn’t mean task completion, but delivery against business goals.

BI Business Requirements: When Perfect is the Enemy of Good

By Evan Levy

Sometimes we find clients who overestimate their need for analytics. IT is often focused on using BI to analyze a problem exhaustively, when exhaustive analysis just isn’t necessary. Sometimes our analytics requirements simply aren’t that sophisticated.

Twenty years ago, WalMart knew when it needed to pull a product from the shelf. This didn’t require advanced analytics to drill down on the category, affinities, the seasonality, or the purchaser. It was simple: if the product didn’t sell after six days, free up the shelf space and move on. After all, there were other products to sell.

Why does this matter? Because we get so wrapped up in new, more sophisticated technologies that we forget about our requirements. Sometimes we just need to know what the problem and resulting action is. We don’t necessarily need to know the "why" every time. Often, all business users want is the information that’s good enough to support the decision they need to make.
