One of the challenges in delivering successful data-centric projects (e.g. analytics, BI, or reporting) is realizing that the definition of project success differs from traditional IT application projects. Success for a traditional application (or operational) project is often described in terms of transaction volumes, functional capabilities, processing conformance, and response time; data project success is often described in terms of business process analysis, decision enablement, or business situation measurement. To a business user, the success of a data-centric project is simple: data usability.
It seems that most folks respond to data usability issues by gravitating towards a discussion about data accuracy or data quality; I actually think the more appropriate discussion is data knowledge. I don’t think anyone would argue that to make data-enabled decisions, you need to have knowledge about the underlying data. The challenge is understanding what level of knowledge is necessary. If you ask a BI or Data Warehouse person, their answer almost always includes metadata, data lineage, and a data dictionary. If you ask a data mining person, they often just want specific attributes and their descriptions — they don’t care about anything else. All of these folks have different views of data usability and varying levels (and needs) for data knowledge.
One way to improve data usability is to target and differentiate the user audience based on their data knowledge needs. There are certainly lots of different approaches to categorizing users; in fact, every analyst firm and vendor has their own model to describe different audience segments. One of the problems with these types of models is that they tend to focus heavily on the tools or analytical methods (canned reports, drill down, etc.) and ignore the details of data content and complexity. The knowledge required to manipulate a single subject area (revenue or customer or usage) is significantly less than the skills required to manipulate data across 3 subject areas (revenue, customer, and usage). And what exacerbates data knowledge growth is the inevitable plethora of value gaps, inaccuracies, and inconsistencies associated with the data. Data knowledge isn’t just limited to understanding the data; it includes understanding how to work around all of the imperfections.
Here’s a model that categories and describes business users based on their views of data usability and their data knowledge needs
Level 1: “Can you explain these numbers to me?”
This person is the casual data user. They have access to a zillion reports that have been identified by their predecessors and they focus their effort on acting on the numbers they get. They’re not a data analyst – their focus is to understand the meaning of the details so they can do their job. They assume that the data has been checked, rechecked, and vetted by lots of folks in advance of their receiving the content. They believe the numbers and they act on what they see.
Level 2: “Give me the details”
This person has been using canned reports, understands all the basic details, and has graduated to using data to answer new questions that weren’t identified by their predecessors. They need detailed data and they want to reorganize the details to suit their specific needs (“I don’t want weekly revenue breakdowns – I want to compare weekday revenue to weekend revenue”). They realize the data is imperfect (and in most instances, they’ll live with it). They want the detail.
Level 3: “I don’t believe the data — please fix it”
These folks know their area of the business inside/out and they know the data. They scour and review the details to diagnose the business problems they’re analyzing. And when they find a data mistake or inaccuracy, they aren’t shy about raising their hand. Whether they’re a data analyst that uses SQL or a statistician with their favorite advanced analytics algorithms, they focus on identifying business anomalies. These folks are the power users that are incredibly valuable and often the most difficult for IT to please.
Level 4: “Give me more data”
This is subject area graduation. At this point, the user has become self-sufficient with their data and needs more content to address a new or more complex set of business analysis needs. Asking for more data – whether a new source or more detail – indicates that the person has exhausted their options in using the data they have available. When someone has the capacity to learn a new subject area or take on more detailed content, they’re illustrating a higher level of data knowledge.
One thing to consider about the above model is that a user will have varying data knowledge based on the individual subject area. A marketing person may be completely self-sufficient on revenue data but be a newbie with usage details. A customer support person may be an expert on customer data but only have limited knowledge of product data. You wouldn’t expect many folks (outside of IT) to be experts on all of the existing data subject areas. Their knowledge is going to reflect the breadth of their job responsibilities.
As someone grows and evolves in business expertise and influence, it’s only natural that their business information needs would grow and evolve too. In order to address data usability (and project success), maybe it makes sense to reconsider the various user audience categories and how they are defined. Growing data knowledge isn’t about making everyone data gurus; it’s about enabling staff members to become self-sufficient in their use of corporate data to do their jobs.
Photo “Ladder of Knowledge” courtesy of degreezero2000 via Flickr (Creative Commons license).
Companies spend a small fortune continually investing and reinvesting in making their business analysts self-sufficient with the latest and greatest analytical tools. Most companies have multiple project teams focused on delivering tools to simplify and improve business decision making. There are likely several standard tools deployed to support the various data analysis functions required across the enterprise: canned/batch reports, desktop ad hoc data analysis, and advanced analytics. There’s never a shortage of new and improved tools that guarantee simplified data exploration, quick response time, and greater data visualization options, Projects inevitably include the creation of dozens of prebuilt screens along with a training workshop to ensure that the users understand all of the new whiz bang features associated with the latest analytic tool incarnation. Unfortunately, the biggest challenge within any project isn’t getting users to master the various analytical functions; it’s ensuring the users understand the underlying data they’re analyzing.
If you take a look at the most prevalent issue with the adoption of a new business analysis tool is the users’ knowledge of the underlying data. This issue becomes visible with a number of common problems: the misuse of report data, the misunderstanding of business terminology, and/or the exaggeration of inaccurate data. Once the credibility or usability of the data comes under scrutiny, the project typically goes into “red alert” and requires immediate attention. If ignored, the business tool quickly becomes shelfware because no one is willing to take a chance on making business decisions based on risky information.
All too often the focus on end user training is tool training, not data training. What typically happens is that an analyst is introduced to the company’s standard analytics tool through a “drink from a fire hose” training workshop. All of the examples use generic sales or HR data to illustrate the tool’s strengths in folding, spindling, and manipulating the data. And this is where the problem begins: the vendor’s workshop data is perfect. There’s no missing or inaccurate data and all of the data is clearly labeled and defined; classes run smoothly, but it just isn’t reality Somehow the person with no hands-on data experience is supposed to figure out how to use their own (imperfect) data. It’s like someone taking their first ski lesson on a cleanly groomed beginner hill and then taking them up to the top of an a black diamond (advanced) run with step hills and moguls. The person works hard but isn’t equipped to deal with the challenges of the real world. So, they give up on the tool and tell others that the solution isn’t usable.
All of the advanced tools and manipulation capabilities don’t do any good if the users don’t understand the data. There are lots of approaches to educating users on data. Some prefer to take a bottom-up approach (reviewing individual table and column names, meanings, and values) while others want to take a top-down approach (reviewing subject area details, the associated reports, and then getting into the data details). There are certainly benefits of one approach over the other (depending on your audience); however, it’s important not to lose sight of the ultimate goal: giving the users the fundamental data knowledge they need to make decisions. The fundamentals that most users need to understand their data include a review of
- the business subject area associated with their dat
- business terms, definitions, and their associated data attributes
- data values and their representations
- business rules and calculations associated with the individual values
- the data’s origin (a summary of the business processes and source system)
The above details may seem a bit overwhelming if you consider that most companies have mature reporting environments and multi-terabyte data warehouses. However, we’re not talking about training someone to be an expert on 1000 data attributes contained within your data warehouse; we’re talking about ensuring someone’s ability to use an initial set of reports or a new tool without requiring 1-on-1 training. It’s important to realize that the folks with the greatest need for support and data knowledge are the newbies, not the experienced folks.
There are lots of options for imparting data knowledge to business users: a hands-on data workshop, a set of screen videos showing data usage examples, or a simple set of web pages containing definitions, textual descriptions, and screen shots. Don’t get wrapped up in the complexities of creating the perfect solution – keep it simple. I worked with a client that deployed their information using a set of pages constructed with PowerPoint that folks could reference in a the company’s intranet. If your users have nothing – don’t’ worry about the perfect solution – give them something to start with that’s easy to use.
Remember that the goal is to build users’ data knowledge that is sufficient to get them to adopt and use the company’s analysis tools. We’re not attempting to convert everyone into data scientists; we just want them to use the tools without requiring 1-on-1 training to explain every report or data element.
Photo courtesy of NASA. Nasa Ames Research Center engineer H Julian “Harvey” Allen illustrating data knowledge (relating to capsule design for the Mercury program)
Maybe it’s time to challenge the 20 year-old paradigm of making everyone a knowledge worker. For a long time the BI community has assumed that if we give business users the right data and tools, they’ll have the necessary ammunition to do their jobs. But I’m beginning to believe that may no longer be a practical approach. At least not for everyone.
One thing that’s changed in the last dozen-or-so years is that individuals’ job responsibilities have become more complex. The breadth of these responsibilities has grown. I question whether the average business user can really keep track of all the subject area content, all the table definitions, column names, data types, definitions of columns, and locations of all the values across the 6000+ tables in the data mart.
And that’s just the data mart. I’m not even including the applications and systems the average business user interacts with on a daily basis. Not to mention all those presentations, documents, videos, and archived e-mails from customers.
I’m not arguing the value of analytics, nor am I challenging the value of the data warehouse. But is it really practical to expect everyone to generate their own reports? Look at the U.S. tax code. It’s certainly broader than a single CPA can keep track of. Now consider most companies’ Finance departments. There’s more data coming out of Finance than most people can deal with. Otherwise all those specialized applications and dedicated data analysts wouldn’t exist in the first place!
Maybe it’s not about delivering BI tools to every end-user. Maybe it’s about delivering reports in a manner that can be consumed. We’ve gotten so wound-up about detailed data that we haven’t stopped to wonder whether it’s worthwhile to push all that detail to the end-user’s desktop—and then expect him or her to learn all the rules.
One of my brokerage accounts contains 5 different equities. I don’t look at them every day. I don’t look at intra-day price changes. I really don’t need to know. All I really want to know is when I do look at the information, has the stock’s value gone up or down? And how do I get the information? I didn’t build a custom report. I didn’t do drill-down, or drill-across. I went to the web and searched on the stock price.
Maybe instead of buying of a copy of a [name the BI vendor software] tool, we simply build a set of standard reports for key business areas (Sales, Marketing, Finance), and publish them. You can publish these reports to a drive, to a server, to a website, to a portal—it shouldn’t matter. People should find the information with a browser. Reports can be stored and indexed and accessed via an enterprise search engine. Of course, as with everything else, you still need to define terms and metadata so that people understand what they’re reading.
Whenever people talk about enterprise search functionality they’re usually obsessing about unstructured data. But enterprise search can deliver enormous value for structured data. IT departments could be leading the charge if the definition of success weren’t large infrastructure and technology implementation projects and instead data delivery and usage.
The executive doesn’t ask, “What tool did you use to solve this problem?” Instead, she wants to know if the problem has in fact been solved.
There are far too many data warehouse development teams solely focused on loading data. They’ve completely lost sight of their success metrics.
Why have they fallen into this rut? Because they’re doing what they’ve always done. One of the challenges in data warehousing is that as time progresses the people on the data warehouse development team are often not the same people who launched the team. This erosion of experience has eroded the original vision and degraded the team’s effectiveness .
One client of mine actually bonused their data warehouse development team based on system usage and capacity. Was there a lot of data in the data warehouse? Yep. Were there multiple sandboxes, each with its own copy of data? Yep. Was this useful three years ago? Yep. Does any of this matter now? Nope. The original purpose of the data warehouse—indeed, the entire BI program—has been forgotten.
In the beginning the team understood how to help the business. They were measured on business impact. Success was based on new revenues and lower costs outside of IT. The team understood the evolution of the applications and data to support BI was critical in continually delivering business value. There was an awareness of what was next. Success was based on responding to the new business need. Sometimes this meant reusing data with new reports, sometimes it meant new data, sometimes it was just adjusting a report. The focus was on aggressively responding to business change and the resulting business need.
How does your BI team support decision making? Does it still deliver value to business users? Maybe your company is like some of the companies that I’ve seen: the success of the data warehouse and the growth of its budget propelled it into being managed like an operational system. People have refocused their priorities to managing loads, monitoring jobs, and throwing the least-expensive, commodity skills at the program. So a few years after BI is introduced, the entire program has become populated with IT order-takers, watching and managing extracts, load jobs, and utilization levels.
Then an executive asks: “Why is this data warehouse costing us so much?”
You’ve built applications, you’ve delivered business value, and you’ve managed your budget. Good for you. But now you have to do more. IT’s definition of data warehouse success is you cutting your budget. Why? Because IT’s definition of success isn’t business value creation, it’s budget conformance.
Because BI isn’t focused on business operation automation, as with many operational systems, it can’t thrive in a maintenance-driven mode. In order to continue to support the business, BI must continually deliver new information and new functionality. Beware the IT organization that wants to migrate the data warehouse to an operational support model measured on budgets, not business value. This can jeopardize more than just your next platform upgrade, it can imperil the BI program itself. The tunnel-vision of Service Level Agreements, manpower estimates, and project plan maintenance aren’t doing you any favors. They can’t be done devoid of business drivers.
When there are new business needs, business users may try to enlist IT resources to support them. But they no longer see partners who will help realize their visions and deliver valuable analytics. They see a few low-cost, less experienced technicians monitoring system uptime and staring at the blinking lights.
photo by jurvetson via Flickr (Creative Commons License)
At a client meeting recently I was informed that the company had over 150 business analysts. Even the client’s business executives acknowledged this to be true. But that many business analysts suggests that these people are spending their time on efforts outside the realm of business analysis.
Most IT organizations model the business analyst (BA) role around transactional or operational systems. Whether organizations buy packages or build code from scratch, business analysis skills are usually focused on system and process analysis. The majority of data issues within an operational system are data entry-related. The challenge in business analysis is to establish standard business processes to automate.
Few operational systems are built with the goal of data share-ability. So issues such as standard business values and definitions don’t come up. When a new ERP system is being implemented, little attention is paid to the customer id number or the customer’s name. It’s fairly common for operational systems to be built in an isolated way to support a well-known business process with no attention to data standards.
But it’s different in the BI and analytics environments. To be successful in BI, it’s critical to have integrated data from individual systems to support often-complex analytics. The BA doesn’t need to focus on the business processes that created the data—rather he or she should focus on the business scenarios that mandate accurate and meaningful data. The expertise needed is around data content analysis and understanding data from different source systems and what it represents.
To be an effective BA for business intelligence or master data management it’s critical to understand how different systems see data, even data that’s ostensibly the same. In a telecommunications firm, System A was account-based and System B was customer-based. Customer details existed in both systems. A good BA understands which data is critical from each system and what the rules are for matching records across them.
The BA needs to:
- Be able to identify different business scenarios for how data will be used
- Interested and willing to go through often-tedious analysis of content, formatting, and definitional differences in data within and across systems
- Be comfortable with the tools necessary to dig into the data and profile it
- Excel at communicating data requirements and anomalies in business language
At the end of the day a good business analyst should understand that data should be independent of applications and reflective of good business terminology. Until then, we’ll have hundreds of them and they’re likely to be forklifting data from one system to another in an on-demand, non-repeatable, non-scalable, and inefficient way.
By Evan Levy
Sometimes we find clients who overestimate their need for analytics. Often, IT is focused on using BI to analyze a problem exhaustively, when sometimes exhaustive analysis just isn’t necessary. Sometimes our analytics requirements just aren’t that sophisticated.
Twenty years ago, WalMart knew when it needed to pull a product from the shelf. This didn’t require advanced analytics to drill down on the category, affinities, the seasonality, or the purchaser. It was simple: if the product didn’t sell after six days, free up the shelf space and move on. After all, there were other products to sell.
Why does this matter? Because we get so wrapped up in new, more sophisticated technologies that we forget about our requirements. Sometimes we just need to know what the problem and resulting action is. We don’t necessarily need to know the "why" every time. Often, all business users want is the information that’s good enough to support the decision they need to make.