Like many of you, I’m a big believer that data is a valuable business asset. Most business leaders understand the value of data and are prepared to make decisions, adjust their direction, or consider new ideas if the data exists to support them. However, while most folks agree that data is valuable, few have really changed their company’s culture or behavior when it comes to treating data as an asset.
The reality is that most corporate data is not treated as an asset. In fact, most companies’ data management practices are rooted in methods and practices that are more than 30 years old. Treating data as a business asset is more than investing in storage and data transformation tools. Treating data like a valuable asset means managing, fixing, and maintaining content to ensure it’s ready and reliable to support business activities. If you disagree, let’s take a look at how companies treat other valuable business assets.
Consider a well-understood asset that exists within numerous companies: the automobile fleet. Companies invest in automobile fleets because their staff members require reliable transportation to fulfill their job responsibilities; the productivity of their team members depends on having this reliable business tool.
The company identifies and tracks the physical cars. They assign cars to individuals, and there’s a slew of rules and responsibilities associated with their use. Preventative maintenance and repairs are handled regularly to maintain each car’s value, reliability, and readiness for use. Depending on the size of the fleet, the company may have staff members (equipped with the necessary tools) to handle the ongoing maintenance. The cars are also inspected on a regular basis to ensure that any problems are identified and resolved (again, to maintain their useful life and reliability). There are also criteria for disposing of cars at their end-of-life (which is predetermined based on when the costs and liabilities exceed their value). These activities aren’t discretionary; they are necessary to protect the company’s investment in its valuable business assets.
Now, consider applying the same set of concepts to your company’s data assets.
- Is someone responsible for tracking the data assets? (Is there a list of data sources? Are they updated/maintained? Is the list published?)
- Are the responsibilities and rules for data usage identified and documented? (Does this occur for all data assets, or is it specific to individual platforms?)
- Is there a team that is responsible for monitoring and inspecting data for problems? (Are they equipped with the necessary tools to accomplish such a task?)
- Is there anyone responsible for maintaining and/or fixing inaccurate data?
- Are there details reflecting the end-of-life criteria for your data assets when the liability and costs of the data exceed their value?
If you answered no to any of these questions, it’s likely that your company views data as a tool or a commodity, but not as a valuable business asset.
So, what do you do?
I certainly wouldn’t grab this list and run around the office claiming that the company isn’t treating data as an asset. Nor would I suggest stating that your company likely spends more money maintaining its automobile fleet than its business data. (I once accused a company of spending more on landscaping than on data management. It wasn’t well received.)
Instead, raise the idea of data investment as a means to increase the value and usefulness of data within the company. Conduct an informal survey of a handful of business users and ask them how much time they lose looking for their data. Ask your ETL developers to estimate the time they spend fixing broken data instead of performing their core job responsibilities. You’ll find that the staff time lost because data isn’t managed and maintained as a business asset vastly exceeds the investment in preventative maintenance, tools, and repairs. You have to educate people about a problem before you can expect them to act to resolve it.
And if all else fails, find out how much your company spends on its automobile fleet (per user) and compare it to the non-existent resources spent maintaining and fixing your company’s other valuable business asset.
This blog is the final installment in a series focused on reviewing the individual Components of a Data Strategy. This edition discusses the component Govern and the details associated with supporting a Data Governance initiative as part of an overall Data Strategy.
The definition of Govern is:
“Establishing, communicating and monitoring information practices to ensure effective data sharing, usage, and protection”
As you’re likely aware, Data Governance is about establishing (and following) policies, rules, and all of the associated rigor necessary to ensure that data is usable and sharable, and that all of the associated business and legal details are respected. Data Governance exists because data sharing and usage are necessary for decision making. And Data Governance is necessary because data is often used for purposes other than the one for which it was collected.
I’ve identified five facets of Data Governance to consider when developing your Data Strategy. As a reminder (from the initial Data Strategy Component blog), each facet should be considered individually. And because your Data Strategy will focus on future aspirational goals as well as current needs, you’ll likely want to consider different options for each. Each facet can target a small organization’s issues or expand to focus on a large company’s diverse needs.
Information Policies
Information policies are high-level, information-oriented objectives that your company (or organization, or “governing body”) identifies. Information policies act as boundaries or guard rails that guide the detailed (and often tactical) rules defining required and acceptable data-oriented behavior. To offer context, some examples of information policies that I’ve seen include:
- “All customer data will be protected from unauthorized use”.
- “User data access should be limited to ‘systems of record’ (when available)”.
- “All data shipped into and out of the company must be processed by the IT Data Onboarding team”.
It’s very common for Data Governance initiatives to begin with focusing on formalizing and communicating a company’s information policies.
Business Data Rules
Rules are specific lower-level details that explain what a data user (or developer) is and isn’t allowed to do. Business data rules (also referred to as “business rules”) can be categorized into one of four types:
- Terms – the “things” that represent the business details we measure, track, and analyze (e.g. a customer, a purchase, a product).
- Facts – the details that describe the terms and how they relate within the business (e.g. the customer purchases a product; products are sold at a store location).
- Constraints – the details that govern the various items and actions within a company (e.g. the company can only sell a product that is in inventory).
- Derivations – the distillation or generation of new rules based on other rules (e.g. Rule: a product can be purchased or returned by a customer. Derivation: a product cannot be returned unless it was purchased from the company).
While the implementation of rules is often the domain of a data administration (or a logical data modeling) team, data governance is often responsible for establishing and managing the process for introducing, communicating, and updating rules.
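To make the four rule types concrete, here’s a minimal sketch in Python (the entity and field names are hypothetical, not drawn from any particular system): terms become entities, facts become relationships, and constraints and derivations become executable checks.

```python
from dataclasses import dataclass

@dataclass
class Product:              # Term: a "thing" the business tracks
    sku: str
    in_inventory: bool

@dataclass
class Purchase:             # Fact: a customer purchases a product
    customer_id: str
    sku: str

def can_sell(product: Product) -> bool:
    # Constraint: the company can only sell a product that is in inventory
    return product.in_inventory

def can_return(purchases: list[Purchase], customer_id: str, sku: str) -> bool:
    # Derivation: a product cannot be returned unless it was purchased
    return any(p.customer_id == customer_id and p.sku == sku for p in purchases)
```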
Data Acceptance
Quality is often defined as “conformance to requirements”. Data Acceptance is a similar concept: the details (or rules) and process applied against data to ensure it is suitable for its intended use. The premise of data acceptance is identifying the minimum details necessary to ensure that data can be used or processed to support the associated business activities. Some examples of data acceptance criteria include:
- All data values must be non-null.
- All fields within a record must reflect a value within a defined range of values for that field (or business term).
- The product’s price must be a numeric value that is non-zero and non-negative.
- All addresses must be valid mailable addresses.
In order to correct, standardize, or cleanse data, the acceptance criteria for a specific business value (or term) must first be identified.
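As an illustration, here’s a minimal sketch of how acceptance criteria like those above might be expressed; the record layout and the allowed color values are hypothetical assumptions.

```python
# Each function encodes one acceptance criterion; a record is
# accepted only if every criterion passes.

def non_null(record: dict) -> bool:
    # All data values must be non-null
    return all(value is not None for value in record.values())

def valid_color(record: dict) -> bool:
    # Field values must fall within the defined range of values
    return record.get("color") in {"RED", "GREEN", "BLUE"}

def valid_price(record: dict) -> bool:
    # The price must be numeric, non-zero, and non-negative
    price = record.get("price")
    return isinstance(price, (int, float)) and price > 0

ACCEPTANCE_CRITERIA = [non_null, valid_color, valid_price]

def accept(record: dict) -> bool:
    return all(check(record) for check in ACCEPTANCE_CRITERIA)

print(accept({"color": "RED", "price": 19.99}))  # True
print(accept({"color": "R", "price": 0}))        # False
```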
Governance Mechanisms
A Data Governance Mechanism is the method (or process) used to identify a new rule, process, or detail to support Data Governance. The components of a mechanism may include the process definition (or flow), the actors, and their decision rights.
This is an area where many Data Governance initiatives fail. While most Governance teams are very good at building new policies, rules, processes, and the associated rigor, they often forget to establish the mechanisms that allow all of those Governance details to be managed, maintained, and updated. This is critically important because as an organization evolves and matures with Data Governance, it may outgrow many of the initial rules and practices. Establishing a set of mechanisms to support modifying and updating existing rules and practices is important to supporting the growth and evolution of a Data Governance environment.
Measurement
The strength and success of Data Governance shouldn’t be measured by the quantity of rules or policies. The success of Data Governance is reflected by the adoption of the rules and processes that are established. Consequently, it’s important for the Data Governance team to continually measure and report adoption levels to ensure the Data Governance details are applied and followed. And where there are challenges in adoption, mechanisms should exist to allow stakeholders to adjust and update the various aspects of Data Governance to support the needs of the business and the users.
Data Governance will always be a polarizing concept. Whether introduced as part of a development methodology, included within a new data initiative, required to address a business compliance need, or positioned within a Data Strategy, Data Governance is always going to ruffle feathers.
Why? Because folks are busy and don’t want to be told that they need to have their work reviewed, modified, or approved. Data Governance is an approach (and arguably a method, practice, and process) to ensure that data usage and sharing align with policy, business rules, and the law. Data Governance is the “rules of the road” for data.
This blog is the 4th in a series focused on reviewing the individual Components of a Data Strategy. This edition discusses the component Assemble and the numerous details involved with sourcing, cleansing, standardizing, preparing, integrating, and moving the data to make it ready to use.
The definition of Assemble is:
“Cleansing, standardizing, combining, and moving data residing in multiple locations and producing a unified view”
In the Data Strategy context, Assemble includes all of the activities required to transform data from its host-oriented application context to one that is “ready to use” and understandable by other systems, applications, and users.
Most data used within our companies is generated by the applications that run the company (point-of-sale, inventory management, HR systems, accounting). While these applications generate lots of data, their focus is on executing specific business functions; they don’t exist to provide data to other systems. Consequently, the data that is generated is “raw” in form; the data reflects the specific aspects of the application (or system of origin). This often means that the data hasn’t been standardized, cleansed, or even checked for accuracy. Assemble is all of the work necessary to convert data from a “raw” state to one that is ready for business usage.
I’ve identified five facets to consider when developing your Data Strategy that are commonly employed to make data “ready to use”. As a reminder (from the initial Data Strategy Component blog), each facet should be considered individually. And because your Data Strategy will focus on future aspirational goals as well as current needs, you’ll likely want to consider different options for each. Each facet can target a small organization’s issues or expand to focus on a large company’s diverse needs.
Identification and Matching
Data integration is one of the most prevalent data activities occurring within a company; it’s a basic activity employed by developers and users alike. In order to integrate data from multiple sources, it’s necessary to determine the identification values (or keys) from each source (e.g. the employee id in an employee list, the part number in a parts list). The idea of matching is aligning data from different sources with the same identification values. While numeric values are easy to identify and match (using the “=” operator), character-based values can be more complex (due to spelling irregularities, synonyms, and mistakes).
Even though it’s highly tactical, identification and matching is important to consider within a Data Strategy to ensure that data integration is processed consistently. In fact, one of the main reasons that data variances continue to exist within companies (despite their investments in platforms, tools, and repositories) is that the need for standardized identification and matching has not been addressed.
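To illustrate the difference, here’s a minimal sketch: exact keys match with a simple comparison, while character-based values need a similarity score. The threshold and sample values are illustrative assumptions.

```python
from difflib import SequenceMatcher

def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    # Normalize case and whitespace, then score similarity (0.0 to 1.0)
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(is_match("Acme Corp.", "ACME Corp"))  # True (spelling irregularity)
print(is_match("Acme Corp.", "Apex Corp"))  # False
```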
Survivorship
Survivorship is a pretty basic concept: the selection of the values to retain (or survive) from the different sources that are merged. Survivorship rules are often unique to each data integration process and typically determined by the developer. In the context of a data strategy, it’s important to identify the “systems of reference” because identifying these systems gives developers and users clarity about which data elements to retain when integrating data from multiple systems.
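A minimal sketch of a survivorship rule, assuming a hypothetical source precedence in which the CRM acts as the system of reference for customer fields:

```python
SOURCE_PRECEDENCE = ["crm", "billing", "web"]  # highest priority first

def survive(records_by_source: dict) -> dict:
    merged: dict = {}
    # Walk sources from lowest to highest priority so that
    # higher-priority values overwrite lower-priority ones
    for source in reversed(SOURCE_PRECEDENCE):
        for field, value in records_by_source.get(source, {}).items():
            if value is not None:
                merged[field] = value
    return merged

print(survive({
    "crm":     {"name": "Ann Smith", "email": None},
    "billing": {"name": "A. Smith", "email": "ann@example.com"},
}))
# {'name': 'Ann Smith', 'email': 'ann@example.com'}
```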
Standardize / Cleanse
The premise of data standardization and cleansing is to identify inaccurate data and to correct and reformat that data to match the requirements (or defined standards) for a specific business element. This is likely the single most beneficial process for improving the business value (and usability) of data. The most common challenge in data standardization and cleansing is that it can be difficult to define the requirements. The other challenge is that most users aren’t aware that their company’s data isn’t standardized and cleansed as a matter of practice. Even though most companies own multiple tools to clean up addresses, standardize descriptive details, and check the accuracy of values, the use of these tools is not common.
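As a small illustration, here’s a sketch of standardizing one field, under the assumption that the defined standard for a U.S. phone number is “NNN-NNN-NNNN”; values that can’t be reformatted are flagged for repair rather than silently passed through.

```python
import re

def standardize_phone(raw: str):
    digits = re.sub(r"\D", "", raw)   # strip punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]           # drop the country code
    if len(digits) != 10:
        return None                   # fails the standard: flag for repair
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:]}"

print(standardize_phone("(312) 555-0143"))   # 312-555-0143
print(standardize_phone("+1 312.555.0143"))  # 312-555-0143
print(standardize_phone("555-0143"))         # None -> route to repair
```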
Reference Data
Wikipedia defines reference data as data that is used to classify or categorize other data. In the context of a data strategy, reference data is important because it ensures the consistency of data usage and meaning across different systems and business areas. Successful reference data means that details are consistently identified, represented, and formatted the same way across all aspects of the company (if the color of a widget is “RED”, then the value is represented as “RED” everywhere – not “R” in the product information system, 0xFF0000 in the inventory system, and 0xED2939 in the product catalog). A Reference Data initiative is often aligned with a company’s data strategy initiative because of its impact on data sharing and reuse.
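Using the color example above, a minimal sketch of reference data in code: each system’s local representation maps to one canonical value. The mapping table is illustrative.

```python
# Local codes (normalized to uppercase) mapped to the canonical value
COLOR_REFERENCE = {
    "R": "RED",         # product information system code
    "0XFF0000": "RED",  # inventory system RGB value
    "0XED2939": "RED",  # product catalog RGB value
    "RED": "RED",       # already canonical
}

def to_reference(local_value: str) -> str:
    key = local_value.strip().upper()
    if key not in COLOR_REFERENCE:
        raise ValueError(f"unmapped color code: {local_value!r}")
    return COLOR_REFERENCE[key]

print(to_reference("r"))         # RED
print(to_reference("0xff0000"))  # RED
```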
Movement
The idea of movement is to record the different systems that a data element touches as it travels (and is processed) after it is created. Movement tracking (or data lineage) is quite important when the validity and accuracy of a particular data value is questioned. And in the current era of heightened consumer data privacy and protection, tracking the lineage of consumer data within a company is becoming a requirement (and it’s the law in California and the European Union).
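A minimal sketch of movement tracking, with hypothetical system names: each hop a record makes is appended to a lineage trail that can be replayed when a value is questioned.

```python
from datetime import datetime, timezone

def record_hop(lineage: list, system: str, action: str) -> None:
    # Append one hop: which system touched the data, what it did, and when
    lineage.append({
        "system": system,
        "action": action,
        "at": datetime.now(timezone.utc).isoformat(),
    })

lineage: list = []
record_hop(lineage, "point-of-sale", "created")
record_hop(lineage, "etl-nightly", "standardized")
record_hop(lineage, "warehouse", "loaded")

for hop in lineage:
    print(hop["at"], hop["system"], hop["action"])
```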
The dramatic increase in the quantity and diversity of data sources within most companies over the past few years has challenged even the most technologically advanced organizations. It’s not uncommon to find that one of the most visible areas of user frustration is accessing new (or additional) data sources. Much of this frustration occurs because of the challenge of sourcing, integrating, cleansing, and standardizing new data content so it can be shared with users. As is the case with all of the other components, the details are easy to understand but complex to implement. A company’s data strategy has to evolve and change when data sharing becomes a production business requirement and users want data that is “ready to use”.
This blog is the 2nd in a series focused on reviewing the individual Components of a Data Strategy. This edition discusses the concept of data provisioning and the various details of making data sharable.
The definition of Provision is:
“Supplying data in a sharable form while respecting all rules and access guidelines”
One of the biggest frustrations that I have in the world of data is that few organizations have established data sharing as a responsibility. Even fewer have set up their data to be ready to share and use by others. It’s not uncommon for a database programmer or report developer to have to retrieve data from a dozen different systems to obtain the data they need. And the data arrives in different formats and files that change regularly. This lack of consistency generates large ongoing maintenance costs and requires an inordinate amount of developer time to re-transform, prepare, and fix data before it can be used (numerous studies have found that ongoing source data maintenance can consume as much as 50% of a database developer’s time after the initial programming effort is completed).
Should a user have to know the details (or idiosyncrasies) of the application system that created the data in order to use the data? (That’s like expecting someone to understand the farming of tomatoes and the manufacturing process of ketchup in order to put ketchup on their hamburger.) The idea of Provision is to establish the necessary rigor to simplify the sharing of data.
I’ve identified 5 of the most common facets of data sharing in the illustration above – there are others. As a reminder (from last week’s blog), each facet should be considered individually. And because your Data Strategy will focus on future aspirational goals as well as current needs, you’ll likely want to review the different options for each facet. Each facet can target a small organization’s issues or expand to address a diverse enterprise’s needs.
Packaging is the most obvious aspect of provisioning: structuring and formatting the data in a manner that is clear and understandable to the data consumer. All too often, data is packaged at the convenience of the developer instead of the convenience of the user. So, instead of sharing data as a backup file generated by an application utility in a proprietary (or binary) format, the data should be formatted so every field is labeled and rendered in an open format (text, XML) that a non-technical user can access with readily available tools. The data should also be accompanied by metadata to simplify access.
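As an illustration, here’s a minimal sketch contrasting a developer-convenient, unlabeled row with a packaged payload in which every field is labeled and the data travels with its metadata; the field names and source are hypothetical.

```python
import json

FIELDS = ["customer_id", "customer_name", "order_total_usd"]

raw_row = ("C1042", "Ann Smith", 129.95)  # positional dump: meaningless alone

packaged = {
    "metadata": {
        "source": "orders-system",   # where the data came from
        "extracted": "2020-01-15",   # when it was produced
        "fields": FIELDS,            # what each field means
    },
    "records": [dict(zip(FIELDS, raw_row))],
}

print(json.dumps(packaged, indent=2))
```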
This facet works with Packaging and addresses the details associated with the data container. Data can be shared via a file, a database table, an API, or one of several other methods. While sharing data in a programmer-generated file is better than nothing, a more effective approach is to deliver data in a well-known file format (such as Excel) or within a table contained in an easily accessible database (e.g. a data lake or data warehouse).
Source data stewardship is critical to the sharing of data. In this context, a Source Data Steward is someone who is responsible for supporting and maintaining the shared data content (there are several different types of data stewards). In some companies, there’s a data steward responsible for the data originating from an individual source system. Some companies (focused on sharing enterprise-level content) have positioned data stewards to support individual subject areas. Regardless of the model used, the data steward tracks and communicates source data changes, monitors and maintains the shared content, and addresses support needs. This particular role is vital if your organization is undertaking any sort of data self-service initiative.
Acceptance checking addresses the issues that are common in the world of electronic data sharing: inconsistency, change, and error. It is a quality control process that reviews the data prior to distribution to confirm that it matches a set of criteria, so that all downstream users receive content as they expect. This is likely the easiest of all the details to implement, given the power of existing data quality and data profiling tools. Unfortunately, it rarely receives attention because of most organizations’ limited experience with data quality technology.
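A minimal sketch of acceptance checking before distribution: profile the outbound data and block the release if it drifts outside agreed criteria. The thresholds here are illustrative assumptions, not standard values.

```python
def profile(rows: list, column: str) -> dict:
    # Simple column profile: row count, null rate, distinct values
    values = [row.get(column) for row in rows]
    nulls = sum(1 for v in values if v is None)
    return {
        "rows": len(values),
        "null_rate": nulls / len(values),
        "distinct": len({v for v in values if v is not None}),
    }

def ok_to_distribute(rows: list) -> bool:
    # Criteria agreed with downstream users: enough rows, few nulls
    stats = profile(rows, "price")
    return stats["rows"] >= 2 and stats["null_rate"] <= 0.05

rows = [{"price": 9.99}, {"price": 4.50}, {"price": None}]
print(profile(rows, "price"))  # null_rate is 0.33
print(ok_to_distribute(rows))  # False: too many nulls
```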
In order to succeed in any sort of data sharing initiative, whether supporting other developers or an enterprise data self-service initiative, it’s important to identify the audience that will be supported. This is often the facet to consider first, and it’s valuable to align the audience with the timeframe of data sharing support. It’s fairly common to focus on delivering data sharing for developer support first, followed by technical users, and then the larger audience of business users.
In the era of “data is a business asset”, data sharing isn’t a courtesy, it’s an obligation. Data sharing shouldn’t occur at the convenience of the data producer; data should be packaged and made available for the ease of the user.
Because the idea of building a data strategy is a fairly new concept in the world of business and information technology (IT), there’s a fair amount of discussion about the pieces and parts that comprise a Data Strategy. Most IT organizations have invested heavily in developing plans to address platforms, tools, and even storage. Those IT plans are critical in managing systems and in capturing and retaining content generated by a company’s production applications. Unfortunately, those details don’t typically address all of the data activities that occur after an application has created and processed data from the initial business process. The reason folks take on the task of developing a Data Strategy is the challenge of finding, identifying, sharing, and using data. In any company, there are numerous roles and activities involved in delivering data to support business processing and analysis. A successful Data Strategy must support the breadth of activities necessary to ensure that data is “ready to use”.
There are five core components in a data strategy that work together as building blocks to address the various details necessary to comprehensively support the management and usage of data.
Identify
“The ability to identify data and understand its meaning regardless of its structure, origin, or location”
This concept is pretty obvious, but it’s likely one of the biggest obstacles to data usage and sharing. All too often, companies have multiple, differing terms for specific business details (customer: account, client, patron; income: earnings, margin, profit). In order to analyze, report, or use data, people need to understand what it’s called and how to identify it. Another aspect of Identify is establishing the representation of the data’s value (are the company’s geographic locations represented by name, number, or an abbreviation?). A successful Data Strategy would identify the gaps and needs in this area along with the activities and artifacts required to standardize data identification and representation.
Provision
“Enabling data to be packaged and made available while respecting all rules and access guidelines”
Data is often shared or made available to others at the convenience of the source system’s developers. The data is often accessible via database queries or as a series of files. There’s rarely any uniformity across systems or subject areas, and usage requires programming-level skills to analyze and inventory the contents of the various tables or files. Unfortunately, the typical business person requiring data is unlikely to possess sophisticated programming and data manipulation skills. They don’t want raw data (that reflects source system formats and inaccuracies); they want data that is uniformly formatted, documented, and ready to be added to their analysis activities.
The idea of Provision is to package and provide data that is “ready to use”. A successful Data Strategy would identify the various data sharing needs and identify the necessary methods, practices, and tooling required to standardize data packaging and sharing.
Store
“Persisting data in a structure and location that supports access and processing across the enterprise”
Most IT organizations have solid plans for addressing this area of a Data Strategy. It’s fairly common for most companies to have a well-defined set of methods to determine the platform where online data is stored and processed, how data is archived for disaster recovery, and all of the other details such as protection, retention, and monitoring.
As the technology world has evolved, other facets of this area require attention. The considerations include managing data distributed across multiple locations (the cloud, on-premises systems, and even multiple desktops), privacy and protection, and managing the proliferation of copies. With the emergence of new consumer privacy laws, it’s risky to store multiple copies of data, and it’s become necessary to track all existing copies of content. A successful Data Strategy ensures that any created data is always available for future access without requiring everyone to create their own copy.
Assemble
“Standardizing, combining, and moving data residing in multiple locations and providing a unified view”
It’s no secret that data integration is one of the more costly activities occurring within an IT organization; nearly 40% of the cost of new development is consumed by data integration activities. And Assemble isn’t limited to integration; it also includes correcting, standardizing, and formatting the content to make it “ready to use”.
With the growth of analytics and desktop decision making, the need to continually analyze and include new data sets in the decision-making process has exploded. Processing (or preparing, or wrangling) data is no longer confined to the domain of the IT organization; it has become an end-user activity. A successful Data Strategy has to ensure that all users can be self-sufficient in their ability to process data.
Govern
“Establishing and communicating information rules, policies, and mechanisms to ensure effective data usage”
While most organizations are quick to identify their data as a core business asset, few have put the necessary rigor in place to effectively manage data. Data Governance is about establishing rules, policies, and decision mechanisms to allow individuals to share and use data in a manner that respects the various (legal and usage) guidelines associated with that data. The inevitable challenge with Data Governance is adoption by the entire data supply chain – from application developers to report developers to end users. Data Governance isn’t a user-oriented concept, it’s a data-oriented concept. A successful Data Strategy identifies the rigor necessary to ensure a core business asset is managed and used correctly.
The 5 Components of a Data Strategy is a framework to ensure that all of a company’s data usage details are captured and organized and that nothing is unknowingly overlooked. A successful Data Strategy isn’t about identifying every potential activity across the 5 components. It’s about making sure that the problems in accessing, sharing, and using data are identified and addressed in a thorough manner.
A simple definition of Data Strategy is:
“A plan designed to improve all of the ways you acquire, store, manage, share, and use data”
Over the years, most companies have spent a fortune on their data. They have a bunch of folks that comprise their “center of expertise”, they’ve invested lots of money in various data management tools (ETL – extract/transform/load – metadata, data catalogs, data quality, etc.), and they’ve spent bazillions on storage and server systems to retain their terabytes or petabytes of data. And what you often find is a lot of disparate (or independent) projects building specific deliverables for individual groups of users. What you rarely find is a plan that addresses all of the disparate user needs that support their ongoing access, sharing, and use of data.
While most companies have solid platform strategies, storage strategies, tool strategies, and even development strategies, few companies have a data strategy. The company has technology standards to ensure that every project uses a specific brand of server, a specific set of application development tools, a well-defined development method, and specific deliverables (requirements, code, test plan, etc.). You rarely find data standards: naming conventions and value standards, data hygiene and correction, source documentation and attribute definitions, or even data sharing and packaging conventions. The benefit of a Data Strategy is that data development becomes reusable, repeatable, more reliable, and faster. Without a data strategy, the data activities within every project are invented from scratch. Developers continually search and analyze data sources, create new transformation and cleansing code, and retest the same data, again and again and again.
The value of a Data Strategy is that it provides a roadmap of tasks and activities to make data easier to access, share, and use. A Data Strategy identifies the problems and challenges across multiple projects, multiple teams, and multiple business functions. A Data Strategy identifies the different data needs across different projects, teams, and business functions. A Data Strategy identifies the various activities and tasks that will deliver artifacts and methods that will benefit multiple projects, teams and business functions. A Data Strategy delivers a plan and roadmap of deliverables that ensures that data across different projects, multiple teams, and business functions are reusable, repeatable, more reliable, and delivered faster.
A Data Strategy is a common thread across both disparate and related company projects to ensure that data is managed like a business asset, not an application byproduct. It ensures that data is usable and reusable across a company. A Data Strategy is a plan and road map for ensuring that data is simple to acquire, store, manage, share, and use.
In order to prepare for the cooking gauntlet that often occurs with the end-of-year holiday season, I decided to purchase a new rotisserie oven. The folks at Acme Rotisserie include a large amount of documentation with their rotisserie. I reviewed the entire pile and was a bit surprised by the warranty registration card. The initial few questions made sense: serial number, place of purchase, date of purchase, my home address. The other questions struck me as a bit too inquisitive: number of household occupants, household income, own/rent my residence, marital status, and education level. Obviously, this card was a Trojan horse of sorts: provide registration details – and all kinds of other personal information. They wanted me to give away my personal information so they could analyze it, sell it, and make money off of it.
Companies collecting and analyzing consumer data isn’t anything new – it’s been going on for decades. In fact, there are laws in place to protect consumers’ data in quite a few industries (healthcare, telecommunications, and financial services). Most of the laws focus on protecting the information that companies collect based on their relationship with you. It’s not just the details that you provide to them directly; it’s the information that they gather about how you behave and what you purchase. Most folks believe behavioral information is more valuable than the personal descriptive information you provide. The reason is simple: you can offer creative (and highly inaccurate) details about your income, your education level, and the car you drive. You can’t really lie about your behavior.
I’m a big fan of sharing my information if it can save me time, save me money, or generate some sort of benefit. I’m willing to share my waist size, shirt size, and color preferences with my personal shopper because I know they’ll contact me when suits or other clothing that I like is available at a good price. I’m fine with a grocer tracking my purchases because they’ll offer me personalized coupons for those products. I’m not okay with the grocer selling that information to my health insurer. Providing my information to a company to enhance our relationship is fine; providing my information to a company so they can share, sell, or otherwise unilaterally benefit from it is not fine. My data is proprietary and my intellectual property.
Clearly companies view consumer data to be a highly valuable asset. Unfortunately, we’ve created a situation where there’s little or no cost to retain, use, or abuse that information. As abuse and problems have occurred within certain industries (financial services, healthcare, and others), we’ve created legislation to force companies to responsibly invest in the management and protection of that information. They have to contact you to let you know they have your information and allow you to update communications and marketing options. It’s too bad that every company with your personal information isn’t required to behave in the same way. If data is so valuable that a company retains it, requiring some level of maintenance (and responsibility) shouldn’t be a big deal.
It’s really too bad that companies with copies of my personal information aren’t required to contact me to update and confirm the accuracy of all of my personal details. That would ensure that all of the specialized big data analytics that are being used to improve my purchase experiences were accurate. If I knew who had my data, I could make sure that my preferences were up to date and that the data was actually accurate.
It’s unfortunate that Acme Rotisserie isn’t required to contact me to confirm that I have 14 children, an advanced degree in swimming pool construction, and a red Ferrari in my garage. It will certainly be interesting to see the personalized offers I receive for the upcoming Christmas shopping season.
One of the challenges in delivering successful data-centric projects (e.g. analytics, BI, or reporting) is realizing that the definition of project success differs from traditional IT application projects. Success for a traditional application (or operational) project is often described in terms of transaction volumes, functional capabilities, processing conformance, and response time; data project success is often described in terms of business process analysis, decision enablement, or business situation measurement. To a business user, the success of a data-centric project is simple: data usability.
It seems that most folks respond to data usability issues by gravitating towards a discussion about data accuracy or data quality; I actually think the more appropriate discussion is data knowledge. I don’t think anyone would argue that to make data-enabled decisions, you need to have knowledge about the underlying data. The challenge is understanding what level of knowledge is necessary. If you ask a BI or Data Warehouse person, their answer almost always includes metadata, data lineage, and a data dictionary. If you ask a data mining person, they often just want specific attributes and their descriptions — they don’t care about anything else. All of these folks have different views of data usability and varying levels (and needs) for data knowledge.
One way to improve data usability is to target and differentiate the user audience based on their data knowledge needs. There are certainly lots of different approaches to categorizing users; in fact, every analyst firm and vendor has their own model to describe different audience segments. One of the problems with these types of models is that they tend to focus heavily on the tools or analytical methods (canned reports, drill down, etc.) and ignore the details of data content and complexity. The knowledge required to manipulate a single subject area (revenue or customer or usage) is significantly less than the skills required to manipulate data across 3 subject areas (revenue, customer, and usage). And what exacerbates data knowledge growth is the inevitable plethora of value gaps, inaccuracies, and inconsistencies associated with the data. Data knowledge isn’t just limited to understanding the data; it includes understanding how to work around all of the imperfections.
Here’s a model that categorizes and describes business users based on their views of data usability and their data knowledge needs:
Level 1: “Can you explain these numbers to me?”
This person is the casual data user. They have access to a zillion reports that have been identified by their predecessors and they focus their effort on acting on the numbers they get. They’re not a data analyst – their focus is to understand the meaning of the details so they can do their job. They assume that the data has been checked, rechecked, and vetted by lots of folks in advance of their receiving the content. They believe the numbers and they act on what they see.
Level 2: “Give me the details”
This person has been using canned reports, understands all the basic details, and has graduated to using data to answer new questions that weren’t identified by their predecessors. They need detailed data and they want to reorganize the details to suit their specific needs (“I don’t want weekly revenue breakdowns – I want to compare weekday revenue to weekend revenue”). They realize the data is imperfect (and in most instances, they’ll live with it). They want the detail.
Level 3: “I don’t believe the data — please fix it”
These folks know their area of the business inside/out and they know the data. They scour and review the details to diagnose the business problems they’re analyzing. And when they find a data mistake or inaccuracy, they aren’t shy about raising their hand. Whether they’re a data analyst that uses SQL or a statistician with their favorite advanced analytics algorithms, they focus on identifying business anomalies. These folks are the power users that are incredibly valuable and often the most difficult for IT to please.
Level 4: “Give me more data”
This is subject area graduation. At this point, the user has become self-sufficient with their data and needs more content to address a new or more complex set of business analysis needs. Asking for more data – whether a new source or more detail – indicates that the person has exhausted their options in using the data they have available. When someone has the capacity to learn a new subject area or take on more detailed content, they’re illustrating a higher level of data knowledge.
One thing to consider about the above model is that a user will have varying data knowledge based on the individual subject area. A marketing person may be completely self-sufficient on revenue data but be a newbie with usage details. A customer support person may be an expert on customer data but only have limited knowledge of product data. You wouldn’t expect many folks (outside of IT) to be experts on all of the existing data subject areas. Their knowledge is going to reflect the breadth of their job responsibilities.
As someone grows and evolves in business expertise and influence, it’s only natural that their business information needs would grow and evolve too. In order to address data usability (and project success), maybe it makes sense to reconsider the various user audience categories and how they are defined. Growing data knowledge isn’t about making everyone data gurus; it’s about enabling staff members to become self-sufficient in their use of corporate data to do their jobs.
Photo “Ladder of Knowledge” courtesy of degreezero2000 via Flickr (Creative Commons license).
Companies spend a small fortune continually investing and reinvesting in making their business analysts self-sufficient with the latest and greatest analytical tools. Most companies have multiple project teams focused on delivering tools to simplify and improve business decision making. There are likely several standard tools deployed to support the various data analysis functions required across the enterprise: canned/batch reports, desktop ad hoc data analysis, and advanced analytics. There’s never a shortage of new and improved tools that guarantee simplified data exploration, quick response time, and greater data visualization options. Projects inevitably include the creation of dozens of prebuilt screens along with a training workshop to ensure that the users understand all of the new whiz-bang features associated with the latest analytic tool incarnation. Unfortunately, the biggest challenge within any project isn’t getting users to master the various analytical functions; it’s ensuring the users understand the underlying data they’re analyzing.
The most prevalent issue with the adoption of a new business analysis tool is the users’ knowledge of the underlying data. This issue becomes visible through a number of common problems: the misuse of report data, the misunderstanding of business terminology, and/or the exaggeration of inaccurate data. Once the credibility or usability of the data comes under scrutiny, the project typically goes into “red alert” and requires immediate attention. If ignored, the business tool quickly becomes shelfware because no one is willing to take a chance on making business decisions based on risky information.
All too often, the focus of end-user training is tool training, not data training. What typically happens is that an analyst is introduced to the company’s standard analytics tool through a “drink from a fire hose” training workshop. All of the examples use generic sales or HR data to illustrate the tool’s strengths in folding, spindling, and manipulating the data. And this is where the problem begins: the vendor’s workshop data is perfect. There’s no missing or inaccurate data, and all of the data is clearly labeled and defined; classes run smoothly, but it just isn’t reality. Somehow the person with no hands-on data experience is supposed to figure out how to use their own (imperfect) data. It’s like giving someone their first ski lesson on a cleanly groomed beginner hill and then taking them up to the top of a black diamond (advanced) run with steep hills and moguls. The person works hard but isn’t equipped to deal with the challenges of the real world. So, they give up on the tool and tell others that the solution isn’t usable.
All of the advanced tools and manipulation capabilities don’t do any good if the users don’t understand the data. There are lots of approaches to educating users about data. Some prefer a bottom-up approach (reviewing individual table and column names, meanings, and values) while others prefer a top-down approach (reviewing subject area details, the associated reports, and then getting into the data details). There are certainly benefits to one approach over the other (depending on your audience); however, it’s important not to lose sight of the ultimate goal: giving users the fundamental data knowledge they need to make decisions. The fundamentals that most users need in order to understand their data include a review of:
- the business subject area associated with their data
- business terms, definitions, and their associated data attributes
- data values and their representations
- business rules and calculations associated with the individual values
- the data’s origin (a summary of the business processes and source system)
The above details may seem a bit overwhelming if you consider that most companies have mature reporting environments and multi-terabyte data warehouses. However, we’re not talking about training someone to be an expert on 1000 data attributes contained within your data warehouse; we’re talking about ensuring someone’s ability to use an initial set of reports or a new tool without requiring 1-on-1 training. It’s important to realize that the folks with the greatest need for support and data knowledge are the newbies, not the experienced folks.
There are lots of options for imparting data knowledge to business users: a hands-on data workshop, a set of screen videos showing data usage examples, or a simple set of web pages containing definitions, textual descriptions, and screen shots. Don’t get wrapped up in the complexities of creating the perfect solution – keep it simple. I worked with a client that deployed their information using a set of pages constructed in PowerPoint that folks could reference on the company’s intranet. If your users have nothing, don’t worry about the perfect solution – give them something to start with that’s easy to use.
Remember that the goal is to build users’ data knowledge that is sufficient to get them to adopt and use the company’s analysis tools. We’re not attempting to convert everyone into data scientists; we just want them to use the tools without requiring 1-on-1 training to explain every report or data element.
Photo courtesy of NASA. NASA Ames Research Center engineer H. Julian “Harvey” Allen illustrating data knowledge (relating to capsule design for the Mercury program).