Perfect Data and Other Data Quality Myths
A recent client experience reminds me of what I’ve always said about data quality: it isn’t the same as data perfection. After all, how could it be? A lot of people treat correcting data as an after-the-fact activity driven by opinion and anecdotal problems. It should be an entrenched, ongoing process.
One drop of motor oil can pollute 25 quarts of drinking water. But data doesn’t work that way. It’s more like wheat flour, where an average of fewer than 75 insect fragments per 50 grams is considered acceptable. (Jill says this is “apocryphal,” but you get my point.)
People forget that the definition of data quality is data that’s fit for purpose: data that conforms to requirements. You only have to look back at the work of Philip Crosby and W. Edwards Deming to understand that quality is conformance to requirements. We need to understand the variance between the data as it exists and what’s acceptable, not its distance from perfection.
Data quality gets attention when bad data gets in the way of getting the job done. If I want to send an e-mail to 10,000 customers and one customer’s zip code is unknown, it doesn’t prevent me from contacting the other 9,999 customers. By any CMO’s estimation, that can still be a very successful marketing campaign. The question should be: What data helps us get the job done?
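To make that concrete, here’s a minimal sketch of measuring fitness for purpose rather than perfection. The field names and the acceptability threshold are assumptions for illustration, not a standard.

```python
# A minimal sketch of "fit for purpose": measure records against the
# requirement of the task at hand, not against perfection.

customers = [
    {"name": "Ann",  "email": "ann@example.com",  "zip": "30301"},
    {"name": "Bob",  "email": "bob@example.com",  "zip": None},     # unknown zip
    {"name": "Cruz", "email": None,               "zip": "10001"},  # unusable for e-mail
]

def fit_for_email_campaign(record):
    # The campaign only requires a deliverable e-mail address;
    # a missing zip code does not disqualify the record.
    return bool(record.get("email"))

usable = [c for c in customers if fit_for_email_campaign(c)]
fitness = len(usable) / len(customers)

ACCEPTABLE_FITNESS = 0.95  # hypothetical acceptability threshold
print(f"{fitness:.0%} of records are fit for this purpose "
      f"({'acceptable' if fitness >= ACCEPTABLE_FITNESS else 'needs repair'})")
```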
Our client is a regional bank that has retained Baseline to work with its call center staff. Customer service reps (CSRs) were frustrated that they got multiple records for the same customer. They had to jump through hoops to find the right data, often while the customer waited on the phone or on-line. The problem wasn’t that the data was “bad”; it was that the CSRs could only use the customer’s phone number to look up the record. If the phone number was incorrect, the CSR couldn’t do her job, and as a result her compensation suffered. So data quality is very important to her, and to the bank at large.
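The remedy isn’t perfect phone numbers; it’s giving the CSR more than one way in. Here is a rough sketch of that idea; the identifiers, the search order, and the record shape are assumptions for illustration, not the bank’s actual design.

```python
# A minimal sketch of looking up a customer by any identifier on hand,
# so one stale phone number doesn't end the call.

customer_index = {
    ("phone", "555-0142"):         {"id": 101, "name": "Ann Alvarez"},
    ("email", "ann@example.com"):  {"id": 101, "name": "Ann Alvarez"},
    ("account", "A-98-3321"):      {"id": 101, "name": "Ann Alvarez"},
}

def find_customer(phone=None, email=None, account=None):
    # Try each identifier the caller supplied, in order of reliability.
    for key_type, value in (("account", account), ("email", email), ("phone", phone)):
        if value and (key_type, value) in customer_index:
            return customer_index[(key_type, value)]
    return None

# A bad phone number no longer blocks the lookup; the e-mail address still resolves.
print(find_customer(phone="555-9999", email="ann@example.com"))
```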
Users are all too accustomed to complaining about data. The goal of data quality should be continuous improvement, ensuring a process is available to fix data when it’s broken. If you want to address data quality, focus energy on the repair process. As long as your business is changing—and I hope it is—its data will continue to change. Data requirements, measurements, and the reference points for acceptability will keep changing too. If you’re involved in a data quality program, think of it as job security.
MDM: Subject-Area Data Integration
I frequently describe MDM as subject-area data integration. The whole point of mastering and managing data is to simplify data sharing, since confusion only occurs when there are two or more instances of data and they don’t match. Mastering data isn’t really necessary if you only have a single system containing one copy of the data. After all, how much confusion or misunderstanding can occur when there’s only one copy? The challenge in making data more usable and easier to understand exists because most companies have multiple application systems, each with its own copy of data (and its own “version of the truth”). MDM’s promise is to deliver a single view of subject-area data. In our book, Customer Data Integration: Reaching a Single Version of the Truth (John Wiley & Sons, 2006), Jill Dyché and I defined MDM as:
“The set of disciplines and methods to ensure the currency, meaning, and quality of a company’s reference data that is shared across various systems and organizations.”
As companies have grown, so too has the number of systems that require access to each other’s data. This is why data integration has become one of the largest custom development activities undertaken within an IT organization. It’s rare that all systems (and their developers) integrate data the same way. While there may be rigor within an individual application or system, it’s highly unlikely that all systems manipulate an individual subject area in a consistent fashion. This lack of integrity and consistency becomes visible when information on two different systems conflicts. MDM isn’t a silver bullet for this problem; it is a method for addressing data problems one subject area at a time.
The reason for establishing a boundary around a subject area is that the complexity, rules, and usage of data within most organizations tend to differ by subject area. Examples of subject areas include customer, product, and supplier. There can be literally dozens, if not hundreds, of subject areas within any given company.
Figure 1: Different Data Subject Areas
Do you need to master every subject area? Probably not. MDM projects focus on the subject areas that suffer the most from inaccuracies, mistakes, and misunderstandings: for instance, customers with inaccurate identification numbers, products missing descriptive information, or an employee with an inaccurate start date. The idea behind master data management is to establish rules, guidelines, and rigor for subject area data.
The rules associated with identifying a customer are typically well defined within a company. The rules associated with adding a new product to the sales catalog are also well defined. The thing to keep in mind is that the rules associated with product have nothing to do with those for customers. Additionally, most companies have rules that limit what customer data can be modified, and rules that restrict how product information can be manipulated. The idea behind MDM is to manage these rules and methods so that all application systems manipulate reference data in a consistent way.
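One hedged sketch of what that can look like in practice is a shared registry of per-subject-area rules that every application routes its changes through. The specific rules below are invented for illustration.

```python
# A minimal sketch: each subject area carries its own identification rule
# and its own list of fields that may never be modified.

import re

SUBJECT_AREA_RULES = {
    "customer": {
        "identify": lambda r: bool(r.get("customer_id")),
        "immutable_fields": {"customer_id", "date_of_first_purchase"},
    },
    "product": {
        "identify": lambda r: bool(re.fullmatch(r"SKU-\d{6}", r.get("sku", ""))),
        "immutable_fields": {"sku"},
    },
}

def apply_update(subject_area, record, changes):
    rules = SUBJECT_AREA_RULES[subject_area]
    if not rules["identify"](record):
        raise ValueError(f"{subject_area} record is not properly identified")
    blocked = set(changes) & rules["immutable_fields"]
    if blocked:
        raise ValueError(f"fields may not be modified: {sorted(blocked)}")
    record.update(changes)
    return record

# Every application routes its changes through the same rules.
print(apply_update("customer", {"customer_id": "C-001", "name": "Ann"}, {"name": "Ann A."}))
```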
Implementing MDM isn’t just about building and deploying a server that contains the “master list” of reference data; that’s the easy part. MDM’s real challenge is integrating the functionality into the multitude of application systems that exist within a company. The idea is that when a new customer is added, all systems are aware of the change and have equal access to that data.
For instance, one of the most universal challenges in business today is managing a customer’s marketing preferences. When a customer asks to opt out of all marketing communications, it’s important that all systems are aware of this choice. Problems typically occur when a particular data element can be modified from multiple locations (e.g., a web page, an 800 number, or even the US Postal Service). MDM provides the solution for ensuring that the master data is managed correctly and that all systems become aware of the change (and the new data) in a manner that supports the business’s needs.
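One common pattern for this, sketched loosely below, is to apply the change once in an MDM hub and notify every subscribed system. The system names and the callback shape are assumptions for illustration, not a prescribed architecture.

```python
# A minimal sketch: the hub records the opt-out once, then every
# subscribed application learns about the new master data.

from typing import Callable

class MdmHub:
    def __init__(self):
        self.master = {}  # customer_id -> master record
        self.subscribers: list[Callable[[str, dict], None]] = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def set_marketing_preference(self, customer_id, opted_out, source):
        # Whether the request came from the web page, the 800 number, or a
        # mail-in form, the change is applied in exactly one place...
        record = self.master.setdefault(customer_id, {})
        record["opted_out"] = opted_out
        record["preference_source"] = source
        # ...and every downstream system is told about it.
        for notify in self.subscribers:
            notify(customer_id, dict(record))

hub = MdmHub()
hub.subscribe(lambda cid, rec: print(f"campaign system: {cid} -> {rec}"))
hub.subscribe(lambda cid, rec: print(f"call center:     {cid} -> {rec}"))
hub.set_marketing_preference("C-001", opted_out=True, source="web")
```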