BI Reports, Data Quality, and the Dreaded Design Review
One of many discussions I heard over Thanksgiving turkey was, “How could the government have let the financial crisis happen?” To which the most frequent response was that regulators were asleep at the wheel. True or not, one could legitimately ask why we have problems with our business intelligence reports. The data is bad and the report is meaningless—who’s asleep at the wheel?
Everyone’s talking about the single version of the truth, but how often are our reports reviewed for accuracy? Several of our financial services clients demand that their BI reports be audited back to the source systems and that the numbers be reconciled.
Unfortunately, this isn’t common practice across industries. When we work with new clients we ask about data reconciliation, but most of our new clients don’t have the methods or processes in place. It makes me wonder how engaged business users are in establishing audit and reconciliation rules for their BI capabilities.
No, data perfection isn’t practical. But we should be able to guard against lost data and protect our users from formulas and equations that change. All too often these issues are thrown into the “post development” bucket or relegated to User Acceptance. By then reports aren’t always corrected and data isn’t always fixed.
A robust development process should ensure that data accuracy is established and measured throughout development. This means that design reviews are necessary before, during, and after development. Design reviews ensure that the data is continually being processed accurately. Many believe that it’s ten or more times more expensive to fix broken code (or data) after development than it is during development. And, as we’ve all seen, often the data doesn’t get fixed at all.
When you’re building a report or delivering data, ask two questions: 1) whether the numbers reflect business expectations, and 2) whether they reconcile back to their system of origin. Design review processes should be instituted (or, in many cases, re-instituted) to ensure functional accuracy long before the user ever sees the data on her desktop.
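To make the second question concrete, here is a minimal sketch of a reconciliation check. It assumes the source system can hand you (key, amount) rows and that the report's totals are available as a dictionary; the function name `reconcile`, the one-cent tolerance, and the region keys in the usage example are all hypothetical, not part of any particular BI tool.

```python
from decimal import Decimal


def reconcile(source_rows, report_totals, tolerance=Decimal("0.01")):
    """Compare report totals against sums recomputed from source rows.

    source_rows: iterable of (key, amount) pairs from the system of origin.
    report_totals: dict mapping key -> total shown on the report.
    Returns a list of (key, source_total, report_total) for every mismatch.
    """
    # Recompute each total directly from the source data.
    source_totals = {}
    for key, amount in source_rows:
        running = source_totals.get(key, Decimal("0"))
        source_totals[key] = running + Decimal(str(amount))

    mismatches = []
    for key in sorted(set(source_totals) | set(report_totals)):
        src = source_totals.get(key)  # None: key never appeared in the source
        rpt = report_totals.get(key)  # None: the report lost this key
        if src is None or rpt is None or abs(src - Decimal(str(rpt))) > tolerance:
            mismatches.append((key, src, rpt))
    return mismatches


if __name__ == "__main__":
    # Hypothetical data: "north" was lost on the way to the report,
    # and "west" no longer matches its source total.
    rows = [("east", "100.00"), ("east", "50.25"),
            ("west", "75.00"), ("north", "10.00")]
    report = {"east": Decimal("150.25"), "west": Decimal("70.00")}
    for key, src, rpt in reconcile(rows, report):
        print(f"{key}: source={src} report={rpt}")
```

A check like this catches both problems named earlier: lost data shows up as a key missing from one side, and a changed formula shows up as a total that no longer matches. Run as part of the build, it puts the finding in front of the design review instead of the user.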
You Build It, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality
When it comes to bad data, a lot of the problem stems from companies letting their developers off the hook. That’s right. When it comes to delivering, maintaining, and justifying their code, developers are given a lot of rope. When projects start, everyone nods their head in agreement when data quality comes up. But then there’s scope creep and sizing mistakes, and projects run long.
People start looking for things to remove. And writing error detection and correction code is not only complicated, it’s not sexy. It’s like writing documentation; no one wants to do it because it’s detailed and time-consuming. This is the finish work: it’s the fancy veneer, the polished trim, and the paint color. Software vendors get this. If a data entry error shows up in a demo or a software review, it could make or break that product’s reputation. When was the last time any Windows product let you save a file with an invalid name? It doesn’t happen. The last thing a Word user needs is to sweat blood over a document and then never be able to open it again because it was named with an untypeable character.
Error detection and correction code is a core aspect of development and requires rigorous review. Accurate data isn’t just a business requirement; it’s common sense. Users shouldn’t have to explain to developers why inaccurate values aren’t allowed. Do you think that the business users at Amazon.com had to tell their developers that “The Moon” was an invalid delivery address? But all too often developers don’t think they have any responsibility for data entry errors.
When a system creates data, and when that data leaves that system, the data should be checked and corrected. Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity. Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.
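As a rough illustration of checking data before it leaves a system, the sketch below validates each outbound record and quarantines the ones that fail instead of shipping them downstream. The field names (`customer_id`, `amount`, `country`), the `KNOWN_COUNTRIES` set, and both function names are assumptions made up for this example; real rules would come from the business.

```python
# Hypothetical reference data; a real system would load this from a master list.
KNOWN_COUNTRIES = {"US", "CA", "GB", "DE"}


def validate_record(record):
    """Return a list of problems found in one outbound record (empty if clean)."""
    problems = []

    if not str(record.get("customer_id", "")).strip():
        problems.append("customer_id is missing")

    amount = record.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if not is_number or amount < 0:
        problems.append("amount must be a non-negative number")

    country = record.get("country", "")
    if country not in KNOWN_COUNTRIES:
        problems.append(f"country {country!r} is not recognized")

    return problems


def partition_outbound(records):
    """Split records into those safe to send and those held back with reasons."""
    clean, quarantined = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            quarantined.append((record, problems))
        else:
            clean.append(record)
    return clean, quarantined


if __name__ == "__main__":
    outbound = [
        {"customer_id": "C1", "amount": 10.0, "country": "US"},
        {"customer_id": "", "amount": -5, "country": "Moon"},
    ]
    clean, quarantined = partition_outbound(outbound)
    print(f"{len(clean)} record(s) sent, {len(quarantined)} held back")
    for record, problems in quarantined:
        print(record, "->", "; ".join(problems))
```

The design choice here is that bad records are held with their reasons attached rather than silently dropped or passed along, so the application that created them stays responsible for fixing them.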
While everyone (my staff members included) wants to talk about data governance, policy-making, and executive councils, it all starts with bad data being input into systems in the first place. So, what if we fixed it at the beginning?
Photo by Random J via Flickr (Creative Commons License)