You Build it, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality
When it comes to bad data, a lot of the problem stems from companies letting their developers off the hook. That’s right. When it comes to delivering, maintaining, and justifying their code, developers are given a lot of rope. When projects start, everyone nods their head in agreement when data quality comes up. But then there’s scope creep and sizing mistakes, and projects run long.
People start looking for things to remove. And writing error detection and correction code is not only complicated, it’s not sexy. It’s like writing documentation; no one wants to do it because it’s detailed and time consuming. This is the finish work: it’s the fancy veneer, the polished trim, and the paint color. Software vendors get this. If a data entry error shows up in a demo or a software review, it could make or break that product’s reputation. When was the last time any Windows product let you save a file with an invalid name? It doesn’t happen. The last thing a Word user needs is to sweat blood over a document and then never be able to open it again because it was named with an untypeable character.
Error detection and correction code are core aspects of development and require rigorous review. Accurate data isn’t just a business requirement—it’s common sense. Users shouldn’t have to explain to developers why inaccurate values aren’t allowed. Do you think that the business users at Amazon.com had to tell their developers that “The Moon” was an invalid delivery address? But all too often developers don’t think they have any responsibility for data entry errors.
When a system creates data, and when that data leaves that system, the data should be checked and corrected. Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity. Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.
While everyone—my staff members, included—wants to talk about data governance, policy-making, and executive councils, it all starts with bad data being input into systems in the first place. So, what if we fixed it at the beginning?
Photo by Random J via Flickr (Creative Commons License)