You Build it, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality

[Image: video game error screen]

When it comes to bad data, a lot of the problem stems from companies letting their developers off the hook. That’s right. When it comes to delivering, maintaining, and justifying their code, developers are given a lot of rope. When projects start, everyone nods in agreement when data quality comes up. But then scope creep and sizing mistakes set in, and projects run long.

People start looking for things to remove. And writing error detection and correction code is not only complicated, it’s not sexy. It’s like writing documentation: no one wants to do it because it’s detailed and time-consuming. This is the finish work: it’s the fancy veneer, the polished trim, and the paint color. Software vendors get this. If a data entry error shows up in a demo or a software review, it could make or break that product’s reputation. When was the last time any Windows product let you save a file with an invalid name? It doesn’t happen. The last thing a Word user needs is to sweat blood over a document and then never be able to open it again because it was named with an untypeable character.

Error detection and correction code are core aspects of development and require rigorous review. Accurate data isn’t just a business requirement; it’s common sense. Users shouldn’t have to explain to developers why inaccurate values aren’t allowed. Do you think business users had to tell their developers that “The Moon” was an invalid delivery address? But all too often, developers don’t think they have any responsibility for data entry errors.

When a system creates data, and when that data leaves that system, the data should be checked and corrected. Bad data should be viewed as a hazardous material that should not be transported. The moment you generate data, you have the implicit responsibility to establish its accuracy and integrity. Distributing good data to your competitors is unacceptable; distributing bad data to your team is irresponsible. And when bad data is ignored, it’s negligence.
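To make the idea concrete, here is a minimal sketch of what checking data at both boundaries might look like. The field names, the country-code list, and the rules themselves are hypothetical, illustrative stand-ins, not anything from a real system: the point is that the same validation runs at the moment a record is created and again at the moment it is exported.

```python
import re

# Hypothetical reference set; a real system would use a maintained lookup.
VALID_COUNTRY_CODES = {"US", "CA", "GB", "DE", "FR"}

def validate_address(record: dict) -> list[str]:
    """Return a list of error messages; an empty list means the record is clean."""
    errors = []
    if not record.get("street", "").strip():
        errors.append("street is required")
    # Letters, spaces, periods, apostrophes, and hyphens only.
    if not re.fullmatch(r"[A-Za-z .'-]+", record.get("city", "")):
        errors.append("city contains invalid characters")
    if record.get("country") not in VALID_COUNTRY_CODES:
        errors.append("unknown country code")
    return errors

def export_record(record: dict) -> dict:
    """Re-check the record at the point it leaves the system.

    Bad data is refused transport rather than passed downstream.
    """
    errors = validate_address(record)
    if errors:
        raise ValueError(f"refusing to export bad data: {errors}")
    return record
```

The design choice worth noting is that `export_record` does not trust upstream checks: the validation is cheap relative to the cost of letting a bad record cross a system boundary, which is exactly the hazardous-material argument above.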

While everyone—my staff members included—wants to talk about data governance, policy-making, and executive councils, it all starts with bad data being input into systems in the first place. So, what if we fixed it at the beginning?

Photo by Random J via Flickr (Creative Commons License)


About Evan Levy

Evan Levy is a management consultant and partner at IntegralData. In addition to his day-to-day job responsibilities, Evan speaks, writes, and blogs about the challenges of managing and using data to support business decision making.

4 responses to “You Build it, You Break It, You Fix It: Why Applications Must Be Responsible for Data Quality”

  1. Jim Harris says :

    “…writing error detection and correction code… it’s not sexy.”
    Where’s Justin Timberlake when you need him?
    I’m bringing sexy back (yeah)
    Them other coders don’t know how to act (yeah)
    I’m thinking data’s special, what’s your quality lack (yeah)
    So grant me access and I’ll pick up the slack (yeah)
    That’s right – Data Quality, it’s the new Sexy.
    Spread the word…

  2. Phil Simon says :

    In his own data-oriented way, Jim is the epitome of sexy.

  3. William Sharp says :

    I think this is a very important leap for data quality. I agree that it does need to be pushed to the point of origin. The sell is that it is cheaper to do it from project onset, with validation methods, than to wait a few years, make mistakes based on poor-quality data, and start a new, more specialized and expensive project that still won’t solve the problem at the origin.
    When data quality is pushed to the origin, we’ll know we’ve arrived on the main stage!

  4. Julian Schwarzenbach says :

    Good post.
    This matches my long-held view that data quality problems are actually people problems.
    In this case, you expand on this premise by correctly identifying two layers of human error:
    1. The programmers not preventing obvious human error problems; and
    2. Users entering incorrect data either through ignorance or carelessness.
    This leads on to thoughts about how to avoid these problems – if the cost of data quality errors were clearly and unambiguously expressed (not sure how this could be done – suggestions?) then these costs should be fed into the risk assessment of any project/programme changes. For example, removing safeguards will increase the likelihood of errors, but not necessarily the consequence.
    Risk is a product of likelihood and consequence, which, if expressed as an annualised financial impact, will give a far clearer view on the risk of descoping controls. As risk is then expressed in financial terms, it becomes harder to ignore.
