The Flaw of the Data Inventory


Back when I was applying to college, I’d read over college catalogs. Inevitably, each university would mention the number of books it had in its library. When I finally went to college, I realized that this metric was fairly meaningless. A dozen volumes on Grecian pottery did me no good when I was in search of a book on polymers for my mechanical engineering class.

Clients will often ask us to scope a “data inventory” project, inevitably focused on identifying and describing all the data elements contained across their different application systems. Recently a new CIO asked us to head up a “tiger team” to inventory his company’s data. He was surprised at the number of information needs that had been sent his way. As expected, he inquired about systems of record and data dictionaries. As you can imagine, he received multiple, conflicting answers, which only added to his confusion.

As a point of reference, well-known ERP systems can have in excess of 50,000 discrete data elements in their databases (never mind that some aren’t in English). As I’ve written in the past, many of these data elements have no use outside of the application itself.

Having terabyte upon terabyte of information is equally irrelevant if that data is unrelated to current business issues. The problem with a data inventory activity is that identifying and counting data elements in different systems and applications won’t necessarily solve any problems. Why? Because data across applications and packages is inconsistent: there are different names, definitions, and values, and there is no practical means of determining which data the systems actually have in common. This is like going to the hardware store and looking for a specific screw, but all the different screws are in one big barrel, so you end up having to pick through each screw, one at a time. When you find the screw, you just throw all the other screws back into the barrel.

The point of a data inventory isn’t to pick through data because it exists, but to inventory the data people actually need. If you’re going to undertake a data inventory, structure your output so that the next person doesn’t have to repeat your work. Identify the data that is moving across various systems, as this indicates key information that’s being shared. Categorize this data by subject area. You’ll inevitably find inconsistent versions of the same data, which lets you identify data disparities. You can then begin to develop a catalog of key corporate data that will form the basis of your data dictionary.
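To make the steps above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the system names, the element names, the subject areas, and the idea that someone has already mapped each system's local field name to a common element name. It simply groups shared elements by subject area and flags any element that carries more than one definition.

```python
from collections import defaultdict

# Hypothetical interface records, one per data element moving between systems:
# (source_system, target_system, element_name, subject_area, definition)
FLOWS = [
    ("CRM", "Billing", "customer_id", "Customer", "Unique account identifier"),
    ("ERP", "Warehouse", "customer_id", "Customer", "Ship-to party number"),
    ("CRM", "Marketing", "email", "Customer", "Primary contact email"),
    ("ERP", "Billing", "invoice_total", "Finance", "Invoice amount including tax"),
]


def build_catalog(flows):
    """Group shared data elements by subject area, collecting the systems
    that exchange each element and every definition seen for it."""
    catalog = defaultdict(dict)
    for source, target, element, area, definition in flows:
        entry = catalog[area].setdefault(
            element, {"systems": set(), "definitions": set()}
        )
        entry["systems"].update((source, target))
        entry["definitions"].add(definition)
    return catalog


def disparities(catalog):
    """Return (subject_area, element) pairs that have conflicting definitions."""
    return sorted(
        (area, element)
        for area, elements in catalog.items()
        for element, entry in elements.items()
        if len(entry["definitions"]) > 1
    )


if __name__ == "__main__":
    catalog = build_catalog(FLOWS)
    for area, element in disparities(catalog):
        print(f"{area}.{element}: {sorted(catalog[area][element]['definitions'])}")
```

The catalog that comes out of this is keyed by subject area rather than by application, so the next person can look up an element without re-walking each system.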

Inventorying the data that moves between systems accomplishes two things: it identifies the most valuable data elements in use, and it will also help identify data that’s not high-value, as it’s not being shared or used. This approach also provides a way to tackle initial data quality efforts by identifying the most “active” data used by the business. It ultimately helps the data management team understand where to focus its efforts, and prioritize accordingly.
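One rough way to find the most “active” data is to count how often each element shows up in transfers between systems. The sketch below assumes a hypothetical list of transfers and a hypothetical list of all known elements; neither reflects any particular product or client.

```python
from collections import Counter

# Hypothetical transfers: (source_system, target_system, element_name)
TRANSFERS = [
    ("CRM", "Billing", "customer_id"),
    ("ERP", "Warehouse", "customer_id"),
    ("CRM", "Marketing", "customer_id"),
    ("ERP", "Billing", "invoice_total"),
    ("Billing", "Finance", "invoice_total"),
    ("CRM", "Marketing", "email"),
]

# Hypothetical full list of elements found across the applications
ALL_ELEMENTS = ["customer_id", "invoice_total", "email", "legacy_flag", "internal_seq"]


def rank_by_activity(transfers, all_elements):
    """Split elements into those shared between systems (most active first)
    and those that never leave their home application."""
    counts = Counter(element for _, _, element in transfers)
    shared = counts.most_common()
    unshared = sorted(set(all_elements) - set(counts))
    return shared, unshared


if __name__ == "__main__":
    shared, unshared = rank_by_activity(TRANSFERS, ALL_ELEMENTS)
    print("Focus first on:", shared)
    print("Lower priority:", unshared)
```

The shared list is a starting point for data quality work; the unshared list holds candidates to leave out of the first pass.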

So next time someone suggests a data inventory without context or objectives, consider sending them to college to study Grecian urns.


About Evan Levy

Evan Levy is Vice President of Business Consulting at SAS. In addition to his day-to-day job responsibilities, Evan speaks, writes, and blogs about the challenges of managing and using data to support business decision making.

One response to “The Flaw of the Data Inventory”

  1. Jim Harris says:

    Oh, still unidentified inventory of data assets!
    You are the foster-child of silence and slow time.
    Historians, who cannot thus express current business issues,
    Offer instead, a flowery tale flowing more sweetly than a rhyme.
    What leafy pages of dead trees will haunt the inventory,
    Of structured or unstructured data sources, or of both,
    Locked in metadata repositories or drawers of old file cabinets?
    What inconsistent and unknowing knowledge are these?
    What mad pursuit of wasted wisdom we struggle to escape,
    What countless discrete data elements? What wild SQL query?
    All data plays a melody that is sweet, but those truly in use
    Are sweeter; therefore, data inventory, catalog data assets,
    Not for the sake of it, but for something more endeared,
    Catalog what the business truly uses and needs, but no more.
    When it is complete, with repeatable process and no waste,
    Your data inventory shall remain, even in midst of other woes,
    A true friend to one and all, and to whom you shall say:
    “Business insight is truth, truth business insight,” – that is all
    You know of your data assets, and all you need to know.
