There’s nothing more frustrating than not being able to rely upon a business partner. There’s lots of business books about information technology that espouses the importance of Business/IT alignment and the importance of establishing business users as IT stakeholders. The whole idea of delivering business value with data and analytics is to provide business users with tools and data that can support business decision making. It’s incredibly hard to deliver business value when half of the partnership isn’t stepping up to their responsibilities.
There’s never a shortage of rationale as to why requirements haven’t been collected or recorded. In order for a relationship to be successful, both parties have to participate and cooperate. Gathering and recording requirements isn’t possible if the technologist doesn’t meet with the users to discuss their needs, pains, and priorities. Conversely, the requirements process won’t succeed if the users won’t participate. My last blog reviewed the excuses that technologists offered for explaining the lack of documented requirements; this week’s blog focuses on remarks I’ve heard from business stakeholders.
- “I’m too busy. I don’t have time to talk to developers”
- “I meet with IT every month, they should know my requirements”
- “IT isn’t asking me for requirements, they want me to approve SQL”
- “We sent an email with a list of questions. What else do they need?”
- “They have copies of reports we create. That should be enough.”
- “The IT staff has worked here longer than I have. There’s nothing I can tell them that they don’t already know”
- “I’ve discussed my reporting needs in 3 separate meetings; I seem to be educating someone else with each successive discussion”
- “I seem to answer a lot of questions. I don’t ever see anyone writing anything down”
- “I’ll meet with them again when they deliver the requirements I identified in our last discussion.
- “I’m not going to sign off on the requirements because my business priorities might change – and I’ll need to change the requirements.
Requirements gathering is really a beginning stage for negotiating a contract for the creation and delivery of new software. The contract is closed (or agreed to) when the business stakeholders agree to (or sign-off on) the requirements document. While many believe that requirements are an IT-only artifact, they’re really a tool to establish responsibilities of both parties in the relationship.
A requirements document defines the data, functions, and capabilities that the technologist needs to build to deliver business value. The requirements document also establishes the “product” that will be deployed and used by the business stakeholders to support their business decision making activities. The requirements process holds both parties accountable: technologists to build and business stakeholders to use. When two organizations can’t work together to develop requirements, it’s often a reflection of a bigger problem.
It’s not fair for business stakeholders to expect development teams to build commercial grade software if there’s no participation in the requirements process. By the same token, it’s not right for technologists to build software without business stakeholder participation. If one stakeholder doesn’t want to participate in the requirements process, they shouldn’t be allowed to offer an opinion about the resulting deliverable. If multiple stakeholders don’t want to participate in a requirements activity, the development process should be cancelled. Lack of business stakeholder participation means they have other priorities; the technologists should take a hint and work on their other priorities.
I received a funny email the other day about excuses that school children use to explain why they haven’t done their homework. The examples were pretty creative: “my mother took it to be framed”, “I got soap in my eyes and was blinded all night”, and (an oldie and a goody) –“my dog ate my homework”. It’s a shame that such a creative approach yielded such a high rate of failure. Most of us learn at an early age that you can’t talk your way out of failure; success requires that you do the work. You’d also think that as people got older and more evolved, they’d realize that there’s very few shortcuts in life.
I’m frequently asked to conduct best practice reviews of business intelligence and data warehouse (BI/DW) projects. These activities usually come about because either users or IT management is concerned with development productivity or delivery quality. The review activity is pretty straight forward; interviews are scheduled and artifacts are analyzed to review the various phases, from requirements through construction to deployment. It’s always interesting to look at how different organizations handle architecture, code design, development, and testing. One of the keys to conducting a review effort is to focus on the actual results (or artifacts) that are generated during each stage. It’s foolish to discuss someone’s development method or style prior to reviewing the completeness of the artifacts. It’s not necessary to challenge someone approach if their artifacts reflect the details required for the other phases.
And one of the most common problems that I’ve seen with BI/DW development is the lack of documented requirements. Zip – zero –zilch – nothing. While discussions about requirements gathering, interview styles, and even document details occur occasionally, it’s the lack of any documented requirements that’s the norm. I can’t imagine how any company allows development to begin without ensuring that requirements are documented and approved by the stakeholders. Believe it or not, it happens a lot.
So, as a tribute to the creative school children of yesterday and today, I thought I would devote this blog to some of the most creative excuses I’ve heard from development teams to justify their beginning work without having requirements documentation.
- “The project’s schedule was published. We have to deliver something with or without requirements”
- “We use the agile methodology, it’s doesn’t require written requirements”
- “The users don’t know what they want.”
- “The users are always too busy to meet with us”
- “My bonus is based on the number of new reports I create. We don’t measure our code against requirements”
- “We know what the users want, we just haven’t written it down”
- “We’ll document the requirements once our code is complete and testing finished”
- “We can spend our time writing requirements, or we can spend our time coding”
- “It’s not our responsibility to document requirements; the users need to handle that”
- “I’ve been told not to communicate with the business users”
Many of the above items clearly reflect a broken set of management or communication methods. Expecting a development team to adhere to a project schedule when they don’t have requirements is ridiculous. Forcing a team to commit to deliverables without requirements challenges conventional development methods and financial common sense. It also reflects leadership that focuses on schedules, utilization and not business value.
A development team that is asked to build software without a set of requirements is being set up to fail. I’m always astonished that anyone would think they can argue and justify that the lack of documented requirements is acceptable. I guess there are still some folks that believe they can talk their way out of failure.
In one of my previous blogs, I wrote about Data Virtualization technology — one of the more interesting pieces of middleware technology that can simplify data management. While most of the commercial products in this space share a common set of features and functions, I thought I’d devote this blog to discussing the more advanced features. There are quite a few competing products; the real challenge in differentiating the products is to understand their more advanced features.
The attraction of data virtualization is that it simplifies data access. Most IT shops have one of everything – and this includes several different brands of commercial DBMSs, a few open source databases, a slew of BI/reporting tools, and the inevitable list of emerging and specialized tools and technologies (Hadoop, Dremel, Casandra, etc.) Supporting all of the client-to-server-to-repository interfaces (and the associated configurations) is both complex and time consuming. This is why the advanced capabilities of Data Virtualization have become so valuable to the IT world.
The following details aren’t arranged in any particular order. I’ve identified the ones that I’ve found to be the most valuable (and interesting). Let me also acknowledge not every DV product supports all of these features.
Intelligent data caching. Repository-to-DV Server data movement is the biggest obstacle in query response time. Most DV products are able to support static caching to reduce repetitive data movement (data is copied and persisted in the DV Server). Unfortunately, this approach has limited success when there are ad hoc users accessing dozens of sources and thousands of tables. The more effective solution is for the DV Server to monitor all queries and dynamically cache data based on user access, query load, and table (and data) access frequency.
Query optimization (w/multi-platform execution). While all DV products claim some amount of query optimization, it’s important to know the details. There are lots of tricks and techniques; however, look for optimization that understands source data volumes, data distribution, data movement latency, and is able to process data on any source platform.
Support for multiple client Interfaces. Since most companies have multiple database products, it can be cumbersome to support and maintain multiple client access configurations. The DV server can act as a single access point for multiple vendor products (a single ODBC interface can replace drivers for each DBMS brand). Additionally, most DV Server drivers support multiple different access methods (ODBC, JDBC, XML, and web services).
Attribute level or value specific data security. This feature supports data security at a much lower granularity than is typically available with most DBMS products. Data can be protected (or restricted) at individual column values for entire table or selective rows.
Metadata tracking and management. Since Data Virtualization is a query-centric middleware environment, it only makes sense to position this server to retrieve, reconcile, and store metadata content from multiple, disparate data repositories.
Data lineage. This item works in tandem with the metadata capability and augments the information by retaining the source details for all data that is retrieved. This not only includes source id information for individual records but also the origin, creation date, and native attribute details.
Query tracking for usage audit. Because the DV Server can act as a centralized access point for user tool access, there are several DV products that support the capture and tracking of all submitted queries. This can be used to track, measure, and analyze end user (or repository) access.
Workflow linkage and processing. This is the ability to execute predefined logic against specific data that is retrieved. While this concept is similar to a macro or stored procedure, it’s much more sophisticated. It could include the ability to direct job control or specialized processing against an answer set prior to delivery (e.g. data hygiene, external access control, stewardship approval, etc.)
Packaged Application Templates. Most packaged applications (CRM, ERP, etc.) contain thousands of tables and columns that can be very difficult to understand and query. Several DV vendors have developed templates containing predefined DV server views that access the most commonly queried data elements.
Setup and Configuration Wizards. Configuring a DV server to access the multiple data sources can be a very time consuming exercise; the administrator needs to define and configure every source repository, the underlying tables (or files), along with the individual data fields. To simplify setup, a configuration wizard reviews the dictionary of an available data source and generates the necessary DV Server configuration details. It further analyzes the table and column names to simplify naming conventions, joins, and data value conversion and standardization details.
Don’t be misled into thinking that Data Virtualization is a highly mature product space where all of the products are nearly identical. They aren’t. Most product vendors spend more time discussing their unique features instead of offering metrics about their their core features. It’s important to remember that every Data Virtualization product requires a server that retrieves and processes data to fulfill query requests. This technology is not a commodity, which means that details like setup/configuration time, query performance, and advanced features can vary dramatically across products. Benchmark and test drive the technology before buying.
It’s fairly common for companies to assign Executive Sponsors to their large projects. “Large” typically reflects budget size, the inclusion of cross-functional teams, business impact, and complexity. The Executive Sponsor isn’t the person running and directing the project on a day-to-day basis; they’re providing oversight and direction. He monitors project progress and ensures that tactics are carried out to support the project’s goals and objectives. He has the credibility (and authority) to ensure that the appropriate level of attention and resources are available to the project throughout its entire life.
While there’s nearly universal agreement on the importance of an Executive Sponsor, there seems to be limited discussion about the specifics of the role. Most remarks seem to dwell on the importance on breaking down barriers, dealing with roadblocks, and effectively reacting to project obstacles. While these details make for good PowerPoint presentations, project success really requires the sponsor to exhibit a combination of skills beyond negotiation and problem resolution to ensure project success. Here’s my take on some of the key responsibilities of an Executive Sponsor.
Inspire the Stakeholder Audience
Most executives are exceptional managers that understand the importance of dates and budgets and are successful at leading their staff members towards a common goal. Because project sponsors don’t typically have direct management authority over the project team, the methods for leadership are different. The sponsor has to communicate, captivate, and engage with the team members throughout all phases of the project. And it’s important to remember that the stakeholders aren’t just the individual developers, but the users and their management. In a world where individuals have to juggle multiple priorities and projects, one sure-fire way to maintain enthusiasm (and participation) is to maintain a high-level of sponsor engagement.
Understand the Project’s Benefits
Because of the compartmentalized structure of most organizations, many executives aren’t aware of the details of their peer organizations. Enterprise-level projects enlist an Executive Sponsor to ensure that the project respects (and delivers) benefits to all stakeholders. It’s fairly common that any significantly sized project will undergo scope change (due to budget challenges, business risks, or execution problems). Any change will likely affect the project’s deliverables as well as the perceived benefits to the different stakeholders. Detailed knowledge of project benefits is crucial to ensure that any change doesn’t adversely affect the benefits required by the stakeholders.
Know the Project’s Details
Most executives focus on the high-level details of their organization’s projects and delegate the specifics to the individual project manager. When projects cross organizational boundaries, the executive’s tactics have to change because of the organizational breadth of the stakeholder community. Executive level discussions will likely cover a variety of issues (both high-level and detailed). It’s important for the Executive Sponsor to be able to discuss the brass tacks with other executives; the lack of this knowledge undermines the sponsor’s credibility and project’s ability to succeed.
Hold All Stakeholders Accountable
While most projects begin with everyone aligned towards a common goal and set of tactics, it’s not uncommon for changes to occur. Most problems occur when one or more stakeholders have to adjust their activities because of an external force (new priorities, resource contention, etc.). What’s critical is that all stakeholders participate in resolving the issue; the project team will either succeed together or fail together. The sponsor won’t solve the problem; they will facilitate the process and hold everyone accountable.
Stay Involved, Long Term
The role of the sponsor isn’t limited to supporting the early stages of a project (funding, development, and deployment); it continues throughout the life of the project. Because most applications have a lifespan of no less than 7 years, business changes will drive new business requirements that will drive new development. The sponsor’s role doesn’t diminish with time – it typically expands.
The overall responsibility set of an Executive Sponsor will likely vary across projects. The differences in project scope, company culture, business process, and staff resources across individual projects inevitably affect the role of the Executive Sponsor. What’s important is that the Executive Sponsor provides both strategic and tactical support to ensure a project is successful. An Executive Sponsor is more than the project’s spokesperson; they’re the project CEO that has equity in the project’s outcome and a legitimate responsibility for seeing the project through to success.
Photo “American Alligator Crossing the Road at Canaveral National Seashore”courtesy of Photomatt28 (Matthew Paulson) via Flickr (Creative Commons license).
One of the challenges in delivering successful data-centric projects (e.g. analytics, BI, or reporting) is realizing that the definition of project success differs from traditional IT application projects. Success for a traditional application (or operational) project is often described in terms of transaction volumes, functional capabilities, processing conformance, and response time; data project success is often described in terms of business process analysis, decision enablement, or business situation measurement. To a business user, the success of a data-centric project is simple: data usability.
It seems that most folks respond to data usability issues by gravitating towards a discussion about data accuracy or data quality; I actually think the more appropriate discussion is data knowledge. I don’t think anyone would argue that to make data-enabled decisions, you need to have knowledge about the underlying data. The challenge is understanding what level of knowledge is necessary. If you ask a BI or Data Warehouse person, their answer almost always includes metadata, data lineage, and a data dictionary. If you ask a data mining person, they often just want specific attributes and their descriptions — they don’t care about anything else. All of these folks have different views of data usability and varying levels (and needs) for data knowledge.
One way to improve data usability is to target and differentiate the user audience based on their data knowledge needs. There are certainly lots of different approaches to categorizing users; in fact, every analyst firm and vendor has their own model to describe different audience segments. One of the problems with these types of models is that they tend to focus heavily on the tools or analytical methods (canned reports, drill down, etc.) and ignore the details of data content and complexity. The knowledge required to manipulate a single subject area (revenue or customer or usage) is significantly less than the skills required to manipulate data across 3 subject areas (revenue, customer, and usage). And what exacerbates data knowledge growth is the inevitable plethora of value gaps, inaccuracies, and inconsistencies associated with the data. Data knowledge isn’t just limited to understanding the data; it includes understanding how to work around all of the imperfections.
Here’s a model that categories and describes business users based on their views of data usability and their data knowledge needs
Level 1: “Can you explain these numbers to me?”
This person is the casual data user. They have access to a zillion reports that have been identified by their predecessors and they focus their effort on acting on the numbers they get. They’re not a data analyst – their focus is to understand the meaning of the details so they can do their job. They assume that the data has been checked, rechecked, and vetted by lots of folks in advance of their receiving the content. They believe the numbers and they act on what they see.
Level 2: “Give me the details”
This person has been using canned reports, understands all the basic details, and has graduated to using data to answer new questions that weren’t identified by their predecessors. They need detailed data and they want to reorganize the details to suit their specific needs (“I don’t want weekly revenue breakdowns – I want to compare weekday revenue to weekend revenue”). They realize the data is imperfect (and in most instances, they’ll live with it). They want the detail.
Level 3: “I don’t believe the data — please fix it”
These folks know their area of the business inside/out and they know the data. They scour and review the details to diagnose the business problems they’re analyzing. And when they find a data mistake or inaccuracy, they aren’t shy about raising their hand. Whether they’re a data analyst that uses SQL or a statistician with their favorite advanced analytics algorithms, they focus on identifying business anomalies. These folks are the power users that are incredibly valuable and often the most difficult for IT to please.
Level 4: “Give me more data”
This is subject area graduation. At this point, the user has become self-sufficient with their data and needs more content to address a new or more complex set of business analysis needs. Asking for more data – whether a new source or more detail – indicates that the person has exhausted their options in using the data they have available. When someone has the capacity to learn a new subject area or take on more detailed content, they’re illustrating a higher level of data knowledge.
One thing to consider about the above model is that a user will have varying data knowledge based on the individual subject area. A marketing person may be completely self-sufficient on revenue data but be a newbie with usage details. A customer support person may be an expert on customer data but only have limited knowledge of product data. You wouldn’t expect many folks (outside of IT) to be experts on all of the existing data subject areas. Their knowledge is going to reflect the breadth of their job responsibilities.
As someone grows and evolves in business expertise and influence, it’s only natural that their business information needs would grow and evolve too. In order to address data usability (and project success), maybe it makes sense to reconsider the various user audience categories and how they are defined. Growing data knowledge isn’t about making everyone data gurus; it’s about enabling staff members to become self-sufficient in their use of corporate data to do their jobs.
Photo “Ladder of Knowledge” courtesy of degreezero2000 via Flickr (Creative Commons license).
Companies spend a small fortune continually investing and reinvesting in making their business analysts self-sufficient with the latest and greatest analytical tools. Most companies have multiple project teams focused on delivering tools to simplify and improve business decision making. There are likely several standard tools deployed to support the various data analysis functions required across the enterprise: canned/batch reports, desktop ad hoc data analysis, and advanced analytics. There’s never a shortage of new and improved tools that guarantee simplified data exploration, quick response time, and greater data visualization options, Projects inevitably include the creation of dozens of prebuilt screens along with a training workshop to ensure that the users understand all of the new whiz bang features associated with the latest analytic tool incarnation. Unfortunately, the biggest challenge within any project isn’t getting users to master the various analytical functions; it’s ensuring the users understand the underlying data they’re analyzing.
If you take a look at the most prevalent issue with the adoption of a new business analysis tool is the users’ knowledge of the underlying data. This issue becomes visible with a number of common problems: the misuse of report data, the misunderstanding of business terminology, and/or the exaggeration of inaccurate data. Once the credibility or usability of the data comes under scrutiny, the project typically goes into “red alert” and requires immediate attention. If ignored, the business tool quickly becomes shelfware because no one is willing to take a chance on making business decisions based on risky information.
All too often the focus on end user training is tool training, not data training. What typically happens is that an analyst is introduced to the company’s standard analytics tool through a “drink from a fire hose” training workshop. All of the examples use generic sales or HR data to illustrate the tool’s strengths in folding, spindling, and manipulating the data. And this is where the problem begins: the vendor’s workshop data is perfect. There’s no missing or inaccurate data and all of the data is clearly labeled and defined; classes run smoothly, but it just isn’t reality Somehow the person with no hands-on data experience is supposed to figure out how to use their own (imperfect) data. It’s like someone taking their first ski lesson on a cleanly groomed beginner hill and then taking them up to the top of an a black diamond (advanced) run with step hills and moguls. The person works hard but isn’t equipped to deal with the challenges of the real world. So, they give up on the tool and tell others that the solution isn’t usable.
All of the advanced tools and manipulation capabilities don’t do any good if the users don’t understand the data. There are lots of approaches to educating users on data. Some prefer to take a bottom-up approach (reviewing individual table and column names, meanings, and values) while others want to take a top-down approach (reviewing subject area details, the associated reports, and then getting into the data details). There are certainly benefits of one approach over the other (depending on your audience); however, it’s important not to lose sight of the ultimate goal: giving the users the fundamental data knowledge they need to make decisions. The fundamentals that most users need to understand their data include a review of
- the business subject area associated with their dat
- business terms, definitions, and their associated data attributes
- data values and their representations
- business rules and calculations associated with the individual values
- the data’s origin (a summary of the business processes and source system)
The above details may seem a bit overwhelming if you consider that most companies have mature reporting environments and multi-terabyte data warehouses. However, we’re not talking about training someone to be an expert on 1000 data attributes contained within your data warehouse; we’re talking about ensuring someone’s ability to use an initial set of reports or a new tool without requiring 1-on-1 training. It’s important to realize that the folks with the greatest need for support and data knowledge are the newbies, not the experienced folks.
There are lots of options for imparting data knowledge to business users: a hands-on data workshop, a set of screen videos showing data usage examples, or a simple set of web pages containing definitions, textual descriptions, and screen shots. Don’t get wrapped up in the complexities of creating the perfect solution – keep it simple. I worked with a client that deployed their information using a set of pages constructed with PowerPoint that folks could reference in a the company’s intranet. If your users have nothing – don’t’ worry about the perfect solution – give them something to start with that’s easy to use.
Remember that the goal is to build users’ data knowledge that is sufficient to get them to adopt and use the company’s analysis tools. We’re not attempting to convert everyone into data scientists; we just want them to use the tools without requiring 1-on-1 training to explain every report or data element.
Photo courtesy of NASA. Nasa Ames Research Center engineer H Julian “Harvey” Allen illustrating data knowledge (relating to capsule design for the Mercury program)
Maybe it’s time to challenge the 20 year-old paradigm of making everyone a knowledge worker. For a long time the BI community has assumed that if we give business users the right data and tools, they’ll have the necessary ammunition to do their jobs. But I’m beginning to believe that may no longer be a practical approach. At least not for everyone.
One thing that’s changed in the last dozen-or-so years is that individuals’ job responsibilities have become more complex. The breadth of these responsibilities has grown. I question whether the average business user can really keep track of all the subject area content, all the table definitions, column names, data types, definitions of columns, and locations of all the values across the 6000+ tables in the data mart.
And that’s just the data mart. I’m not even including the applications and systems the average business user interacts with on a daily basis. Not to mention all those presentations, documents, videos, and archived e-mails from customers.
I’m not arguing the value of analytics, nor am I challenging the value of the data warehouse. But is it really practical to expect everyone to generate their own reports? Look at the U.S. tax code. It’s certainly broader than a single CPA can keep track of. Now consider most companies’ Finance departments. There’s more data coming out of Finance than most people can deal with. Otherwise all those specialized applications and dedicated data analysts wouldn’t exist in the first place!
Maybe it’s not about delivering BI tools to every end-user. Maybe it’s about delivering reports in a manner that can be consumed. We’ve gotten so wound-up about detailed data that we haven’t stopped to wonder whether it’s worthwhile to push all that detail to the end-user’s desktop—and then expect him or her to learn all the rules.
One of my brokerage accounts contains 5 different equities. I don’t look at them every day. I don’t look at intra-day price changes. I really don’t need to know. All I really want to know is when I do look at the information, has the stock’s value gone up or down? And how do I get the information? I didn’t build a custom report. I didn’t do drill-down, or drill-across. I went to the web and searched on the stock price.
Maybe instead of buying of a copy of a [name the BI vendor software] tool, we simply build a set of standard reports for key business areas (Sales, Marketing, Finance), and publish them. You can publish these reports to a drive, to a server, to a website, to a portal—it shouldn’t matter. People should find the information with a browser. Reports can be stored and indexed and accessed via an enterprise search engine. Of course, as with everything else, you still need to define terms and metadata so that people understand what they’re reading.
Whenever people talk about enterprise search functionality they’re usually obsessing about unstructured data. But enterprise search can deliver enormous value for structured data. IT departments could be leading the charge if the definition of success weren’t large infrastructure and technology implementation projects and instead data delivery and usage.
The executive doesn’t ask, “What tool did you use to solve this problem?” Instead, she wants to know if the problem has in fact been solved.
The issue of Total Cost of Ownership (TCO) seems to come and go every few years. The need for it tends to ebb and flow with corporate budget cycles. TCO is perfectly fine for well-understood commodity functions or defined business processes. If I have to replace a server or a printer, or change a business process, TCO is a perfectly rational metric for comparing different alternatives.
When TCO calculations work, they tend to roll up within a single organization or manager. The hardware, the software, the installation, and the maintenance are under the domain of a single organization that covers the direct cost.
The problem with TCO arises when it’s used as a metric for justifying cross-functional or analytical systems. With these systems, the value isn’t delivering commodity processing but rather supporting decision making. TCO focuses on construction and maintenance costs. But for analytical systems, usage occurs across different organizations and varies with business value and need. TCO can in fact be misapplied.
At a simple level, TCO is often limited to processing hardware, storage, software, and IT resources necessary to configure and manage the platform on an ongoing basis. But this is usually limited to IT staff focused on system development and maintenance. Unfortunately the most expensive cost—not normally included in TCO calculations—is the business user’s time. While TCO quantifies costs for a data warehouse developer, there is no clear way to calculate costs for the dozens or hundreds of business users who are actually analyzing data and creating reports every day. The reality of analytical systems is that development continues every day on the business side.
Nevertheless it’s common for TCO calculations to be reduced to the cost of processing or storage, rather than reflecting the exponential costs of users circumventing slow-running queries and inaccurate data. At the end of the day, TCO shouldn’t only be about the cost of hardware and software installation and maintenance. It should be about the cost of continued business usage.
photo by -Luz- via Flickr (Creative Commons license)
Back when I was applying to college, I’d read over college catalogs. Inevitably, each university would mention the number of books it had in its library. When I finally went to college, I realized that this metric was fairly meaningless. A dozen volumes on Grecian pottery did me no good when I was in search of a book on polymers for my mechanical engineering class.
Clients will often ask us to scope a “data inventory” project, inevitably focused on identifying and describing all the data elements contained across their different application systems. Recently a new CIO asked us to head up a “tiger team” to inventory his company’s data. He was surprised at the quantity of information needs that had been sent his way. As expected, he inquired about systems of record and data dictionaries. As you can imagine, he received multiple and conflicting answers which only exacerbated his confusion.
As a point of reference, well-known ERP systems can have in excess of 50,000 discrete data elements in their databases (never mind that some aren’t in English). As I’ve written in the past, many of these data elements have no use outside of the application itself.
Having terabyte upon terabyte of information is equally irrelevant if that data is unrelated to current business issues. The problem with a data inventory activity is that identifying and counting data elements in different systems and applications won’t necessarily solve any problems. Why? Because data across applications and packages is inconsistent: there are different names, definitions, and values, and there is no practical means of determining which data they actually have in common. This is like going to the hardware store and looking for a specific screw, but all the different screws are in one big barrel—you end up having to pick through each screw, one at time. When you find the screw, you just throw all the other screws back into the barrel.
The point of a data inventory isn’t to pick through data because it exists, but to inventory the data people actually need. If you’re going to undertake a data inventory, your output should be structured so that the next person doesn’t have to repeat your work. Identify the data that is moving across various systems, as this indicates key information that’s being shared. Categorize this data by subject area. You’ll inevitably find that there are inconsistent versions of the data, enabling you to identify data disparities. You can then begin to develop a catalog of key corporate data that will form the basis of your data dictionary.
Inventorying the data that moves between systems accomplishes two things: it identifies the most valuable data elements in use, and it will also help identify data that’s not high-value, as it’s not being shared or used. This approach also provides a way to tackle initial data quality efforts by identifying the most “active” data used by the business. It ultimately helps the data management team understand where to focus its efforts, and prioritize accordingly.
So next time someone suggests a data inventory without context or objectives, consider sending them to college to study Grecian urns.
There are far too many data warehouse development teams solely focused on loading data. They’ve completely lost sight of their success metrics.
Why have they fallen into this rut? Because they’re doing what they’ve always done. One of the challenges in data warehousing is that as time progresses the people on the data warehouse development team are often not the same people who launched the team. This erosion of experience has eroded the original vision and degraded the team’s effectiveness .
One client of mine actually bonused their data warehouse development team based on system usage and capacity. Was there a lot of data in the data warehouse? Yep. Were there multiple sandboxes, each with its own copy of data? Yep. Was this useful three years ago? Yep. Does any of this matter now? Nope. The original purpose of the data warehouse—indeed, the entire BI program—has been forgotten.
In the beginning the team understood how to help the business. They were measured on business impact. Success was based on new revenues and lower costs outside of IT. The team understood the evolution of the applications and data to support BI was critical in continually delivering business value. There was an awareness of what was next. Success was based on responding to the new business need. Sometimes this meant reusing data with new reports, sometimes it meant new data, sometimes it was just adjusting a report. The focus was on aggressively responding to business change and the resulting business need.
How does your BI team support decision making? Does it still deliver value to business users? Maybe your company is like some of the companies that I’ve seen: the success of the data warehouse and the growth of its budget propelled it into being managed like an operational system. People have refocused their priorities to managing loads, monitoring jobs, and throwing the least-expensive, commodity skills at the program. So a few years after BI is introduced, the entire program has become populated with IT order-takers, watching and managing extracts, load jobs, and utilization levels.
Then an executive asks: “Why is this data warehouse costing us so much?”
You’ve built applications, you’ve delivered business value, and you’ve managed your budget. Good for you. But now you have to do more. IT’s definition of data warehouse success is you cutting your budget. Why? Because IT’s definition of success isn’t business value creation, it’s budget conformance.
Because BI isn’t focused on business operation automation, as with many operational systems, it can’t thrive in a maintenance-driven mode. In order to continue to support the business, BI must continually deliver new information and new functionality. Beware the IT organization that wants to migrate the data warehouse to an operational support model measured on budgets, not business value. This can jeopardize more than just your next platform upgrade, it can imperil the BI program itself. The tunnel-vision of Service Level Agreements, manpower estimates, and project plan maintenance aren’t doing you any favors. They can’t be done devoid of business drivers.
When there are new business needs, business users may try to enlist IT resources to support them. But they no longer see partners who will help realize their visions and deliver valuable analytics. They see a few low-cost, less experienced technicians monitoring system uptime and staring at the blinking lights.
photo by jurvetson via Flickr (Creative Commons License)