The 5 Components of a Data Strategy
Because the idea of building a data strategy is a fairly new concept in the world of business and information technology (IT), there’s a fair amount of discussion about the pieces and parts that comprise a Data Strategy. Most IT organizations have invested heavily in developing plans to address platforms, tools, and even storage. Those IT plans are critical in managing systems and capturing and retaining content generated by a company’s production applications. Unfortunately, those details don’t typically address all of the data activities that occur after an application has created and processed data from the initial business process. The reason folks take on the task of developing a Data Strategy is the challenge of finding, identifying, sharing, and using data. In any company, there are numerous roles and activities involved in delivering data to support business processing and analysis. A successful Data Strategy must support the breadth of activities necessary to ensure that data is “ready to use”.
There are five core components in a data strategy that work together as building blocks to address the various details necessary to comprehensively support the management and usage of data.
Identify The ability to identify data and understand its meaning regardless of its structure, origin, or location.
This concept is pretty obvious, but it’s likely one of the biggest obstacles to data usage and sharing. All too often, companies have multiple different terms for specific business details (customer: account, client, patron; income: earnings, margin, profit). In order to analyze, report, or use data, people need to understand what it’s called and how to identify it. Another aspect of Identify is establishing the representation of the data’s values (are the company’s geographic locations represented by name, number, or an abbreviation?). A successful Data Strategy would identify the gaps in this area and the activities and artifacts required to standardize data identification and representation.
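To make the Identify problem concrete, here is a minimal sketch of what standardizing terms and value representations might look like. The synonym table and location codes below are invented for illustration; no particular system or standard is assumed.

```python
# Map synonymous source-system terms onto one standard business term,
# and normalize how a value is represented (e.g., location codes).
# All names and codes here are hypothetical examples.
TERM_SYNONYMS = {
    "account": "customer", "client": "customer", "patron": "customer",
    "earnings": "income", "margin": "income", "profit": "income",
}

LOCATION_CODES = {  # one agreed representation per geographic location
    "new york": "NY", "10": "NY",
    "chicago": "CHI", "20": "CHI",
}

def standardize_record(record: dict) -> dict:
    """Rename fields to the standard business term and normalize
    the representation of location values."""
    out = {}
    for field, value in record.items():
        standard_field = TERM_SYNONYMS.get(field.lower(), field.lower())
        if standard_field == "location":
            value = LOCATION_CODES.get(str(value).lower(), value)
        out[standard_field] = value
    return out

print(standardize_record({"Client": "Acme Corp", "Margin": 1200, "Location": "new york"}))
# {'customer': 'Acme Corp', 'income': 1200, 'location': 'NY'}
```

The point isn’t the code itself; it’s that the synonym and representation decisions have to be made once, agreed upon, and published, rather than rediscovered by every report writer.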
Provision Enabling data to be packaged and made available while respecting all rules and access guidelines.
Data is often shared or made available to others at the convenience of the source system’s developers. The data is often accessible via database queries or as a series of files. There’s rarely any uniformity across systems or subject areas, and usage requires programming-level skills to analyze and inventory the contents of the various tables or files. Unfortunately, the typical business person requiring data is unlikely to possess sophisticated programming and data manipulation skills. They don’t want raw data that reflects source system formats and inaccuracies; they want data that is uniformly formatted, documented, and ready to be added to their analysis activities.
The idea of Provision is to package and provide data that is “ready to use”. A successful Data Strategy would identify the various data sharing needs and identify the necessary methods, practices, and tooling required to standardize data packaging and sharing.
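One simple way to think about provisioning is that the data never travels alone: it ships in a standard format together with documentation of what each field means. The sketch below (hypothetical file names and field descriptions) packages a dataset as a CSV plus a small data dictionary:

```python
import csv
import json
from pathlib import Path

def provision_dataset(rows, fields, descriptions, out_dir="package"):
    """Package data as a uniformly formatted CSV plus a data dictionary,
    so consumers don't need programming skills to discover the contents.
    File layout and names here are illustrative, not a standard."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    with open(out / "customers.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    # the data dictionary travels with the data
    (out / "data_dictionary.json").write_text(
        json.dumps({"fields": descriptions}, indent=2))

provision_dataset(
    rows=[{"customer_id": 1, "income": 52000}],
    fields=["customer_id", "income"],
    descriptions={"customer_id": "Standard customer identifier",
                  "income": "Annual income, USD"},
)
```

However a company implements it (files, views, or a data catalog), the strategy question is the same: what packaging and documentation conventions make the data “ready to use” without a phone call to the source system’s developer?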
Store Persisting data in a structure and location that supports access and processing across the enterprise.
Most IT organizations have solid plans for addressing this area of a Data Strategy. It’s fairly common for most companies to have a well-defined set of methods to determine the platform where online data is stored and processed, how data is archived for disaster recovery, and all of the other details such as protection, retention, and monitoring.
As the technology world has evolved, there are other facets of this area that require attention. The considerations include managing data distributed across multiple locations (the cloud, on-premises systems, and even multiple desktops), privacy and protection, and managing the proliferation of copies. With the emergence of new consumer privacy laws, it’s risky to store multiple copies of data, and it’s become necessary to track all existing copies of content. A successful Data Strategy ensures that any created data is always available for future access without requiring everyone to create their own copy.
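Tracking copies doesn’t require exotic technology; the hard part is the discipline of registering them. A toy registry (invented dataset names and locations) illustrates the idea:

```python
# A minimal copy registry: every copy of a dataset is recorded with its
# location and owner, so a privacy request (e.g., a consumer deletion
# request) can find all of them. Entries below are hypothetical.
COPY_REGISTRY = []

def register_copy(dataset, location, owner):
    """Record one copy of a dataset."""
    COPY_REGISTRY.append({"dataset": dataset,
                          "location": location,
                          "owner": owner})

def copies_of(dataset):
    """Return every known copy of a dataset."""
    return [c for c in COPY_REGISTRY if c["dataset"] == dataset]

register_copy("customer_master", "cloud:s3://warehouse/customers", "IT")
register_copy("customer_master", "desktop:analyst-share/cust.xlsx", "Finance")
print(len(copies_of("customer_master")))  # 2
```

In practice this role is played by a data catalog or metadata repository, but the strategy decision is the same: someone has to be responsible for knowing where every copy lives.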
Assemble Standardizing, combining, and moving data residing in multiple locations and providing a unified view.
It’s no secret that data integration is one of the more costly activities occurring within an IT organization; nearly 40% of the cost of new development is consumed by data integration activities. And Assemble isn’t limited to integration; it also includes correcting, standardizing, and formatting the content to make it “ready to use”.
With the growth of analytics and desktop decision making, the need to continually analyze and include new data sets in the decision-making process has exploded. Processing (or preparing, or wrangling) data is no longer confined to the domain of the IT organization; it has become an end user activity. A successful Data Strategy has to ensure that all users can be self-sufficient in their ability to process data.
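At its smallest scale, Assemble is exactly this: join records from two sources on a shared key and standardize the formats along the way. The sources, field names, and formats below are invented for illustration:

```python
# Two hypothetical source systems holding pieces of the same customer.
billing = {"C001": {"name": "ACME CORP", "balance": "1,200.50"}}
crm     = {"C001": {"name": "Acme Corporation", "segment": "enterprise"}}

def assemble(customer_id):
    """Produce a unified, standardized view of one customer
    from both sources."""
    b, c = billing[customer_id], crm[customer_id]
    return {
        "customer_id": customer_id,
        "name": c["name"].strip(),                        # one agreed name source
        "balance": float(b["balance"].replace(",", "")),  # string -> number
        "segment": c["segment"],
    }

print(assemble("C001"))
# {'customer_id': 'C001', 'name': 'Acme Corporation', 'balance': 1200.5, 'segment': 'enterprise'}
```

Multiply this by hundreds of sources and thousands of fields and the 40% integration cost figure stops being surprising; the strategy goal is to do this work once, in a documented and reusable way, instead of once per project.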
Govern Establishing and communicating information rules, policies, and mechanisms to ensure effective data usage.
While most organizations are quick to identify their data as a core business asset, few have put the necessary rigor in place to effectively manage data. Data Governance is about establishing rules, policies, and decision mechanisms to allow individuals to share and use data in a manner that respects the various (legal and usage) guidelines associated with that data. The inevitable challenge with Data Governance is adoption by the entire data supply chain – from application developers to report developers to end users. Data Governance isn’t a user-oriented concept, it’s a data-oriented concept. A successful Data Strategy identifies the rigor necessary to ensure a core business asset is managed and used correctly.
The 5 Components of a Data Strategy is a framework to ensure that all of a company’s data usage details are captured and organized and that nothing is unknowingly overlooked. A successful Data Strategy isn’t about identifying every potential activity across the 5 different components. It’s about making sure that all of the identified solutions to the problems in accessing, sharing, and using data are reviewed and addressed in a thorough manner.
Do You Need A Data Strategy?
During my time teaching Data Strategy in the classroom, I’m frequently asked the question, “how do I know if I need a data strategy?” For those of you who are deep thinkers, business strategists, or even data architects, I suspect your answer is either “yes!” or “why not?”.
When I’m asked that question, I actually think there’s a different question at hand, “Should I invest the time in developing a data strategy instead of something else?”
In today’s business world, there’s no shortage of “to do list” items. So, prioritizing the development of a Data Strategy means deprioritizing some other item. In order to understand the relative priority and benefit of a Data Strategy initiative, take a look at the need, pain, or problem you’re addressing along with the number of people affected. Your focus should be understanding how a Data Strategy initiative will benefit team members’ ability to do their jobs.
To get started, I usually spend time up front interviewing folks to understand the strengths, weaknesses, challenges, and opportunities that exist with data within a company (or organization). Let me share 5 questions that I always ask.
- Is the number of users (or organizations) building queries/reports to analyze data growing?
- Are there multiple reports containing conflicting information?
- Can a new staff member find and use data on their own, or does it require weeks or months of staff mentoring?
- Is data systematically inspected for accuracy (and corrected)? Is anyone responsible for fixing “broken data”?
- Is anyone responsible for data sharing?
While you might think these questions are a bit esoteric, each one has a specific purpose. I’m a big fan of positioning any new strategy initiative to clearly identify the problems that are going to be solved. If you’re going to undertake the development of a Data Strategy, you want to make certain that you will improve staff members’ ability to make decisions and be more effective at their jobs. These questions will help you identify where people struggle getting the job done, or where there’s an unquantified risk with using data to make decisions.
So, let me offer an explanation of each question.
- “Is the number of users (or organizations) building queries/reports to analyze data growing?”
The value of a strategy is directly proportional to the number of people that are going to be affected. In the instance of a data strategy, it’s valuable to understand the number of people that use data (hands-on) to make decisions or do their jobs. If the number is small or decreasing, a strategy initiative may not be worth the investment in time and effort. The larger the number, the greater the impact on the effectiveness (and productivity) of the various staff members.
- “Are there multiple reports containing conflicting information?”
If you have conflicting details within your company, that means decisions are being made with inaccurate data. It also means there’s mistrust of information, and team members are spending time confirming details. That’s business risk and a tremendous waste of time.
- “Can a new staff member find and use data…”
If a new staff member can’t be self-sufficient after a week or two on the job (when it comes to data access and usage), you have a problem. That’s like someone joining the company and not having access to office supplies, a parking space, or email. And if the only way to learn is to beg for time from other team members, you’re spending time with two people not doing their jobs. It’s a problem that’s being ignored.
- “Is data systematically inspected for accuracy (and corrected)? …”
This item is screaming for attention. If you’re in a company that uses data to make decisions, and no one is responsible for inspecting the content, you have a problem. Think about this issue another way: would you purchase hamburger at the grocery store if there was a sign that stated “Never inspected. May be spoiled. Not our responsibility”?
- “Is anyone responsible for data sharing?”
This item gets little attention in most companies and is likely the most important of all the questions. If data is a necessary ingredient in decision making and there isn’t anyone actively responsible for ensuring that new data assets are captured, stored, tracked, managed, and shared, you’re saying that data isn’t a business asset. (How many assets in the company aren’t tied to someone’s responsibilities?)
If the answer to all of the questions is “no” – great. You’re in an environment where data is likely managed in a manner that supports a multitude of team members’ needs across different organizations. If you answered “yes” to a single question, it’s likely that an incremental investment in a tactical data management effort would be helpful. If more than one question is answered “yes”, your company (and the team) will benefit from a Data Strategy initiative.
Data Strategy: Why It Matters
I’ve been consulting in the data management space for quite a few years, and I’m often asked about the importance and need for a Data Strategy.
All too often, the idea of “strategy” brings to mind images of piles of paper, academic-style charts, and a list of unachievable goals that identify the topic at hand but don’t reflect reality. Developing a data strategy isn’t about identifying perfection; it’s about identifying a set of goals that address the problems and needs requiring attention, and that are achievable and good enough to improve your data environment. A data strategy is also about identifying the tasks and activities necessary to achieve those goals. A data strategy is more than the finish line; it’s about the path of the journey. And it’s about making sure the journey and the goal are possible.
Companies spend a fortune on data. They purchase servers and storage farms to store the data, database management systems to manage the data, transformation tools to convert and transform the data, data quality tools to fix and standardize the content, and a treasure trove of analytical tools to present content that can be understood by business people. Given all of the activities, the players, and the content, why would you not want a plan?
Unfortunately, few organizations have a Data Strategy. They have lots of technology plans and roadmaps. They have platform and server plans; they have DBMS standards; they have storage strategies; they likely have analytical tool plans. While these are valuable, they are typically focused on an organization or function with minimal concern for all of the related upstream and downstream activities (how usable is a data warehouse if the data exists as multiple copies with different names and different formats, and hasn’t been checked/fixed for accuracy?) A data strategy is a plan that ensures that data is easy to find, easy to identify, easy to use, and easy to share across the company and across multiple functions.
Information technologists are exceptionally strong in the world of applications, tools, and platforms. They understand the importance of ensuring “reusability” and the benefit of an “economies-of-scale” approach. These aren’t just nice sound bites; both are focused on making sure that new development work doesn’t always require reinvention. Application strategies include identifying standards (tools, platforms, storage locations, etc.) and repeatable methods to ensure efficient construction and delivery of systems that can be serviced, maintained, and upgraded. An assembly line of sorts.
The challenge with most data environments is that a data strategy rarely exists; there are no repeatable methods and practices. Every new request requires building data and the associated deliverables from scratch. And, once delivered, there’s a huge testing and confirmation effort to ensure that the data is accurate. If you had a data strategy, you’d have reusable data, repeatable methods, and details that are referenceable online instead of through tribal knowledge. And delivery efficiency and cost would improve over time.
Why do you need a data strategy? Because the cost of data is growing – and it should be shrinking. The cost of data processing has shrunk and the cost of data storage has decreased dramatically, but the cost of data delivery continues to grow. A data strategy focuses on delivering data that is easy to find, easy to use, and easy to share.
What is a Data Strategy?
A simple definition of Data Strategy is
“A plan designed to improve all of the ways you acquire, store, manage, share, and use data.”
Over the years, most companies have spent a fortune on their data. They have a bunch of folks that comprise their “center of expertise”, they’ve invested lots of money in various data management tools (ETL–extract/transform/load, metadata, data catalogs, data quality, etc.), and they’ve spent bazillions on storage and server systems to retain their terabytes or petabytes of data. And what you often find is a lot of disparate (or independent) projects building specific deliverables for individual groups of users. What you rarely find is a plan that addresses all of those disparate user needs to support their ongoing access, sharing, and use of data.
While most companies have solid platform strategies, storage strategies, tool strategies, and even development strategies, few companies have a data strategy. The company has technology standards to ensure that every project uses a specific brand of server, a specific set of application development tools, a well-defined development method, and specific deliverables (requirements, code, test plan, etc.). You rarely find data standards: naming conventions and value standards, data hygiene and correction, source documentation and attribute definitions, or even data sharing and packaging conventions. The benefit of a Data Strategy is that data development becomes reusable, repeatable, more reliable, and faster. Without a data strategy, the data activities within every project are invented from scratch. Developers continually search and analyze data sources, create new transformation and cleansing code, and retest the same data, again, and again, and again.
The value of a Data Strategy is that it provides a roadmap of tasks and activities to make data easier to access, share, and use. A Data Strategy identifies the problems, challenges, and data needs that span multiple projects, teams, and business functions, and it identifies the activities and tasks that will deliver artifacts and methods benefiting all of them. The result is a plan and roadmap of deliverables that ensures data across different projects, teams, and business functions is reusable, repeatable, more reliable, and delivered faster.
A Data Strategy is a common thread across both disparate and related company projects to ensure that data is managed like a business asset, not an application byproduct. It ensures that data is usable and reusable across a company. A Data Strategy is a plan and road map for ensuring that data is simple to acquire, store, manage, share, and use.
Shadow IT: Déjà Vu All Over Again
I’m a bit surprised with all of the recent discussion and debate about Shadow IT. For those of you not familiar with the term, Shadow IT refers to software development and data processing activities that occur within business unit organizations without the blessing of the Central IT organization. The idea of individual business organizations purchasing technology, hiring staff members, and taking on software development to address specific business priorities isn’t a new concept; it’s been around for 30 years.
When it comes to the introduction of technology to address or improve business process, communications, or decision making, Central IT has traditionally not been the starting point. It’s almost always been the business organization. Central IT has never been in the position of reengineering business processes or insisting that business users adopt new technologies; that’s always been the role of business management. Central IT is in the business of automating defined business processes and reducing technology costs (through the use of standard tools, economies-of-scale methods, commodity technologies). It’s not as though Shadow IT came into existence to usurp the authority or responsibilities of the IT organization. Shadow IT came into existence to address new, specialized business needs that the Central IT organization was not responsible for addressing.
Here are a few examples of information technologies that were introduced and managed by Shadow IT organizations to address specialized departmental needs.
- Word Processing. Possibly the first “end user system” (Wang, IBM DisplayWrite, etc.). This solution was revolutionary in reducing the cost of documentation.
- The minicomputer. This technology revolution of the ’70s and ’80s delivered packaged, departmental application systems (DEC, Data General, Prime, etc.). The most popular were HR, accounting, and manufacturing applications.
- The personal computer. Many companies created PC support teams (in Finance) because they required unique skills that didn’t exist within most companies.
- Email, File Servers, and Ethernet (remember Banyan, Novell, 3Com?). These tools worked outside the mainframe OLTP environment and required specialized skills.
- Data Marts and Data Warehouses. Unless you purchased a product from IBM, the early products were often purchased and managed by marketing and finance.
- Business Intelligence tools. Many companies still manage analytics and report development outside of Central IT.
- CRM and ERP systems. While both of these packages required Central IT hardware platforms, the actual application systems are often supported by separate teams positioned within their respective business areas.
The success of Shadow IT is based on its ability to respond to specialized business needs with innovative solutions. The technologies above were all introduced to address specific departmental needs; they evolved to deliver more generalized capabilities that could be valued by the larger corporate audience. The larger audience required the technology’s ownership and support to migrate from the Shadow IT organization to Central IT. Unfortunately, most companies were ill prepared to support the transition of technology between the two different technology teams.
Most Central IT teams bristle at the idea of inheriting a Shadow IT project. There are significant costs associated with transitioning a project to a different team and a larger user audience. This is why many Central IT teams push for Shadow IT to adopt their standard tools and methods (or for the outright dissolution of Shadow IT). Unfortunately, applying low-cost, standardized methods to deploy and support a specialized, high-value solution doesn’t work (if it did, it would have been used in the first place). You can’t expect to solve specialized needs with a one-size-fits-all approach.
A Shadow IT team delivers dozens of specialized solutions to their business user audience; the likelihood that any single solution will be deployed to a larger audience is very small. While it’s certainly feasible to modify the charter, responsibilities, and success metrics of a Central IT organization to support both specialized, unique needs and generalized, high-volume needs, I think there’s a better alternative: establish a set of methods and practices to address the infrequent transition of Shadow IT projects to Central IT. Both organizations should be obligated to work with and respond to the needs and responsibilities of the other technology team.
Most companies have multiple organizations with specific roles to address a variety of different activities. And organizations are expected to cooperate and work together to support the needs of the company. Why is it unrealistic to have Central IT and Shadow IT organizations with different roles to address the variety of (common and specialized) needs across a company?
Who Has My Personal Data?
In order to prepare for the cooking gauntlet that often occurs with the end of year holiday season, I decided to purchase a new rotisserie oven. The folks at Acme Rotisserie include a large amount of documentation with their rotisserie. I reviewed the entire pile and was a bit surprised by the warranty registration card. The initial few questions made sense: serial number, place of purchase, date of purchase, my home address. The other questions struck me as a bit too inquisitive: number of household occupants, household income, own/rent my residence, marital status, and education level. Obviously, this card was a Trojan horse of sorts: provide registration details, and all kinds of other personal information besides. They wanted me to give away my personal information so they could analyze it, sell it, and make money off of it.
Companies collecting and analyzing consumer data isn’t anything new; it’s been going on for decades. In fact, there are laws in place to protect consumers’ data in quite a few industries (healthcare, telecommunications, and financial services). Most of the laws focus on protecting the information that companies collect based on their relationship with you. It’s not just the details that you provide to them directly; it’s the information they gather about how you behave and what you purchase. Most folks believe behavioral information is more valuable than the personal descriptive information you provide. The reason is simple: you can offer creative (and highly inaccurate) details about your income, your education level, and the car you drive. You can’t really lie about your behavior.
I’m a big fan of sharing my information if it can save me time, save me money, or generate some sort of benefit. I’m willing to share my waist size, shirt size, and color preferences with my personal shopper because I know they’ll contact me when suits or other clothing that I like is available at a good price. I’m fine with a grocer tracking my purchases because they’ll offer me personalized coupons for those products. I’m not okay with the grocer selling that information to my health insurer. Providing my information to a company to enhance our relationship is fine; providing my information to a company so they can share, sell, or otherwise unilaterally benefit from it is not fine. My data is proprietary and my intellectual property.
Clearly companies view consumer data to be a highly valuable asset. Unfortunately, we’ve created a situation where there’s little or no cost to retain, use, or abuse that information. As abuse and problems have occurred within certain industries (financial services, healthcare, and others), we’ve created legislation to force companies to responsibly invest in the management and protection of that information. They have to contact you to let you know they have your information and allow you to update communications and marketing options. It’s too bad that every company with your personal information isn’t required to behave in the same way. If data is so valuable that a company retains it, requiring some level of maintenance (and responsibility) shouldn’t be a big deal.
It’s really too bad that companies with copies of my personal information aren’t required to contact me to update and confirm the accuracy of all of my personal details. That would ensure that all of the specialized big data analytics that are being used to improve my purchase experiences were accurate. If I knew who had my data, I could make sure that my preferences were up to date and that the data was actually accurate.
It’s unfortunate that Acme Rotisserie isn’t required to contact me to confirm that I have 14 children, an advanced degree in swimming pool construction, and a red Ferrari in my garage. It will certainly be interesting to see the personalized offers I receive for the upcoming Christmas shopping season.
Hadoop Replacing Data Warehouse Processing
I was recently asked about my opinion for the potential of Hadoop replacing a company’s data warehouse (DW). While there’s lots to be excited about when it comes to Hadoop, I’m not currently in the camp of folks that believe it’s practical to use Hadoop to replace a company’s DW. Most corporate DW systems are based on commercial relational database products and can store and manage multiple terabytes of data and support hundreds (if not thousands) of concurrent users. It’s fairly common for these systems to handle complex, mixed workloads –queries processing billions of rows across numerous tables along with simple primary key retrieval requests all while continually loading data. The challenge today is that Hadoop simply isn’t ready for this level of complexity.
All that being said, I do believe there’s a huge opportunity to use Hadoop to replace a significant amount of processing that is currently being handled by most DWs. Oh, and data warehouse users won’t be affected at all.
Let’s review a few fundamental details about the DW. There are two basic data processing activities that occur on a DW: query processing and transformation processing. Query processing is servicing the SQL that’s submitted from all of the tools and applications on the users’ desktops, tablets, and phones. Transformation processing is the workload involved with converting data from its source application format to the format required by the data warehouse. While the most visible activity to business users is query processing, it is typically the smaller of the two. Extracting and transforming the dozens (or hundreds) of source data files for the DW is a huge processing activity. In fact, most DWs are not sized for query processing; they are sized for the daily transformation processing effort.
It’s important to realize that one of the most critical service level agreements (SLAs) of a DW is data delivery. Business users want their data first thing each morning. That means the DW has to be sized to deliver data reliably each and every business morning. Since most platforms are anticipated to have a 3+ year life expectancy, IT has to size the DW system based on the worst-case data volume scenario for that entire period (end of quarter, end of year, holidays, etc.). This means the DW is sized to address a maximum load that may only occur a few times during that entire period.
This is where the opportunity for Hadoop seems pretty obvious. Hadoop is a parallel, scalable framework that handles distributed batch processing and large data volumes. It’s really a set of tools and technologies for developers, not end users. This is probably why so many ETL (extract, transformation, and load) product vendors have ported their products to execute within a Hadoop environment. It only makes sense to migrate processing from a specialized platform to commodity hardware. Why bog down and over invest in your DW platform if you can handle the heavy lifting of transformation processing on a less expensive platform?
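To make the transformation workload concrete, here is a tiny stand-in for one nightly transformation step: converting source-extract records into the warehouse’s format. The record layouts and field names are hypothetical; in practice this logic would run as a distributed batch job (for example, an ETL tool executing on Hadoop) rather than on the warehouse platform itself.

```python
# Hypothetical source extract -> warehouse-ready row.
# This is the kind of work that consumes DW capacity each night
# and is a candidate for offloading to a commodity batch platform.
def transform(source_record: dict) -> dict:
    """Clean and reshape one source record into the warehouse format."""
    return {
        "customer_id": source_record["CUST_NO"].strip(),        # trim padding
        "order_total": round(float(source_record["AMT"]), 2),   # string -> number
        "region": source_record.get("RGN", "UNKNOWN").upper(),  # default + case
    }

extract = [{"CUST_NO": " C001 ", "AMT": "19.95", "RGN": "ne"}]
warehouse_rows = [transform(r) for r in extract]
print(warehouse_rows)
# [{'customer_id': 'C001', 'order_total': 19.95, 'region': 'NE'}]
```

The logic is embarrassingly parallel (each record transforms independently), which is exactly why a framework built for distributed batch processing is a natural home for it.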
Introducing a new system to your DW environment will inevitably create new work for your DW architects and developers. However, the benefits are likely to be significant. While some might view such an endeavor as a creative way to justify purchasing new hardware and installing Hadoop, the real reason is to extend the life of the data warehouse (and save your company a bunch of money by deferring a DW upgrade).
My Dog Ate the Requirements, Part 2
There’s nothing more frustrating than not being able to rely upon a business partner. There are lots of business books about information technology that espouse the importance of Business/IT alignment and of establishing business users as IT stakeholders. The whole idea of delivering business value with data and analytics is to provide business users with tools and data that can support business decision making. It’s incredibly hard to deliver business value when half of the partnership isn’t stepping up to its responsibilities.
There’s never a shortage of rationale as to why requirements haven’t been collected or recorded. In order for a relationship to be successful, both parties have to participate and cooperate. Gathering and recording requirements isn’t possible if the technologist doesn’t meet with the users to discuss their needs, pains, and priorities. Conversely, the requirements process won’t succeed if the users won’t participate. My last blog reviewed the excuses that technologists offered for explaining the lack of documented requirements; this week’s blog focuses on remarks I’ve heard from business stakeholders.
- “I’m too busy. I don’t have time to talk to developers”
- “I meet with IT every month, they should know my requirements”
- “IT isn’t asking me for requirements, they want me to approve SQL”
- “We sent an email with a list of questions. What else do they need?”
- “They have copies of reports we create. That should be enough.”
- “The IT staff has worked here longer than I have. There’s nothing I can tell them that they don’t already know”
- “I’ve discussed my reporting needs in 3 separate meetings; I seem to be educating someone else with each successive discussion”
- “I seem to answer a lot of questions. I don’t ever see anyone writing anything down”
- “I’ll meet with them again when they deliver the requirements I identified in our last discussion.”
- “I’m not going to sign off on the requirements because my business priorities might change – and I’ll need to change the requirements.
Requirements gathering is really a beginning stage for negotiating a contract for the creation and delivery of new software. The contract is closed (or agreed to) when the business stakeholders sign off on the requirements document. While many believe that requirements are an IT-only artifact, they’re really a tool to establish the responsibilities of both parties in the relationship.
A requirements document defines the data, functions, and capabilities that the technologist needs to build to deliver business value. The requirements document also establishes the “product” that will be deployed and used by the business stakeholders to support their business decision making activities. The requirements process holds both parties accountable: technologists to build and business stakeholders to use. When two organizations can’t work together to develop requirements, it’s often a reflection of a bigger problem.
It’s not fair for business stakeholders to expect development teams to build commercial grade software if there’s no participation in the requirements process. By the same token, it’s not right for technologists to build software without business stakeholder participation. If one stakeholder doesn’t want to participate in the requirements process, they shouldn’t be allowed to offer an opinion about the resulting deliverable. If multiple stakeholders don’t want to participate in a requirements activity, the development process should be cancelled. Lack of business stakeholder participation means they have other priorities; the technologists should take a hint and work on their other priorities.
My Dog Ate the Requirements
I received a funny email the other day about excuses that school children use to explain why they haven’t done their homework. The examples were pretty creative: “my mother took it to be framed”, “I got soap in my eyes and was blinded all night”, and (an oldie but a goodie) “my dog ate my homework”. It’s a shame that such a creative approach yielded such a high rate of failure. Most of us learn at an early age that you can’t talk your way out of failure; success requires that you do the work. You’d also think that as people got older and more evolved, they’d realize that there are very few shortcuts in life.
I’m frequently asked to conduct best practice reviews of business intelligence and data warehouse (BI/DW) projects. These activities usually come about because either users or IT management is concerned with development productivity or delivery quality. The review activity is pretty straightforward: interviews are scheduled and artifacts are analyzed to review the various phases, from requirements through construction to deployment. It’s always interesting to look at how different organizations handle architecture, code design, development, and testing. One of the keys to conducting a review effort is to focus on the actual results (or artifacts) that are generated during each stage. It’s foolish to discuss someone’s development method or style prior to reviewing the completeness of the artifacts. It’s not necessary to challenge someone’s approach if their artifacts reflect the details required for the other phases.
One of the most common problems I’ve seen with BI/DW development is the lack of documented requirements. Zip – zero – zilch – nothing. While discussions about requirements gathering, interview styles, and even document details occur occasionally, it’s the lack of any documented requirements that’s the norm. I can’t imagine how any company allows development to begin without ensuring that requirements are documented and approved by the stakeholders. Believe it or not, it happens a lot.
So, as a tribute to the creative school children of yesterday and today, I thought I would devote this blog to some of the most creative excuses I’ve heard from development teams to justify their beginning work without having requirements documentation.
- “The project’s schedule was published. We have to deliver something with or without requirements”
- “We use the agile methodology; it doesn’t require written requirements”
- “The users don’t know what they want.”
- “The users are always too busy to meet with us”
- “My bonus is based on the number of new reports I create. We don’t measure our code against requirements”
- “We know what the users want, we just haven’t written it down”
- “We’ll document the requirements once our code is complete and testing finished”
- “We can spend our time writing requirements, or we can spend our time coding”
- “It’s not our responsibility to document requirements; the users need to handle that”
- “I’ve been told not to communicate with the business users”
Many of the above items clearly reflect a broken set of management or communication methods. Expecting a development team to adhere to a project schedule when they don’t have requirements is ridiculous. Forcing a team to commit to deliverables without requirements challenges conventional development methods and financial common sense. It also reflects leadership that focuses on schedules and utilization, not business value.
A development team that is asked to build software without a set of requirements is being set up to fail. I’m always astonished that anyone would think they can argue and justify that the lack of documented requirements is acceptable. I guess there are still some folks that believe they can talk their way out of failure.
Data Quality, Data Maintenance
I read an interesting tidbit about data the other day: the United States Postal Service processed more than 47 million changes of address in the last year. That’s nearly 1 in 6 people. In the world of data, that factoid is a simple example of the challenge of addressing stale data and data quality. The idea of stale data is that as data ages, its accuracy and associated business rules can change.
There are lots of examples of how data in your data warehouse can age and degrade in accuracy and quality: people move, area codes change, postal/zip codes change, product descriptions change, and even product SKUs can change. Data isn’t clean and accurate forever; it requires constant review and maintenance. This shouldn’t be much of a surprise for folks that view data as a corporate asset; any asset requires ongoing maintenance in order to retain and ensure its value. The challenge with maintaining any asset is establishing a reasonable maintenance plan.
Unfortunately, while IT teams are exceptionally strong in planning and carrying out application maintenance, it’s quite rare that data maintenance gets any attention. In the data warehousing world, data maintenance is typically handled in a reactive, project-centric manner. Nearly every data warehouse (or reporting) team has to deal with data maintenance issues whenever a company changes major business processes or modifies customer or product groupings (e.g. new sales territories, new product categories, etc.). This happens so often that most data warehouse folks have even given it a name: Recasting History. Regardless of what you call it, it’s a common occurrence, and there are steps that can be taken to simplify the ongoing effort of data maintenance.
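Recasting history usually amounts to restating a grouping attribute (territory, category, etc.) across historical rows while keeping the old value for auditability. Here’s a minimal, hypothetical sketch in Python; the rows, territory names, and mapping are all made up for illustration:

```python
# Hypothetical historical fact rows, each tagged with the sales territory
# that was in effect when the row was loaded.
sales_history = [
    {"order_id": 1001, "territory": "Northeast", "amount": 250.0},
    {"order_id": 1002, "territory": "Midwest",   "amount": 480.0},
    {"order_id": 1003, "territory": "Southwest", "amount": 130.0},
]

# The reorganization collapses the old territories into new regions.
territory_recast_map = {
    "Northeast": "East",
    "Midwest":   "Central",
    "Southwest": "West",
}

def recast_history(rows, mapping):
    """Restate each row under the new territory scheme, preserving the
    original value so the change can be audited or reversed later."""
    recast = []
    for row in rows:
        new_row = dict(row)
        new_row["territory_original"] = row["territory"]
        new_row["territory"] = mapping.get(row["territory"], row["territory"])
        recast.append(new_row)
    return recast

restated = recast_history(sales_history, territory_recast_map)
```

In a real warehouse this would run as a bulk UPDATE or a reload of the affected tables during a maintenance window, but the shape of the work is the same: a mapping plus a pass over history.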
- Establish a regularly scheduled data maintenance window. Just like the application maintenance world, identify a window of time when data maintenance can be applied without impacting application processing or end user access
- Collect and publish data quality details. Profile and track the content of the major subject area tables within your data warehouse environment. Any significant shift in domain values, relationship details, or data demographics can be discovered prior to a user calling to report an undetected data problem
- Keep the original data. Most data quality processing overwrites original content with new details. Instead, keep the cleansed data and place the original values at the end of your table records. While this may require a bit more storage, it will dramatically simplify maintenance when rule changes occur in the future
- Add source system identification and creation date/time details to every record. While this may seem tedious and unnecessary, these two fields can dramatically simplify maintenance and trouble shooting in the future
- Schedule a regular data change control meeting. This too is similar in concept to the change control meeting associated with IT operations teams. This is a forum for discussing data content issues and changes
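Two of the steps above – keeping the original values alongside the cleansed ones, and stamping every record with its source system and creation time – can be sketched in a few lines of Python. The field names and the `prepare_record` helper are illustrative assumptions, not a standard API:

```python
from datetime import datetime, timezone

def prepare_record(raw, cleansed, source_system):
    """Build a warehouse-ready record: cleansed values lead the record,
    the original (pre-cleansing) values ride along at the end, and two
    lineage columns identify where and when the row was created."""
    record = dict(cleansed)                                   # cleansed values first
    record.update({f"orig_{k}": v for k, v in raw.items()})   # originals appended
    record["source_system"] = source_system                   # where the row came from
    record["created_at"] = datetime.now(timezone.utc).isoformat()  # load timestamp
    return record

# Example: a zip code fixed by a cleansing rule, with the original retained.
raw = {"zip": "0212", "state": "ma"}
cleansed = {"zip": "02120", "state": "MA"}
rec = prepare_record(raw, cleansed, "CRM")
```

Because the originals travel with the record, a future rule change can re-derive the cleansed columns without going back to the source system – which is exactly what makes the next recasting effort cheaper.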
Unfortunately, I often find that data maintenance is completely ignored. The problem is that fixing broken or inaccurate data isn’t sexy; developing a data maintenance plan isn’t always fun. Most data warehouse development teams are buried with building new reports, loading new data, or supporting the ongoing ETL jobs; they haven’t given any attention to the quality or accuracy of the actual content they’re moving and reporting. They simply don’t have the resources or time to address data maintenance as a proactive activity.
Business users clamor for new data and new reports; new funding is always tied to new business capabilities. Support costs are budgeted, but they’re focused on software and hardware maintenance activities. No one ever considers data maintenance; it’s simply ignored and forgotten.
Interesting that we view data as a corporate asset – a strategic corporate asset – and there’s universal agreement that hardware and software are simply tools to support enablement. And where are we investing in maintenance? In the commodity tools, not in the strategic corporate asset.
Photo courtesy of Designzillas via Flickr (Creative Commons license).