I’ve been consulting in the data management space for quite a few years, and I’m often asked about the importance and need for a Data Strategy.
All too often, the idea of “strategy” brings the images of piles of papers, academics-styled charts, and a list of unachievable goals identifying the topic at hand, but not reflecting reality. Developing a strategy isn’t about identifying perfection – it’s about identifying a set of goals that address problems and needs that require attention. A solid data strategy isn’t about identifying perfection, it’s about identifying a set of goals that are achievable and good enough to improve your data environment. A data strategy is also about identifying the tasks and activities necessary to achieve those goals. A data strategy is more than the finish line, it’s about the path of the journey. And, it’s about making sure the journey and goal are possible.
Companies spend a fortune on data. They purchase servers and storage farms to store the data, database management systems to manage the data, transformation tools to convert and transform the data, data quality tools to fix and standardize the content, and treasure trove of analytical tools to present content that can be understood by business people. Given all of the activities, the players, and the content, why would you not want a plan?
Unfortunately, few organizations have a Data Strategy. They have lots of technology plans and roadmaps. They have platform and server plans; they have DBMS standards; they have storage strategies; they likely have analytical tool plans. While these are valuable, they are typically focused on an organization or function with minimal concern for all of the related upstream and downstream activities (how usable is a data warehouse if the data exists as multiple copies with different names and different formats, and hasn’t been checked/fixed for accuracy?) A data strategy is a plan that ensures that data is easy to find, easy to identify, easy to use, and easy to share across the company and across multiple functions.
Information technologists are exceptionally strong in the world of applications, tools, and platforms. They understand the importance of ensuring “reusability” and the benefit of an “economies-of-scale” approach. These are both just nice sound bites focused on making sure that new development work doesn’t always require reinvention. Application strategies include identifying standards (tools, platforms, storage locations, etc.) and repeatable methods to ensure efficient construction and delivery of data that can be serviced, maintained, and upgraded. An assembly line of sorts.
The challenge with most data environments is that a data strategy rarely exists; there is no repeatable methods and practices. Every new request requires building data and the associated deliverables from scratch. And, once delivered, there’s a huge testing and confirmation effort to ensure that the data is accurate. If you had a data strategy, you’d have reusable data, repeatable methods, and the details would be referenceable online instead of through tribal knowledge. And delivery efficiency and cost would improve over time.
Why do you need a data strategy? Because the cost of data is growing –and it should be shrinking. The cost of data processing has shrunk, the cost of data storage has decreased dramatically, but the cost of data delivery continues to grow. A data strategy focuses on delivering data that is easy to find, easy to use, and easy to share.
I’ve noticed lately that data warehouse vendors are dusting off the arguments and pitches of days gone by. Don’t buy specialized hardware for your database needs! You’ll never be able to re-use the gear! One rep recently told a client, “With your data warehouse on our hardware, you can re-purpose the hardware at any time!”
The truth is, while data warehouse failures were rampant a few years ago, those failures are now the exception and not the rule. Data warehouses, once installed, tend to last a while. The good ones actually add more data over time and become more entrenched among user organizations. The great ones become strategic, and business people claim not to be able to do their jobs without them. A data warehouse platform is rarely for a single use, but for a multitude of needs. Data warehouses rarely just go away.
However don’t confuse an entrenched data warehouse with an entrenched data integration solution. I’ll teach a class at The Data Warehousing Institute conferences called “Architectural Options for Data Integration.” The class covers technologies like Enterprise Application Integration (EAI); Enterprise Information Integration (EII); Extract Transformation and Loading (ETL, and its sister, ELT); and Master Data Management (MDM). I present use cases for these different solutions as well as lists of the key vendors that offer them.
Attendees I talk to admit coming to the class with the intent of justifying the data warehouse as a multi-purpose integration system. They leave the class understanding the often-stark differences of these various solutions. And I hope they return to work with a different view of their future-state integration architectures, whether they re-purpose their hardware or not.
Note: Evan’s will be teaching Beyond the Data Warehouse: Architectural Options for Data Integration at the TDWI World Conference in San Diego on Thursday, August 6.