Good Data Warehouse DBAs are Hard to Find
As a consultant, I’m often asked how roles and responsibilities should be assigned within the IT organization to support data warehousing. One role that seems to incite discussion is that of the database administrator (DBA). Many of our clients pack a series of roles and responsibilities into a laundry list of heterogeneous tasks. However, I usually recommend splitting DBA resources and assigning individuals to either transactional systems or data warehousing, an approach that tends to surprise my clients. Let me explain why I see the roles across these different systems requiring entirely different skills.
DBAs are typically focused on the care and feeding of the DBMS to ensure that processing is consistent and performance is maintained regardless of the circumstances. The DBA is typically responsible for establishing table structures, configuring database systems, and designing queries and execution jobs to efficiently utilize the system. Users don’t like surprises; they want daily activities to complete at a specific time every day. Their jobs depend on timely information. Many IT managers assume that if a DBA understands how a particular DBMS works, that DBA can address both transactional and analytical responsibilities. It’s a risky assumption.
It is a mistake to assume that someone who can design and manage a transactional database environment is therefore qualified to design and manage an analytical system. The details associated with designing or troubleshooting a sophisticated transactional system have little in common with those of an enterprise, cross-functional data warehouse.
Just because someone is a mechanic doesn’t mean they can repair any type of vehicle. The mechanic who can fix a diesel pickup is unlikely to be able to repair the engine of an 18-wheeler. While the basic skills are the same, the actual situations and experience required to solve specific problems are dramatically different. Just because someone is a DBA doesn’t mean they can design or support any type of application/database system.
The design of a transactional application is typically preceded by detailed transaction and data specifications. Because a transactional system supports specific business processes, the actual transactions, data details, and processing volumes are well understood prior to development. Most queries are single-statement queries accessing individual records within a single table. It’s critical that the workload be well defined because of the enormous costs associated with these applications. Consistent response time is critical. Managing a system requires attention to transaction quantities, query plans, and data volumes to ensure that data and processing are distributed across the system’s resources. Users are often grouped by specific application (or privileges), and while processing can vary across different applications, the users within each application are usually homogeneous. System growth occurs with more users and the additional transaction volumes. While ad-hoc processing or table joins are technically feasible, they are rarely supported.
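To make that workload shape concrete, here is a minimal sketch of a typical transactional query: one statement fetching one record from one table by its key. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration. The accounts table, its columns, and the helper names are hypothetical and stand in for whatever a real application stores.

```python
import sqlite3

# In-memory database; the accounts table and its rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "Ann", 120.0), (2, "Bob", 75.5), (3, "Cy", 310.25)],
)


def get_balance(conn, account_id):
    """Single-statement lookup of one record in one table by its key."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE account_id = ?", (account_id,)
    ).fetchone()
    return None if row is None else row[0]


def plan_for_lookup(conn):
    """Return the plan text so a DBA can confirm a keyed search, not a full scan."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT balance FROM accounts WHERE account_id = ?", (1,)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(r[-1] for r in rows)


print(get_balance(conn, 2))
print(plan_for_lookup(conn))
```

Because the query shape is known before the system is built, the plan can be checked once and expected to stay stable, which is what makes consistent response times achievable.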
Contrast that with analytical systems. DBAs for these systems face an entirely different set of challenges during development. Database design is often undertaken with the knowledge that the content of the database will change. It’s not uncommon for a BI system to start by supporting a single subject area, only to grow dramatically as subject areas are added and data volumes increase. The DBA designs data structures based on current and future data content needs and must also address the divergent processing needs of data loading and complex query processing.
Managing an analytical system also differs because of the variety of user processing. It’s not uncommon for a data warehouse to support numerous canned reports or queries along with a category of power users generating ad-hoc queries. The challenge is preventing a single ad-hoc query from crippling the processing of the entire system. Multi-statement queries, numerous table joins, and large volumes of historical content are commonplace in analytical environments.
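As a rough illustration of that challenge, the sketch below runs a join-and-aggregate query of the kind analysts write, then shows a crude guard that aborts a runaway ad-hoc query. It uses Python's built-in sqlite3 module as a toy stand-in for the workload management features of a real warehouse platform. The customers and sales tables, the run_with_budget helper, and the budget numbers are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical two-table schema standing in for warehouse content.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    sale_year INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "East"), (2, "West"), (3, "East")])
conn.executemany(
    "INSERT INTO sales (customer_id, sale_year, amount) VALUES (?, ?, ?)",
    [(1, 2008, 100.0), (1, 2009, 150.0), (2, 2008, 200.0), (3, 2009, 50.0)],
)


def run_with_budget(conn, sql, params=(), max_callbacks=1000):
    """Run a query, aborting it once it exceeds a crude work budget."""
    calls = 0

    def over_budget():
        nonlocal calls
        calls += 1
        # A nonzero return value tells SQLite to interrupt the statement.
        return 1 if calls > max_callbacks else 0

    # Ask SQLite to call the guard every 1000 virtual-machine instructions.
    conn.set_progress_handler(over_budget, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.set_progress_handler(None, 1)  # always remove the guard


# Analytical shape: a join plus aggregation over historical rows.
REGION_TOTALS = """
SELECT c.region, s.sale_year, SUM(s.amount)
FROM sales AS s JOIN customers AS c ON c.customer_id = s.customer_id
GROUP BY c.region, s.sale_year
ORDER BY c.region, s.sale_year
"""

# A deliberately expensive query standing in for a runaway ad-hoc request.
RUNAWAY = """
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 50000000)
SELECT COUNT(*) FROM n
"""

print(run_with_budget(conn, REGION_TOTALS))
try:
    run_with_budget(conn, RUNAWAY, max_callbacks=5)
except sqlite3.DatabaseError as exc:
    print("aborted:", exc)
```

The small report query finishes well inside its budget, while the runaway query is interrupted after a few thousand steps. A production warehouse would rely on the platform's own resource controls rather than a hand-rolled guard, but the trade-off the DBA has to manage is the same.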
It becomes fairly clear that the role of a DBA is very different when comparing the work activities of analytical and operational systems. I’m not suggesting that working in one environment is more complex or difficult than the other; they’re just different. Thus the activities and their associated skills are very different, which is why we often caution that a single individual may be hard-pressed to support both operational and analytical environments.
Can one person address both responsibilities? Maybe. But first, try contacting your diesel mechanic and see if he’s interested in becoming your operational system DBA.
Repurposing Your Data Warehouse Platform—Not!
I’ve noticed lately that data warehouse vendors are dusting off the arguments and pitches of days gone by. Don’t buy specialized hardware for your database needs! You’ll never be able to re-use the gear! One rep recently told a client, “With your data warehouse on our hardware, you can re-purpose the hardware at any time!”
The truth is, while data warehouse failures were rampant a few years ago, those failures are now the exception and not the rule. Data warehouses, once installed, tend to last a while. The good ones actually add more data over time and become more entrenched among user organizations. The great ones become strategic, and business people claim not to be able to do their jobs without them. A data warehouse platform is rarely for a single use, but for a multitude of needs. Data warehouses rarely just go away.
However, don’t confuse an entrenched data warehouse with an entrenched data integration solution. I teach a class at The Data Warehousing Institute conferences called “Architectural Options for Data Integration.” The class covers technologies like Enterprise Application Integration (EAI); Enterprise Information Integration (EII); Extract, Transform, and Load (ETL, and its sister, ELT); and Master Data Management (MDM). I present use cases for these different solutions as well as lists of the key vendors that offer them.
Attendees I talk to admit coming to the class with the intent of justifying the data warehouse as a multi-purpose integration system. They leave the class understanding the often-stark differences among these various solutions. And I hope they return to work with a different view of their future-state integration architectures, whether they re-purpose their hardware or not.
Note: Evan will be teaching Beyond the Data Warehouse: Architectural Options for Data Integration at the TDWI World Conference in San Diego on Thursday, August 6.