Good Data Warehouse DBAs are Hard to Find
As a consultant, I’m often asked how roles and responsibilities should be identified and delegated within the IT organization to support data warehousing. One role that seems to incite discussion is that of the database administrator (DBA). Many of our clients pack a series of roles and responsibilities into a laundry list of heterogeneous tasks. However, I usually recommend dividing DBA resources and assigning individuals to either transactional systems or data warehousing, an approach that tends to surprise my clients. Let me explain why I see the roles across these different systems requiring entirely different skills.
DBAs are typically focused on the care and feeding of the DBMS to ensure that processing is consistent and performance is maintained regardless of the circumstances. The DBA is typically responsible for establishing table structures, configuring database systems, and designing queries and execution jobs to use the system efficiently. Users don’t like surprises; they want daily activities to complete at a specific time every day. Their jobs depend on timely information. Many IT managers assume that if a DBA understands how a particular DBMS works, that DBA can address both transactional and analytical responsibilities. It’s a risky assumption.
It is a mistake to assume that someone who can design and manage a transactional database environment is therefore qualified to design and manage an analytical system. The details associated with designing or troubleshooting a sophisticated transactional system have little in common with those of an enterprise, cross-functional data warehouse.
Just because someone is a mechanic doesn’t mean they can repair any type of vehicle. The mechanic who can fix a diesel pickup is unlikely to be able to repair the engine of an 18-wheeler. While the basic skills are the same, the actual situations and experience required to solve specific problems are dramatically different. Likewise, just because someone is a DBA doesn’t mean they can design or support any type of application or database system.
The design of a transactional application is typically preceded by detailed transaction and data specifications. Because a transactional system supports specific business processes, the actual transactions, data details, and processing volumes are well understood prior to development. Most queries are single-statement queries accessing individual records within a single table. It’s critical that the workload be well defined because of the enormous costs associated with these applications. Consistent response time is critical. Managing such a system requires attention to transaction quantities, query plans, and data volumes to ensure that data and processing are distributed across the system’s resources. Users are often grouped by specific application (or privileges), and while processing can vary across different applications, the users of any one application are usually homogeneous. System growth comes from more users and the additional transaction volumes they bring. While ad-hoc processing or table joins are technically feasible, they are rarely supported.
Contrast that with analytical systems. DBAs who support them face an entirely different set of challenges during development. Database design is often undertaken with the knowledge that the content of the database will change. It’s not uncommon for a BI system to start by supporting a single subject area, only to grow dramatically as new subject areas and data volumes are added. The DBA designs data structures based on current and future data content needs and must also address the divergent processing needs of data loading and complex query processing.
Managing an analytical system also differs because of the variety of user processing. It’s not uncommon for a data warehouse to support numerous canned reports or queries along with a category of power users generating ad-hoc queries. The challenge is preventing a single ad-hoc query from crippling the processing of the entire system. Multi-statement queries, numerous table joins, and large volumes of historical content are commonplace in analytical environments.
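To make the contrast concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The tables, columns, and function names are hypothetical and invented purely for illustration; a real warehouse would hold far more history and run on a very different platform. The first query is the kind of single-record lookup a transactional system lives on, while the second joins several tables and aggregates across years, which is closer to what an analytical user submits.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            product_id  INTEGER REFERENCES products(product_id),
            order_year  INTEGER,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(10, "Hardware"), (20, "Software")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
        (100, 1, 10, 2008, 50.0),
        (101, 1, 20, 2009, 75.0),
        (102, 2, 10, 2009, 20.0),
        (103, 2, 20, 2009, 30.0),
    ])
    return conn


def transactional_lookup(conn, order_id):
    """Transactional style: one statement, one table, one record by primary key."""
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def analytical_summary(conn):
    """Analytical style: joins across tables and an aggregate over all history."""
    return conn.execute("""
        SELECT c.region, p.category, o.order_year, SUM(o.amount)
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN products  p ON p.product_id  = o.product_id
        GROUP BY c.region, p.category, o.order_year
        ORDER BY c.region, p.category, o.order_year
    """).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(transactional_lookup(conn, 101))
    for row in analytical_summary(conn):
        print(row)
```

The lookup touches a single row no matter how large the orders table grows, while the summary has to read every order it aggregates. That difference in how cost scales with data volume is a large part of why the two workloads call for different tuning and different experience.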
It becomes fairly clear that the role of a DBA differs greatly between analytical and operational systems. I’m not suggesting that working in one environment is more complex or difficult than the other; they’re just different, and so are the activities and the skills they require. That is why we often caution that a single individual may be hard-pressed to support both operational and analytical environments.
Can one person address both responsibilities? Maybe. But first, try contacting your diesel mechanic and see if he’s interested in becoming your operational system DBA.
photo by Kerry 2009 via Flickr.