Tag Archive | data warehousing

Who Has My Personal Data?

To prepare for the cooking gauntlet that often comes with the end-of-year holiday season, I decided to purchase a new rotisserie oven. The folks at Acme Rotisserie include a large amount of documentation with their rotisserie. I reviewed the entire pile and was a bit surprised by the warranty registration card. The first few questions made sense: serial number, place of purchase, date of purchase, my home address. The other questions struck me as a bit too inquisitive: number of household occupants, household income, whether I own or rent my residence, marital status, and education level. Obviously, this card was a Trojan horse of sorts: provide registration details, and all kinds of other personal information along with them. They wanted me to give away my personal information so they could analyze it, sell it, and make money off of it.

Companies collecting and analyzing consumer data isn’t anything new; it’s been going on for decades. In fact, there are laws in place to protect consumers’ data in quite a few industries (healthcare, telecommunications, and financial services). Most of the laws focus on protecting the information that companies collect based on their relationship with you. It’s not just the details that you provide to them directly; it’s the information that they gather about how you behave and what you purchase. Most folks believe behavioral information is more valuable than the personal descriptive information you provide. The reason is simple: you can offer creative (and highly inaccurate) details about your income, your education level, and the car you drive. You can’t really lie about your behavior.

I’m a big fan of sharing my information if it can save me time, save me money, or generate some sort of benefit. I’m willing to share my waist size, shirt size, and color preferences with my personal shopper because I know they’ll contact me when suits or other clothing that I like is available at a good price.  I’m fine with a grocer tracking my purchases because they’ll offer me personalized coupons for those products.  I’m not okay with the grocer selling that information to my health insurer.  Providing my information to a company to enhance our relationship is fine; providing my information to a company so they can share, sell, or otherwise unilaterally benefit from it is not fine.  My data is proprietary and my intellectual property.

Clearly, companies view consumer data as a highly valuable asset. Unfortunately, we’ve created a situation where there’s little or no cost to retain, use, or abuse that information. As abuse and problems have occurred within certain industries (financial services, healthcare, and others), we’ve created legislation to force companies to invest responsibly in the management and protection of that information. They have to contact you to let you know they have your information and allow you to update communications and marketing options. It’s too bad that every company with your personal information isn’t required to behave in the same way. If data is so valuable that a company retains it, requiring some level of maintenance (and responsibility) shouldn’t be a big deal.

It’s really too bad that companies with copies of my personal information aren’t required to contact me to update and confirm the accuracy of all of my personal details. That would ensure that all of the specialized big data analytics that are being used to improve my purchase experiences were accurate. If I knew who had my data, I could make sure that my preferences were up to date and that the data was actually accurate.

It’s unfortunate that Acme Rotisserie isn’t required to contact me to confirm that I have 14 children, an advanced degree in swimming pool construction, and a red Ferrari in my garage. It will certainly be interesting to see the personalized offers I receive for the upcoming Christmas shopping season.

My Dog Ate the Requirements

I received a funny email the other day about excuses that school children use to explain why they haven’t done their homework. The examples were pretty creative: “my mother took it to be framed”, “I got soap in my eyes and was blinded all night”, and (an oldie but a goodie) “my dog ate my homework”. It’s a shame that such a creative approach yielded such a high rate of failure. Most of us learn at an early age that you can’t talk your way out of failure; success requires that you do the work. You’d also think that as people got older and more evolved, they’d realize that there are very few shortcuts in life.

I’m frequently asked to conduct best practice reviews of business intelligence and data warehouse (BI/DW) projects. These activities usually come about because either users or IT management are concerned with development productivity or delivery quality. The review activity is pretty straightforward; interviews are scheduled and artifacts are analyzed to review the various phases, from requirements through construction to deployment. It’s always interesting to look at how different organizations handle architecture, code design, development, and testing. One of the keys to conducting a review effort is to focus on the actual results (or artifacts) that are generated during each stage. It’s foolish to discuss someone’s development method or style prior to reviewing the completeness of the artifacts. It’s not necessary to challenge someone’s approach if their artifacts reflect the details required for the other phases.

And one of the most common problems that I’ve seen with BI/DW development is the lack of documented requirements. Zip, zero, zilch, nothing. While discussions about requirements gathering, interview styles, and even document details occur occasionally, it’s the lack of any documented requirements that’s the norm. I can’t imagine how any company allows development to begin without ensuring that requirements are documented and approved by the stakeholders. Believe it or not, it happens a lot.

So, as a tribute to the creative school children of yesterday and today, I thought I would devote this blog to some of the most creative excuses I’ve heard from development teams to justify their beginning work without having requirements documentation.

  • “The project’s schedule was published. We have to deliver something with or without requirements”
  • “We use the agile methodology, it doesn’t require written requirements”
  • “The users don’t know what they want.”
  • “The users are always too busy to meet with us”
  • “My bonus is based on the number of new reports I create.  We don’t measure our code against requirements”
  • “We know what the users want, we just haven’t written it down”
  • “We’ll document the requirements once our code is complete and testing finished”
  • “We can spend our time writing requirements, or we can spend our time coding”
  • “It’s not our responsibility to document requirements; the users need to handle that”
  • “I’ve been told not to communicate with the business users”

Many of the above items clearly reflect a broken set of management or communication methods. Expecting a development team to adhere to a project schedule when they don’t have requirements is ridiculous. Forcing a team to commit to deliverables without requirements defies conventional development methods and financial common sense. It also reflects leadership that focuses on schedules and utilization rather than business value.

A development team that is asked to build software without a set of requirements is being set up to fail. I’m always astonished that anyone would think they can argue and justify that the lack of documented requirements is acceptable.  I guess there are still some folks that believe they can talk their way out of failure.

 

 

Project Success = Data Usability

One of the challenges in delivering successful data-centric projects (e.g. analytics, BI, or reporting) is realizing that the definition of project success differs from traditional IT application projects.  Success for a traditional application (or operational) project is often described in terms of transaction volumes, functional capabilities, processing conformance, and response time; data project success is often described in terms of business process analysis, decision enablement, or business situation measurement.  To a business user, the success of a data-centric project is simple: data usability.

It seems that most folks respond to data usability issues by gravitating towards a discussion about data accuracy or data quality; I actually think the more appropriate discussion is data knowledge.  I don’t think anyone would argue that to make data-enabled decisions, you need to have knowledge about the underlying data.  The challenge is understanding what level of knowledge is necessary.  If you ask a BI or Data Warehouse person, their answer almost always includes metadata, data lineage, and a data dictionary.  If you ask a data mining person, they often just want specific attributes and their descriptions — they don’t care about anything else.  All of these folks have different views of data usability and varying levels (and needs) for data knowledge.

One way to improve data usability is to target and differentiate the user audience based on their data knowledge needs. There are certainly lots of different approaches to categorizing users; in fact, every analyst firm and vendor has their own model to describe different audience segments. One of the problems with these types of models is that they tend to focus heavily on the tools or analytical methods (canned reports, drill down, etc.) and ignore the details of data content and complexity. The knowledge required to manipulate a single subject area (revenue or customer or usage) is significantly less than the knowledge required to manipulate data across three subject areas (revenue, customer, and usage). And what makes the required data knowledge grow even faster is the inevitable plethora of value gaps, inaccuracies, and inconsistencies associated with the data. Data knowledge isn’t just limited to understanding the data; it includes understanding how to work around all of the imperfections.
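To make the point about working across subject areas concrete, here is a minimal sketch in Python. The three dictionaries, the customer IDs, and the function name `combine_subject_areas` are all made up for illustration; they don't come from any real system. The idea is simply that once you combine customer, revenue, and usage data, someone has to know where the gaps are.

```python
# Hypothetical subject areas keyed by customer ID; None marks a value gap.
customers = {"C1": "Retail", "C2": "Wholesale", "C3": None}
revenue = {"C1": 500.0, "C2": 1250.0}  # no revenue row for C3
usage = {"C1": 40, "C3": 15}           # no usage row for C2


def combine_subject_areas(customers, revenue, usage):
    """Join the three areas and list the gaps an analyst must work around."""
    combined, gaps = [], []
    for cust_id, segment in customers.items():
        row = {
            "customer": cust_id,
            "segment": segment,
            "revenue": revenue.get(cust_id),  # None when the row is missing
            "usage": usage.get(cust_id),      # None when the row is missing
        }
        missing = [field for field, value in row.items() if value is None]
        if missing:
            gaps.append((cust_id, missing))
        combined.append(row)
    return combined, gaps


combined, gaps = combine_subject_areas(customers, revenue, usage)
```

With a single subject area there is one place for a gap to hide. With three, every combined row can be incomplete in a different way, and the user needs to know what each kind of gap means before acting on the numbers.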

Here’s a model that categorizes and describes business users based on their views of data usability and their data knowledge needs:

Level 1: “Can you explain these numbers to me?”

This person is the casual data user. They have access to a zillion reports that have been identified by their predecessors and they focus their effort on acting on the numbers they get. They’re not a data analyst – their focus is to understand the meaning of the details so they can do their job. They assume that the data has been checked, rechecked, and vetted by lots of folks in advance of their receiving the content. They believe the numbers and they act on what they see.

Level 2: “Give me the details”

This person has been using canned reports, understands all the basic details, and has graduated to using data to answer new questions that weren’t identified by their predecessors.  They need detailed data and they want to reorganize the details to suit their specific needs (“I don’t want weekly revenue breakdowns – I want to compare weekday revenue to weekend revenue”).  They realize the data is imperfect (and in most instances, they’ll live with it).  They want the detail.
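As a rough illustration of the weekday versus weekend request, here is a small Python sketch. The revenue figures and the function name `weekday_vs_weekend` are invented for the example; the only assumption is that the user can get at daily detail rather than a pre-aggregated weekly total.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily revenue detail rows: (sale date, revenue amount).
daily_revenue = [
    (date(2013, 11, 25), 1200.0),  # Monday
    (date(2013, 11, 26), 950.0),   # Tuesday
    (date(2013, 11, 29), 1800.0),  # Friday
    (date(2013, 11, 30), 2100.0),  # Saturday
    (date(2013, 12, 1), 1600.0),   # Sunday
]


def weekday_vs_weekend(rows):
    """Total revenue by day type instead of by calendar week."""
    totals = defaultdict(float)
    for sale_date, amount in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        day_type = "weekend" if sale_date.weekday() >= 5 else "weekday"
        totals[day_type] += amount
    return dict(totals)


totals = weekday_vs_weekend(daily_revenue)
```

A weekly canned report can't answer this question, because the weekly total has already thrown away the day of the sale. That is why this user keeps asking for the detail.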

Level 3: “I don’t believe the data — please fix it”

These folks know their area of the business inside/out and they know the data. They scour and review the details to diagnose the business problems they’re analyzing.  And when they find a data mistake or inaccuracy, they aren’t shy about raising their hand. Whether they’re a data analyst that uses SQL or a statistician with their favorite advanced analytics algorithms, they focus on identifying business anomalies.  These folks are the power users that are incredibly valuable and often the most difficult for IT to please.

Level 4: “Give me more data”

This is subject area graduation.  At this point, the user has become self-sufficient with their data and needs more content to address a new or more complex set of business analysis needs. Asking for more data – whether a new source or more detail – indicates that the person has exhausted their options in using the data they have available.  When someone has the capacity to learn a new subject area or take on more detailed content, they’re illustrating a higher level of data knowledge.

One thing to consider about the above model is that a user will have varying data knowledge based on the individual subject area.  A marketing person may be completely self-sufficient on revenue data but be a newbie with usage details.  A customer support person may be an expert on customer data but only have limited knowledge of product data.  You wouldn’t expect many folks (outside of IT) to be experts on all of the existing data subject areas. Their knowledge is going to reflect the breadth of their job responsibilities.

As someone grows and evolves in business expertise and influence, it’s only natural that their business information needs would grow and evolve too.  In order to address data usability (and project success), maybe it makes sense to reconsider the various user audience categories and how they are defined.  Growing data knowledge isn’t about making everyone data gurus; it’s about enabling staff members to become self-sufficient in their use of corporate data to do their jobs.

Photo “Ladder of Knowledge” courtesy of degreezero2000 via Flickr (Creative Commons license).

Repurposing Your Data Warehouse Platform—Not!

I’ve noticed lately that data warehouse vendors are dusting off the arguments and pitches of days gone by. Don’t buy specialized hardware for your database needs! You’ll never be able to re-use the gear! One rep recently told a client, “With your data warehouse on our hardware, you can re-purpose the hardware at any time!”

The truth is, while data warehouse failures were rampant a few years ago, those failures are now the exception and not the rule. Data warehouses, once installed, tend to last a while. The good ones actually add more data over time and become more entrenched among user organizations. The great ones become strategic, and business people claim not to be able to do their jobs without them. A data warehouse platform is rarely for a single use, but for a multitude of needs. Data warehouses rarely just go away.

However, don’t confuse an entrenched data warehouse with an entrenched data integration solution. I teach a class at The Data Warehousing Institute conferences called “Architectural Options for Data Integration.” The class covers technologies like Enterprise Application Integration (EAI); Enterprise Information Integration (EII); Extract, Transform, and Load (ETL, and its sister, ELT); and Master Data Management (MDM). I present use cases for these different solutions as well as lists of the key vendors that offer them.

Attendees I talk to admit coming to the class with the intent of justifying the data warehouse as a multi-purpose integration system. They leave the class understanding the often-stark differences of these various solutions. And I hope they return to work with a different view of their future-state integration architectures, whether they re-purpose their hardware or not.

Note: Evan will be teaching Beyond the Data Warehouse: Architectural Options for Data Integration at the TDWI World Conference in San Diego on Thursday, August 6.

Why BI Development is Different

By Evan Levy

When companies initially embark on their BI development initiatives, they often underestimate its complexity. Some begin BI in the first place because their packaged applications don’t deliver the reporting functionality they need. Others embark on BI because the data they need to analyze is located in multiple, disparate application systems. While positioning a data warehouse to integrate and store historical data from packaged applications, like ERP or CRM, is a reasonable and proven approach, many companies try to repurpose the development methods associated with these packages to deliver BI.

But comparing development methods and skill sets for these two divergent types of systems is like comparing picking apples to making a fruit salad. The fact is the methodology for building a data warehouse is very similar to traditional code development using lower-level programming languages. To be successful building a data warehouse, a team should have skills in business requirements gathering, functional requirements definition, specification and design, data modeling, database design, as well as all the skills associated with loading the data and coding the application. This is clearly a complex mix of technical knowledge to deliver a business solution spanning everything from storage allocation to workload management to systems integration to application programming. The fact is you’re building something from scratch.

The packaged application world is complex in its own right, but it’s also very different, as are the skills and methodologies involved in building these environments. Most IT organizations accustomed to implementing packages use third-party firms to install and configure these systems. Their staff members don’t have the necessary skills to build these solutions, and often require training and multiple years of hands-on use to be proficient in supporting these systems. In addition, most organizations forget that implementing their business applications typically takes a year or longer.

When was the last time you were allowed a full year to implement your data warehouse? And was your team even half the size of the packaged app’s development team?

Welcome Inside!

Welcome to Baseline’s blog entries, and to my inaugural blog, Inside IT. For those of you who have seen me present and read some of my articles, you’ll be happy (or sad) to know that this blog will echo the same themes, tone, and yes, sense of humor, from those other media. (I promise to control my colorful language and not use too many four-letter words, unless it’s something like “SDLC” or “BPEL.”)

My Baseline blog will be consistent with the rest of my speaking and writing topics, which means that it will align with some of the core assumptions in my other content, including:

  • We’re doing all this IT stuff to help the business. We’ve obsessed over the importance of IT having a place at the corporate table, but we sometimes forget we’re here to support business actions and decision making. Companies use technology and data to help run their businesses, not because they want to win awards for the biggest database. We’re so wrapped up in protecting the reputation of IT that sometimes we forget about the business. As Jill would say, we do so at our peril.
  • Too many IT organizations forget that data can contribute to innovation. If you take a look at what a retailer does, it doesn’t invent its own POS or inventory management systems; it buys them. What’s valuable is the data. Where IT provides value isn’t in deploying its backbone systems, but in creating the decision-making systems supported by information, which, as it happens, are closer to the business users. Notice a theme here?
  • Data integration isn’t rocket science. It’s really not that hard. The complexity isn’t in the processing. It’s in defining the rules for identification and integration. We still find IT shops that want to build their own ETL tools rather than designing the right data integration frameworks. Sometimes the rules that govern integration aren’t as sexy as building new software. Sometimes we don’t need to build a better mousetrap ‘cuz there are no mice. We have other problems to solve.

The whole premise here, and maybe my new mantra, is: Leverage, re-use, and buy if you have to. Check back here often and we’ll discuss how to do them.
