Complex Event Processing: Challenging Real-Time ETL

Cave Swallow by Orin Zebest via Flickr (Creative Commons)

Unless you’ve been hiding in a cave for the past year, you’ve probably heard of CEP (Complex Event Processing) or data stream analysis. Because much real-time analysis focuses on discrete data elements rather than data sets, this technology lets users query and manipulate discrete pieces of information, such as events and messages, in real time, without being encumbered by a traditional database management system.

The analogy here is that if you can’t bring Mohammed to the mountain, bring the mountain to Mohammed: why bother loading data into a database alongside a bunch of other records when I only need to manipulate a single record? Furthermore, this lets me analyze the data the moment it’s created. Since one of the biggest obstacles to query performance is disk I/O, why not bypass the I/O problem altogether?

I’m not challenging data warehousing or historical analysis. But the time has come to apply complex analytics and data manipulation against discrete records more efficiently. Some of the more common applications of this technology include fraud detection and transaction approval, event pattern recognition, and brokerage trading systems.

When it comes to ETL (Extract, Transform, and Load) processing, particularly in a real-time or so-called “trickle-feed” environment, CEP may actually provide a better approach than traditional ETL. CEP applies complex data manipulation directly to the individual record; there is no intermediary database. The architecture is inherently storage-efficient: if a second, third, or fourth application needs access to a particular data element, it doesn’t get its own copy. Instead, each application applies its own processing to the shared stream. This prevents the unnecessary or reckless copying of source application content.
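To make the “many applications, one copy of the data” idea concrete, here is a minimal sketch in Python. It is not any particular CEP product’s API; the `EventBus` class, the event fields, and the two subscriber “applications” are all invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical event: a single record flowing through the system.
Event = Dict[str, object]

class EventBus:
    """Each application registers its own processing function; every
    event is handed to all of them in turn, so no application needs
    its own stored copy of the data."""

    def __init__(self) -> None:
        self.processors: List[Callable[[Event], None]] = []

    def subscribe(self, processor: Callable[[Event], None]) -> None:
        self.processors.append(processor)

    def publish(self, event: Event) -> None:
        # The event is processed in flight; it is never staged in a database.
        for process in self.processors:
            process(event)

# Two "applications" sharing the same stream without duplicating it.
fraud_alerts: List[object] = []
totals = {"amount": 0.0}

bus = EventBus()
bus.subscribe(lambda e: fraud_alerts.append(e["id"]) if e["amount"] > 1000 else None)
bus.subscribe(lambda e: totals.update(amount=totals["amount"] + e["amount"]))

for event in [{"id": "t1", "amount": 250.0}, {"id": "t2", "amount": 5000.0}]:
    bus.publish(event)

print(fraud_alerts)       # ['t2']
print(totals["amount"])   # 5250.0
```

The fraud check and the running total both see every record, but the record itself exists exactly once and is gone once all subscribers have processed it.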

Many industries need a real-time view of customer activities. In the gaming industry, for instance, when a customer inserts her card into a slot machine, the casino wants to present a custom offer. Using traditional data warehouse technology, a significant amount of processing is required to capture the data, transform and standardize it, and load it into a table, only to make it available to a query that identifies the best offer. In the world of CEP, we’d simply query the initial message and make the best offer.
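A sketch of that casino scenario: the card-insert event is evaluated directly against a rule set, with no warehouse load in between. The rules, tiers, and field names below are entirely hypothetical.

```python
# Hypothetical offer rules: (predicate, offer) pairs checked in priority order.
OFFERS = [
    (lambda e: e["tier"] == "platinum", "free suite upgrade"),
    (lambda e: e["lifetime_spend"] > 10_000, "dinner voucher"),
    (lambda e: True, "free spin credit"),  # default offer
]

def best_offer(event: dict) -> str:
    """Pick the first matching offer for an in-flight card-insert event."""
    for matches, offer in OFFERS:
        if matches(event):
            return offer
    return "no offer"

# The event is queried the moment it arrives; nothing is loaded into a table.
event = {"player_id": 42, "tier": "gold", "lifetime_spend": 12_500}
print(best_offer(event))  # dinner voucher
```

The point isn’t the rule engine itself (a real one would be far richer); it’s that the decision is made against the single incoming message rather than against a table the message was first loaded into.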

Many ETL tools already use query-language constructs and operators to manipulate data, but they typically require the data to be loaded into a database first. The major vendors have evolved toward an “ELT” architecture, leveraging the underlying database engine to address performance. Why not tackle the performance problem directly and bypass the database altogether?
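The “transform in flight” alternative can be sketched in a few lines: the same cleanup steps an ETL job would run after staging are applied to each record as it arrives. The field names and cleanup rules are illustrative assumptions, not any vendor’s tooling.

```python
def standardize(record: dict) -> dict:
    """Apply typical ETL cleanup directly to one record as it arrives:
    trim and normalize the customer name, coerce the amount to a number."""
    return {
        "customer": record["customer"].strip().upper(),
        "amount": round(float(record["amount"]), 2),
    }

# Records flowing in one at a time; no staging table, no bulk load.
raw_stream = [
    {"customer": "  acme corp ", "amount": "19.999"},
    {"customer": "globex", "amount": "5"},
]

clean = [standardize(r) for r in raw_stream]
print(clean[0])  # {'customer': 'ACME CORP', 'amount': 20.0}
```

Whether this beats pushing the transform into the database engine depends on volumes and on how much history the transform needs; for truly record-at-a-time logic, skipping the staging step removes the I/O entirely.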

The promise of CEP is a new set of business applications and capabilities. I’m also starting to believe that CEP could actually replace traditional ETL tools as a higher-performance and easier-to-use alternative. The interesting part will be seeing how long it takes for companies to emerge from their caves and adopt it.



About Evan Levy

Evan Levy is a management consultant and partner at IntegralData. In addition to his day-to-day job responsibilities, Evan speaks, writes, and blogs about the challenges of managing and using data to support business decision-making.

One response to “Complex Event Processing: Challenging Real-Time ETL”

  1. Mark says :

    Very interesting insights. I am in the middle of this space, and I can tell you that you definitely see where the puck is going.
    While I am not technically or esoterically proficient enough to predict the future of CEP+ETL, the bottom line is that there are numerous operational decisions in every sector and job role that depend on timely, trustworthy data. That fact necessitates what you have described at a granular, data point level. You have articulated it well.
    I don’t hold the keys to the kingdom here, nor am I speaking for any company, but where I think this is going is toward a much more “natural” collaboration between man and machine, between automated analysis technology and the iterative, often-messy human decision-making process.
    Just as Plato emerging from the cave launched Western culture, so too are we near a breakout in data-driven human analysis and decisioning. The decisions involving both historical and real-time data are too numerous to mention, but in the Federal space at least there are clusters of these decisions, based on specific policy requirements, around which very smart people like yourself can identify and build custom rule-sets that can be updated on the fly by the user, without having to get IT involved with each change request, as discrete data elements change.
    That’s a mouthful, I know, but I’ll let it stand for now. Because inside that is a generic CEP + ETL capability that can be applied to a wide range of enterprise decisions using all available data–whether in DBs, DWs, in the cloud, in unstructured data, wherever.
    The key is that a human analyst, not a computer, can know at an intuitive level when certain data points or data streams are relevant. Once you understand the analytical process of an analyst and work backwards, you can see where the major need emerges: the ability to quickly update a CEP engine to monitor those new streams or data elements in an automated way. To address that need is to approach nirvana, no?
    We will see. I am excited that David Luckham is working on a new CEP project, and I expect he will show us the way forward there with lucidity. Meanwhile, we can do a lot of good with CEP & ETL even for those that live in caves.
