Session Abstract: In many ways, Big Data is what clouds were made for. Computing problems that are beyond the grasp of a single computer, no matter how huge, are easy for elastic platforms to handle. In this session, big data processing pioneer Colin Clark will discuss how to discover hidden signals and new knowledge within huge streams of realtime data, applying event processing design patterns to events in real time.
Speaker – Colin Clark, CTO, Cloud Event Processing
Colin opens by talking about high velocity, big data. Then he gives his criteria for Complex Event Processing (CEP):
- Domain Specific Language
- Continuous Query
- Time/Length Windows
- Pattern Matching
Example of what Colin is talking about: “Select * from everything where itsInteresting = toMe in last 10 minutes”
How much data does that return? How much processing will it take?
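To make the idea of a continuous query over a time window concrete, here is a minimal sketch in Python. It is not Colin's DSL or any CEP product's API; the class name `TimeWindowQuery`, the event fields and the predicate are all assumptions made up for illustration. It also assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Dict[str, object]  # hypothetical event shape, e.g. {"symbol": "IBM", "price": 100.0}


class TimeWindowQuery:
    """Keeps the events that match a predicate and arrived within the last N seconds."""

    def __init__(self, predicate: Callable[[Event], bool], window_seconds: float) -> None:
        self.predicate = predicate
        self.window_seconds = window_seconds
        self._window: Deque[Tuple[float, Event]] = deque()

    def on_event(self, timestamp: float, event: Event) -> List[Event]:
        # Evict events that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self._window and self._window[0][0] <= cutoff:
            self._window.popleft()
        # Keep the new event only if it is "interesting" to this query.
        if self.predicate(event):
            self._window.append((timestamp, event))
        # The current answer to the query is whatever is still in the window.
        return [e for _, e in self._window]


# "Interesting to me" is assumed here to mean "trades in IBM"; 600 seconds = 10 minutes.
query = TimeWindowQuery(lambda e: e["symbol"] == "IBM", window_seconds=600)
query.on_event(0.0, {"symbol": "IBM", "price": 100.0})
query.on_event(300.0, {"symbol": "MSFT", "price": 30.0})
result = query.on_event(601.0, {"symbol": "IBM", "price": 101.0})
print(result)
```

Even in this toy version, every matching event inside the window sits in memory, which hints at why the two questions above matter once the stream gets fast.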
Limitations of current CEP solutions: memory bound, compute bound, and black box. CEP can analyze data in flight, but only within those limits. The other challenge is time series analysis.
A technique available for time series analysis is symbolic aggregate approximation (SAX).
Colin is describing the construction of a “SAX word” from a day's worth of IBM trading. Then, search history for that same word to find a pattern.
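As I understand the technique, building a SAX word has three steps: normalize the series, average it over equal-sized segments, and map each average to a letter. A rough sketch in Python follows. The function names, the 4-letter alphabet, the word length and the price numbers are my own assumptions, not values from the talk. The breakpoints are the standard normal quartiles, so each letter is equally likely for normally distributed data.

```python
import statistics
from typing import List, Sequence

# Cut points that split a standard normal distribution into 4 equal-probability bands.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def sax_word(series: Sequence[float], word_length: int) -> str:
    """Convert a numeric series into a SAX word of `word_length` letters."""
    if len(series) % word_length != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of word_length")
    # Step 1: z-normalize so that series at different price levels are comparable.
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    normalized = [0.0 if std == 0 else (x - mean) / std for x in series]
    # Step 2: piecewise aggregate approximation, one average per segment.
    segment = len(series) // word_length
    letters = []
    for i in range(word_length):
        chunk = normalized[i * segment:(i + 1) * segment]
        average = sum(chunk) / len(chunk)
        # Step 3: the letter index is the number of breakpoints below the average.
        letters.append(ALPHABET[sum(1 for b in BREAKPOINTS if average > b)])
    return "".join(letters)


def find_matches(history: Sequence[Sequence[float]], word: str, word_length: int) -> List[int]:
    """Return the indices of past days whose SAX word equals `word`."""
    return [i for i, day in enumerate(history) if sax_word(day, word_length) == word]


# Made-up prices for illustration only, not real IBM data.
today = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
history = [
    [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
]
word = sax_word(today, word_length=4)
print(word, find_matches(history, word, word_length=4))
```

Because of the normalization step, the second historical day matches today's word even though its prices are ten times larger: the word captures the shape of the day, not the level.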
Getting closer to solving the high velocity, big data problem. But, still too much data to process. So, the next element in cloud event processing is Map/Reduce.
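For readers who have not seen Map/Reduce, here is a toy sketch of the pattern in Python. It runs in a single process and says nothing about how Darkstar distributes or streams the work; the event shape `(symbol, volume)` and the function names are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # hypothetical (symbol, volume) trade event


def map_event(event: Event) -> Tuple[str, int]:
    """Map step: emit a (key, value) pair for one event."""
    symbol, volume = event
    return symbol, volume


def reduce_shard(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the values per key within one shard."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def map_reduce(shards: List[List[Event]]) -> Counter:
    """Process each shard independently, then merge the partial results."""
    merged: Counter = Counter()
    for shard in shards:
        # In a real deployment each shard would go to a separate worker.
        merged.update(reduce_shard(map_event(e) for e in shard))
    return merged


shards = [
    [("IBM", 100), ("MSFT", 50)],
    [("IBM", 25)],
]
totals = map_reduce(shards)
print(dict(totals))
```

The point is that each shard can be processed without knowing about the others, so the work spreads across as many machines as the data needs.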
Still, though, the real-time (event-driven) aspect needs to be addressed. That brings us to virtualized resources (cloud).
So, assuming I captured this correctly: High velocity, big data = CEP + SAX + Streaming Map/Reduce + virtualized resources, which equals Cloud Event Processing’s Darkstar.
Today, Darkstar is working on Wall Street, doing market surveillance at the exchange. Speaking with Colin in the hallway, we discussed non-capital market prospects as well.

