Analytics on Big FAST Data using a Real Time Stream Data Processing Architecture
Analytics on streaming data is not easy. It requires an architecture that can scale to process high volumes of fast-moving data without failure. While a batch processing system can fail and batch jobs can wait, real-time processing and analytics systems cannot fail and must always be available.
Real-time systems perform analytics within a short time window, for instance, correlating and predicting over event streams generated in the last ten minutes. In many cases, real-time systems leverage batch processing systems such as Hadoop for better prediction. A classic use case is to build a model from past event data in a Hadoop-based system, feed that model into the real-time system, and apply it to the live event stream to predict a future outcome.
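The short-time-window idea above can be sketched in a few lines. This is a hypothetical, minimal stand-in, not Storm or Esper code: a `SlidingWindow` helper (name and API invented for illustration) that keeps only recent events and counts those matching a condition, as a windowed correlation engine would.

```python
from collections import deque

class SlidingWindow:
    """Toy helper illustrating a short analytics window (hypothetical,
    not part of Storm or Esper): keep only events from the last
    `window_seconds` and answer simple queries over them."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, payload), oldest first

    def add(self, ts, payload):
        self.events.append((ts, payload))
        # Evict events that fell out of the window.
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()

    def count(self, predicate):
        return sum(1 for _, payload in self.events if predicate(payload))

# Usage: correlate "error" events over a 600-second (ten-minute) window.
w = SlidingWindow(600)
w.add(0, {"type": "error"})
w.add(100, {"type": "ok"})
w.add(700, {"type": "error"})   # evicts the event at t=0
errors = w.count(lambda e: e["type"] == "error")  # only the event at t=700
```

A production window would also handle out-of-order timestamps and concurrent access, which this sketch deliberately omits.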
In this award-winning Knowledge Sharing article, Dibyendu Bhattacharya and Manidipa Mitra describe how to build a real-time analytics platform using ground-breaking open source products developed by various organizations to solve their real-time use cases, including:
• Kafka, a distributed, highly scalable messaging system for processing high volumes of events
• Storm, a distributed and fault-tolerant real-time computation and parallel stream processing platform
• Esper, which performs event correlation on the parallel event streams processed in the Storm platform
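The division of labor among the three layers can be illustrated with toy stand-ins. Everything below is invented for illustration; the real platform uses Kafka topics, Storm bolts, and Esper EPL statements, none of which appear here.

```python
from queue import Queue

# Toy stand-ins for the three layers (illustrative only):
topic = Queue()                      # Kafka stand-in: buffered event stream

def bolt(event):                     # Storm stand-in: per-event processing
    event["severity"] = "high" if event["code"] >= 500 else "low"
    return event

def rule(events):                    # Esper stand-in: correlation across events
    # Fire an alert when two or more high-severity events are seen.
    return sum(1 for e in events if e["severity"] == "high") >= 2

# Producers publish raw events to the stream...
for code in (200, 503, 500, 404):
    topic.put({"code": code})

# ...the processing layer consumes and enriches them...
processed = []
while not topic.empty():
    processed.append(bolt(topic.get()))

# ...and the correlation layer evaluates a rule over the enriched stream.
alert = rule(processed)
```

The point of the sketch is the pipeline shape, transport, per-event computation, and cross-event correlation as separate layers, which is what lets each layer scale and fail independently.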
Together, Kafka, Storm, and Esper can be used to build a robust, fault-tolerant Big FAST Data analytics platform. This Knowledge Sharing article also describes how analytical models can be built on Hadoop (using Mahout) and used in the real-time system for accurate prediction.
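The batch-trained-model, real-time-scoring pattern can be sketched with a deliberately simple stand-in. This is not Mahout: the "model" below is just a per-event-type fault frequency computed from labeled history, and all names and data are invented for illustration.

```python
from collections import Counter

# Offline ("batch") phase: build a simple conditional-frequency model from
# past labeled events -- a stand-in for a Mahout-trained model.
history = [("disk_warn", True), ("disk_warn", True), ("net_flap", False),
           ("disk_warn", False), ("net_flap", False)]

totals, faults = Counter(), Counter()
for event_type, led_to_fault in history:
    totals[event_type] += 1
    if led_to_fault:
        faults[event_type] += 1

# model[event_type] ~ P(fault | event_type)
model = {e: faults[e] / totals[e] for e in totals}

# Online ("real-time") phase: score live events against the frozen model.
def predict(event_type, threshold=0.5):
    return model.get(event_type, 0.0) >= threshold

live = ["disk_warn", "net_flap"]
flags = [predict(e) for e in live]
```

In the actual architecture the model would be trained periodically on Hadoop and shipped to the stream processor; only the cheap lookup-and-threshold step runs in the real-time path.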
The article presents the whole architecture through a case study on predicting system faults by analyzing real-time event streams with the help of a model built from past fault history. This helps readers understand the architecture and relate it to their own use cases, should they need similar capabilities.