Flood: Elastic Streaming MapReduce



Distributed data stream processing (DSP) is used to analyze information and raise alarms in business-critical scenarios such as financial fraud detection, clickstream processing, network security, traffic control, and real-time KPI computation. Processing this information efficiently is challenging because continuous streaming sources vary over time: the amount of data and processing often changes with time of day and day of week, and unexpected spikes are frequent. As a result, most DSP computations are either over-provisioned, which increases cost and wastes energy, or under-provisioned, in which case they suffer performance degradation or denial of service, or have to resort to load shedding.
We demonstrate Flood, a scalable, elastic DSP engine that solves these problems. By using a scalable computing model, MapReduce, and monitoring running computations, our system is able to decide at runtime whether there is a lack or a waste of resources. Flood then acts autonomically by requesting or releasing computing nodes, immediately expanding or contracting the computation to ensure that latency and throughput requirements are met. This improves efficiency and lowers costs while ensuring quality of service.
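The monitor-then-scale loop described above can be illustrated with a minimal sketch. It is not Flood's actual policy: the `Metrics` fields, the function name `scaling_decision`, and all threshold values are hypothetical, chosen only to show how a runtime decision between a lack and a waste of resources might look.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical snapshot of a running computation (not Flood's real schema)."""
    latency_ms: float    # observed end-to-end latency
    utilization: float   # mean node utilization, from 0.0 to 1.0


def scaling_decision(m: Metrics, nodes: int,
                     max_latency_ms: float = 500.0,
                     low_utilization: float = 0.3,
                     min_nodes: int = 1) -> int:
    """Return the change in node count: +1 to expand, -1 to contract, 0 to hold.

    All thresholds are assumed values for illustration only.
    """
    if m.latency_ms > max_latency_ms:
        # Latency requirement violated: lack of resources, request a node.
        return 1
    if m.utilization < low_utilization and nodes > min_nodes:
        # Nodes are mostly idle: waste of resources, release a node.
        return -1
    return 0
```

Checking the latency requirement before utilization means the sketch never releases a node while the requirement is being violated, which matches the stated priority on quality of service.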


Distributed Stream Processing, Elastic, Scalability


Distributed Systems


ACM DEBS, July 2010
