DHS: Adaptive Memory Layout Organization of Sketch Slots for Fast and Accurate Data Stream Processing
Zhao B., Li X., Tian B., Mei Z., Wu W.
Data stream processing is a crucial computation task in data mining applications. The rigid, fixed data structures in existing solutions limit their accuracy, throughput, and generality across measurement tasks. We propose Dynamic Hierarchical Sketch (DHS), a sketch-based hybrid solution that targets all three properties. During online stream processing, DHS hashes items to buckets and organizes the cells in each bucket dynamically: the cell sizes within a bucket adapt to the actual sizes and distribution of flows. Memory is thus used efficiently, recording elephant flows precisely while covering more mice flows. Our implementation and evaluation show that DHS achieves high accuracy, high throughput, and high generality on five measurement tasks: flow size estimation, flow size distribution estimation, heavy hitter detection, heavy changer detection, and entropy estimation.
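To make the idea of adaptively sized cells concrete, the following is a minimal toy sketch in Python. It is not the DHS algorithm as specified in the paper. The bit budget, fingerprint width, counter levels, promotion rule, and eviction rule are all assumptions chosen for illustration, and the names `Bucket`, `ToySketch`, `BUCKET_BITS`, `FP_BITS`, and `COUNTER_BITS` are hypothetical. It shows only the general principle: each item hashes to a bucket, small flows occupy narrow cells, and a cell is widened when its counter overflows.

```python
import hashlib

# Hypothetical parameters, chosen only for illustration (not from the paper).
BUCKET_BITS = 128          # memory budget per bucket, in bits
FP_BITS = 8                # fingerprint bits stored in every cell
COUNTER_BITS = (4, 8, 16)  # counter width at level 0, 1, 2


def _hash(key: str, salt: str) -> int:
    """Return a 64-bit hash of key; the salt yields independent hash functions."""
    digest = hashlib.blake2b((salt + key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class Bucket:
    """A fixed bit budget shared by cells whose counters differ in width."""

    def __init__(self):
        self.cells = {}  # fingerprint -> [level, count]

    def bits_used(self) -> int:
        return sum(FP_BITS + COUNTER_BITS[lvl] for lvl, _ in self.cells.values())

    def _evict_smallest(self, keep: int) -> bool:
        """Drop the cell with the smallest count, never the cell `keep`."""
        victims = [fp for fp in self.cells if fp != keep]
        if not victims:
            return False
        del self.cells[min(victims, key=lambda fp: self.cells[fp][1])]
        return True

    def insert(self, fp: int) -> None:
        cell = self.cells.get(fp)
        if cell is None:
            # A new flow starts in the narrowest cell, if the budget allows.
            if self.bits_used() + FP_BITS + COUNTER_BITS[0] <= BUCKET_BITS:
                self.cells[fp] = [0, 1]
            return  # otherwise the new flow is dropped (a simplification)
        level, count = cell
        if count + 1 < (1 << COUNTER_BITS[level]):
            cell[1] = count + 1  # the counter still fits at this level
            return
        if level + 1 == len(COUNTER_BITS):
            return  # widest counter is saturated
        # Overflow: widen the cell, evicting the smallest flows if needed.
        extra = COUNTER_BITS[level + 1] - COUNTER_BITS[level]
        while self.bits_used() + extra > BUCKET_BITS:
            if not self._evict_smallest(keep=fp):
                return  # no room can be made; leave the counter saturated
        cell[0], cell[1] = level + 1, count + 1


class ToySketch:
    def __init__(self, num_buckets: int = 64):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _locate(self, key: str):
        bucket = self.buckets[_hash(key, "bucket") % len(self.buckets)]
        return bucket, _hash(key, "fp") % (1 << FP_BITS)

    def insert(self, key: str) -> None:
        bucket, fp = self._locate(key)
        bucket.insert(fp)

    def query(self, key: str) -> int:
        bucket, fp = self._locate(key)
        cell = bucket.cells.get(fp)
        return cell[1] if cell else 0
```

In this toy design, a flow that stays small costs 12 bits while a large flow costs 24 bits, so a bucket can hold many mice flows and still record an elephant flow without truncating its count. Two flows that share both a bucket and a fingerprint are merged, which is a source of overestimation in this simplified version.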