Probabilistic data structures for Kafka Streams
Apache Kafka has emerged as the world’s most popular backbone for data streaming platforms. Only recently, a new flavour of stream processing was introduced on top of it: Kafka Streams. However, while simple and lightweight, it lacks a high-level API for probabilistic data structures, which could go a long way toward relieving processing bottlenecks.
In this talk, we will briefly describe the foundational concepts behind both Kafka Streams and probabilistic data structures, and present our work on a library that connects Kafka Streams pipeline builders to a variety of modern big data tools for approximate computation.
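
To give a flavour of what a probabilistic data structure looks like, here is a minimal sketch of a Bloom filter in plain Java. It is only an illustration under stated assumptions: the class name `BloomFilterSketch`, its parameters, and the choice of hash functions are hypothetical and are not the API of the library presented in the talk, and no Kafka Streams calls are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration of a Bloom filter: a fixed-size bit array that
// answers "possibly seen" or "definitely not seen" for a set of items.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Marks the item as seen by setting numHashes bit positions.
    void add(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(data, i));
        }
    }

    // Returns false only if the item was never added; true may be a false positive.
    boolean mightContain(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th position from two base hashes.
    // The two hashes used here are an assumption chosen for brevity.
    private int index(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a hash over the raw bytes.
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }
}
```

After `add("user-1")`, a call to `mightContain("user-1")` always returns true, while a lookup for an item that was never added returns false in most cases and true only occasionally. That trade of exactness for a small, fixed memory footprint is what makes such structures attractive inside a streaming pipeline.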