Monitoring is crucial for the normal operation of any system, and it can become extremely challenging when it must cover hundreds of different services across thousands of machines. Rising to these challenges, the CERN IT Monitoring infrastructure has been designed to serve metrics and logs from heterogeneous data sources at scale.
Built on a widely known technology stack (e.g. Kafka, Spark, InfluxDB and Elasticsearch), the system makes it possible to monitor the entire IT infrastructure, its services, and WLCG, the world’s largest computing grid, in one single place. This talk describes the journey of building the IT Monitoring Service at CERN. Nikolay will discuss the architectural challenges and reveal the design decisions that make it such a unified service.