Prometheus 2.0 and Massive Performance Improvements

AllCloud Blog:
Cloud Insights and Innovation

Prometheus is the tool of choice for metrics collection and alerting, and together with Grafana as the popular UI it is what most people use for Kubernetes monitoring. It has one major disadvantage, though: it offers no easy option for scaling to handle increasing load.

Prometheus uses a time series database, and if pressure on storage, CPU, or memory increases, the only practical solution is to increase the instance size.

Options for data sharding exist, but they are not trivial to maintain.
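To make the sharding idea concrete, here is a minimal sketch of hash-based target sharding, the general idea behind Prometheus's `hashmod` relabel action. The function names `shard_for` and `assign_targets` and the target addresses are hypothetical, and the hashing shown here is illustrative rather than a byte-for-byte copy of what Prometheus does.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target address to a shard index in [0, num_shards)."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the digest as an unsigned integer
    # and reduce it modulo the number of shards.
    return int.from_bytes(digest[-8:], "big") % num_shards


def assign_targets(targets, num_shards):
    """Group targets by the shard (Prometheus server) that should scrape them."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


if __name__ == "__main__":
    # Hypothetical target addresses, for illustration only.
    targets = [f"10.0.0.{i}:9100" for i in range(1, 11)]
    for shard, members in assign_targets(targets, 3).items():
        print(shard, members)
```

The maintenance burden comes from what this sketch leaves out: changing the number of shards moves targets between servers, and any query that spans shards needs an extra aggregation layer on top.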

The massive performance improvements announced with Prometheus 2.0 should therefore be very valuable: they help you keep your Prometheus server from “ballooning” and may even let you shrink it.

Here is the list of the main benefits and considerations:

  1. New storage engine that typically reduces RAM and CPU usage by 3x-5x and IOPS (I/O operations per second) by about 100x.
  2. Can scale to a much higher number of time series, thanks to an inverted index on labels.
  3. Recording rules and alerting rules are now organized hierarchically into rule groups. Rules within a group are evaluated sequentially, which allows more accurate aggregations and alerting and avoids race conditions.
    • That’s also good if you want to downsample your data or stream it.
  4. The new Prometheus database format differs from the previous one, so a conversion is required if you want to preserve old metric data and import it into your new Prometheus 2.0 server.
    • You can set up a new Prometheus 2.0 server in parallel with your existing 1.x server and test it.
    • If you don’t need the old data, simply use the 2.0 server.
    • If you continuously ship your Prometheus data to InfluxDB behind Grafana, the old Prometheus data is less valuable and you can more easily discard it.
    • You can simply wait until your old data expires according to the storage.local.retention flag on the 1.x server (default is 15 days).
  5. Storage snapshots are now available, easing online backups of the Prometheus time series database without service interruption.
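To illustrate why an inverted index on labels (item 2 above) helps with large numbers of time series, here is a minimal in-memory sketch. The `LabelIndex` class and its method names are hypothetical and only show the idea; the real Prometheus storage engine is written in Go and keeps its index on disk in a more compact form.

```python
from collections import defaultdict


class LabelIndex:
    """Toy inverted index: (label name, label value) -> set of series IDs."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._series = {}

    def add(self, series_id, labels):
        """Register a series and index it under each of its label pairs."""
        self._series[series_id] = dict(labels)
        for pair in labels.items():
            self._postings[pair].add(series_id)

    def select(self, matchers):
        """Return IDs of series that carry every label pair in `matchers`."""
        if not matchers:
            return set(self._series)
        # Intersect the posting sets instead of scanning every series.
        sets = [self._postings.get(pair, set()) for pair in matchers.items()]
        return set.intersection(*sets)


if __name__ == "__main__":
    index = LabelIndex()
    index.add(1, {"__name__": "http_requests_total", "job": "api", "instance": "a"})
    index.add(2, {"__name__": "http_requests_total", "job": "api", "instance": "b"})
    index.add(3, {"__name__": "http_requests_total", "job": "web", "instance": "a"})
    print(index.select({"job": "api", "instance": "a"}))
```

A lookup touches only the posting sets for the requested label pairs, so its cost depends on how many series match those labels rather than on the total number of series stored.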

Have you tried Prometheus 2.0 yet? Let us know what you think!

Jack Bezalel
