Problem

Recently, I faced an interesting challenge around labels and Prometheus Recording Rules. I was writing new Recording Rules and felt that the labels available on the metrics they were built from were insufficient for my purpose: simplifying SLI, SLO and error budget management with Prometheus. I decided to find a way to add arbitrary labels to Prometheus queries and recording rules. I could not find any resource answering my question directly, so I hope my findings will be useful to others.

Imagine that you need an availability target for your service. One way to do it is to hard-code the target in every query, Grafana dashboard and alert. Then, when the team decides to alter the availability target, you need to find all these references and update them. Instead, we can use a static Recording Rule so that the availability target is set in just one place.

For example:

- record: job:availability:999
  expr: |
    99.9

The above creates a Recording Rule job:availability:999 which has no labels and always yields 99.9. So, when you decide to cover several availability targets, you end up with job:availability:99, job:availability:95 and so on, and you have to switch between rule names whenever the availability target changes.

Instead, I wanted to create a single recording rule that hides the different values behind labels. Imagine querying job:availability:value{availability='99.9'} instead, so that teams select an availability target via a label rather than via a differently named Recording Rule.

Solution

label_replace is a built-in function that will save our day. From the documentation it is clear that the function is intended to add or replace a label with a value derived from an existing label. However, what if we game the system here? In label_replace(up{job="api-server",service="a:c"}, "foo", "$1", "service", "(.*):.*"), the last two parameters are the source label (a label on the existing time series) and a regex (used to extract a value from the source label). We can ignore source labels altogether by setting both to "" and simply create an arbitrary label from our string input: label_replace(up{job="api-server",service="a:c"}, "arbitrary_key", "arbitrary_value", "", ""). It worked!
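As a minimal sketch of why this works, assuming current Prometheus behaviour: a label that does not exist on a series reads as the empty string, and the regex is fully anchored, so the empty pattern "" matches that empty value on every series and the replacement is always written. The same trick applies to a constant that carries no labels at all. vector() is a built-in PromQL function; the group and record names below are hypothetical.

```yaml
groups:
  - name: static-targets-sketch            # hypothetical group name
    rules:
      # vector(99.9) yields a single sample with value 99.9 and no labels.
      # With src_label "" and regex "", label_replace always matches,
      # so the sample gains the label availability="99.9".
      - record: job:availability:target    # hypothetical rule name
        expr: |
          label_replace(vector(99.9), "availability", "99.9", "", "")
```

If the regex did not match, label_replace would return the series unchanged, which is why both trailing arguments are left empty here.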

Let’s rework our previous example:

# Taking earlier Recording Rules
- record: job:availability:999
  expr: |
    99.9
- record: job:availability:99
  expr: |
    99

# After
- record: availability:availability:value
  expr: |
    label_replace(job:availability:999, "availability", "99.9", "", "")
- record: availability:availability:value
  expr: |
    label_replace(job:availability:99, "availability", "99", "", "")

The above Recording Rules give me the ability to select different availability thresholds using labels.

availability:availability:value
# The above query returns the following time series
availability:availability:value{availability="99"} 99
availability:availability:value{availability="99.9"} 99.9
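One way the labelled rule could then be consumed is sketched below. It assumes a metric job:availability:measured_percent that holds the measured availability in percent; that metric, the group name and the alert name are hypothetical and not part of the rules above. scalar() is a built-in PromQL function that turns a single-element vector into a scalar.

```yaml
groups:
  - name: availability-slo-sketch          # hypothetical group name
    rules:
      - alert: AvailabilityBelowTarget     # hypothetical alert name
        # job:availability:measured_percent is an assumed metric (percent).
        # The target is selected by label, so changing it means editing
        # only the availability="..." matcher.
        expr: |
          job:availability:measured_percent
            < scalar(availability:availability:value{availability="99.9"})
        for: 5m
        labels:
          severity: page
```

Note that scalar() returns NaN when the selector does not match exactly one series, and a comparison against NaN is never true, so a typo in the label value would silence the alert rather than fire it.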

Bonus challenge

I would like to add more than one label to my Prometheus query. Easy, but not pretty: all you need to do is nest label_replace calls.

For example:

- record: job:availability:value
  expr: |
    label_replace(
      label_replace(job:availability:999, "error_budget", "0.1", "", ""),
      "availability", "99.9", "", ""
    )

Warning

It is generally recommended to keep labels to a justified minimum. Do not go overboard with labelling: every distinct combination of label values is a separate time series, and high-cardinality time series will cause more harm than good.