Monitoring a Django Application with Prometheus

Prometheus is a great tool for monitoring all kinds of infrastructure and applications. There are “exporters” for many kinds of services, like the Node exporter for Linux servers.

When running a Django app, you may want to add custom metrics, such as the number of new users, the number of failed logins, and other events specific to your app. For those cases, the Prometheus Python Client may be a good fit. Despite the name, it offers all the tools needed to build your own exporter with custom metrics.

Collecting Metrics

First, add a new file utils/prometheus.py to keep all Prometheus-related code in one place. It might look like this:

from prometheus_client import (
    CollectorRegistry,
    Counter,
    Histogram,
    generate_latest,
    multiprocess,
)

# Registry that aggregates metrics across all worker processes
# (requires PROMETHEUS_MULTIPROC_DIR to be set, see below).
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)

PROM_REQUEST_TIME = Histogram(
    'request_processing_seconds',
    'Time spent processing an API request',
)

PROM_PAGEVIEWS = Counter(
    'pageviews',
    'Number of pageviews',
)

def registry_to_text():
    # Render all metrics in the Prometheus text exposition format.
    return generate_latest(registry)

This imports the two basic metric types, Counter and Histogram, as well as the helper classes needed to run a multi-process registry. Unless your app runs in a single worker process, you’ll want multi-process support.

Then, to actually use those metrics, import them in your Django functions and use them like this:

from utils.prometheus import PROM_PAGEVIEWS

# record a page view
PROM_PAGEVIEWS.inc()

Or, to time a request (this could also be done in a middleware, as sketched below):

from utils.prometheus import PROM_REQUEST_TIME

# time a request
with PROM_REQUEST_TIME.time():
    response = get_response(request)
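
As a minimal sketch of the middleware variant, assuming the utils/prometheus.py module above (the class name is hypothetical, and it would need to be added to Django’s MIDDLEWARE setting):

from utils.prometheus import PROM_REQUEST_TIME

class PrometheusTimingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # times the whole request/response cycle
        with PROM_REQUEST_TIME.time():
            return self.get_response(request)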

Find more on the available instruments in the official docs. It’s also possible to add labels.
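
For instance, a counter split by a label might look like this (a sketch; the PROM_LOGINS metric and its status label are hypothetical and would live in utils/prometheus.py like the metrics above):

from prometheus_client import Counter

PROM_LOGINS = Counter(
    'logins',
    'Number of login attempts',
    ['status'],  # e.g. 'success' or 'failed'
)

# record a failed login
PROM_LOGINS.labels(status='failed').inc()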

Exposing Metrics

This covers collecting metrics. Now you’ll also want to expose a /metrics endpoint so Prometheus can scrape them. This can be done with a simple Django view:

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from prometheus_client import CONTENT_TYPE_LATEST
from utils.prometheus import registry_to_text


@csrf_exempt
def prometheus_metrics(request):
    return HttpResponse(registry_to_text(), content_type=CONTENT_TYPE_LATEST)

This dumps the current registry content into an HttpResponse and sets the content type to the correct type and version. Wire it up like this in urls.py:

path('metrics', log.views.prometheus_metrics, name='prometheus-metrics')

When calling /metrics, you should get your metrics in the usual format. The (truncated) result could look like this:

# HELP pageviews_total Multiprocess metric
# TYPE pageviews_total counter
pageviews_total 81.0
# HELP failed_logins_total Multiprocess metric
# TYPE failed_logins_total counter
failed_logins_total 2.0
# HELP request_processing_seconds Multiprocess metric
# TYPE request_processing_seconds histogram
request_processing_seconds_sum 129.74717313610017
request_processing_seconds_bucket{le="0.005"} 807.0
request_processing_seconds_bucket{le="0.01"} 1161.0
request_processing_seconds_bucket{le="0.025"} 6670.0
request_processing_seconds_bucket{le="0.05"} 6992.0
request_processing_seconds_bucket{le="0.075"} 7015.0
request_processing_seconds_bucket{le="0.1"} 7018.0
request_processing_seconds_bucket{le="0.25"} 7061.0
request_processing_seconds_bucket{le="0.5"} 7074.0
request_processing_seconds_bucket{le="+Inf"} 7075.0
request_processing_seconds_count 7075.0

Notes on Multi-Processing

Most apps run multiple processes, so they need a way to share and aggregate metrics. Luckily, the library has a solution for this. First, register the registry with the multiprocess helper as above. Then set PROMETHEUS_MULTIPROC_DIR=/tmp in your app’s environment variables. The client will then write .db files for each metric and process to that directory and aggregate them on export.
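
With systemd, the variable can be set directly in each unit that runs the app (a sketch; the directory is an assumption and should match your setup):

[Service]
...
Environment=PROMETHEUS_MULTIPROC_DIR=/tmp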

This even works across multiple different services, like a Gunicorn web server and Celery workers. First, enable private temp folders in both service units to avoid clogging the system-wide /tmp folder:

[Service]
...
PrivateTmp=True

Then configure one service to share a namespace with the other, so both see the same private /tmp folder:

[Unit]
...
JoinsNamespaceOf=gunicorn.service

[Service]
...
PrivateTmp=True

That should be all. After reloading those services, you should get aggregated metrics from all processes when calling /metrics.

Finally, be sure to secure this endpoint, for example by serving it on a separate, firewalled port.
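
Alternatively, a simple IP allow-list in the view itself can work. A sketch (the allowed addresses are assumptions, and REMOTE_ADDR is only trustworthy when no reverse proxy sits in front):

from django.http import HttpResponse, HttpResponseForbidden
from django.views.decorators.csrf import csrf_exempt
from prometheus_client import CONTENT_TYPE_LATEST
from utils.prometheus import registry_to_text

# addresses allowed to scrape; adjust to your Prometheus server
ALLOWED_METRICS_IPS = {'127.0.0.1'}

@csrf_exempt
def prometheus_metrics(request):
    # REMOTE_ADDR is only reliable without a reverse proxy in between
    if request.META.get('REMOTE_ADDR') not in ALLOWED_METRICS_IPS:
        return HttpResponseForbidden()
    return HttpResponse(registry_to_text(), content_type=CONTENT_TYPE_LATEST)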