| 1 | +====================================================== |
| 2 | +Prometheus Metric Exemplars with OpenTelemetry Tracing |
| 3 | +====================================================== |
| 4 | + |
| 5 | +The full code for this example is available `on GitHub |
| 6 | +<https://github.com/GoogleCloudPlatform/opentelemetry-operations-python/tree/main/docs/examples/prometheus_exemplars>`_. |
| 7 | + |
| 8 | +This end-to-end example shows how to instrument a Flask app with `Prometheus |
| 9 | +<https://prometheus.io/>`_ metrics linked to OpenTelemetry traces using exemplars. The example |
| 10 | +manually adds exemplars to a Prometheus Histogram, linking the metric in Google Cloud Managed |
| 11 | +Service for Prometheus to spans in Cloud Trace. |
| 12 | + |
| 13 | +OpenTelemetry Python is configured to send traces to the `OpenTelemetry Collector |
| 14 | +<https://opentelemetry.io/docs/collector/>`_ and the Collector scrapes the Python server's |
| 15 | +Prometheus endpoint. The Collector is configured to send metrics to `Google Cloud Managed |
| 16 | +Service for Prometheus <https://cloud.google.com/stackdriver/docs/managed-prometheus>`_ and |
| 17 | +traces to `Google Cloud Trace <https://cloud.google.com/trace/docs/overview>`_. |
| 18 | + |
| 19 | +.. graphviz:: |
| 20 | + |
| 21 | + digraph { |
| 22 | + rankdir="LR" |
| 23 | + nodesep=1 |
| 24 | + |
| 25 | + subgraph cluster { |
| 26 | + server [label="Flask Application"] |
| 27 | + col [label="OpenTelemetry Collector"] |
| 28 | + } |
| 29 | + |
| 30 | + gmp [label="Google Cloud Managed Service for Prometheus" shape="box"] |
| 31 | + gct [label="Google Cloud Trace" shape="box"] |
| 32 | + |
| 33 | + server->col [label="OTLP traces"] |
| 34 | + col->server [label="Scrape Prometheus"] |
| 35 | + col->gmp [label="metrics"] |
| 36 | + col->gct [label="traces"] |
| 37 | + } |
| 38 | + |
| 39 | +To run this example you first need to: |
| 40 | + * Create a Google Cloud project. You can `create one here <https://console.cloud.google.com/projectcreate>`_. |
| 41 | + * Enable the Cloud Trace API (listed in the Cloud Console as Stackdriver Trace API) in the project `here <https://console.cloud.google.com/apis/library?q=cloud%20trace&filter=visibility:public>`_. If the page says "API Enabled", you're done and no further action is needed. |
| 42 | + * Set up Application Default Credentials by setting `GOOGLE_APPLICATION_CREDENTIALS <https://cloud.google.com/docs/authentication/getting-started>`_ or by `installing the gcloud SDK <https://cloud.google.com/sdk/install>`_ and running ``gcloud auth application-default login``. |
| 43 | + * Have Docker and Docker Compose installed on your machine. |
| 44 | + |
| 45 | +Attaching Prometheus Exemplars |
| 46 | +------------------------------ |
| 47 | + |
| 48 | +Prometheus exemplars can be linked to OpenTelemetry spans by setting the ``span_id`` and |
| 49 | +``trace_id`` labels on the exemplar (`specification |
| 50 | +<https://github.com/open-telemetry/opentelemetry-specification/blob/v1.20.0/specification/compatibility/prometheus_and_openmetrics.md#exemplars>`_). |
| 51 | +You can get the current span using :func:`opentelemetry.trace.get_current_span`, then format |
| 52 | +its span and trace IDs to hexadecimal strings using :func:`opentelemetry.trace.format_span_id` |
| 53 | +and :func:`opentelemetry.trace.format_trace_id`. |
| 54 | + |
| 55 | +.. literalinclude:: server.py |
| 56 | + :language: python |
| 57 | + :dedent: |
| 58 | + :start-after: [START opentelemetry_prom_exemplars_attach] |
| 59 | + :end-before: [END opentelemetry_prom_exemplars_attach] |
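
As a rough illustration of the snippet above (a sketch, not the contents of ``server.py``), the
exemplar labels could be built as shown here. The helper name ``current_exemplar`` and the
hard-coded IDs are hypothetical. A ``NonRecordingSpan`` stands in for a real span so that the
sketch needs only the ``opentelemetry-api`` package.

.. code-block:: python

    from opentelemetry import trace
    from opentelemetry.trace import NonRecordingSpan, SpanContext, TraceFlags


    def current_exemplar():
        """Build exemplar labels from whichever span is current (hypothetical helper)."""
        ctx = trace.get_current_span().get_span_context()
        return {
            # 64-bit span ID rendered as 16 lowercase hex characters
            "span_id": trace.format_span_id(ctx.span_id),
            # 128-bit trace ID rendered as 32 lowercase hex characters
            "trace_id": trace.format_trace_id(ctx.trace_id),
        }


    # Illustrative IDs only; in a real app the tracer creates the span.
    span_context = SpanContext(
        trace_id=0x0AF7651916CD43DD8448EB211C80319C,
        span_id=0x00F067AA0BA902B7,
        is_remote=False,
        trace_flags=TraceFlags(TraceFlags.SAMPLED),
    )

    # Make the stand-in span current, then read it back through the API.
    with trace.use_span(NonRecordingSpan(span_context)):
        exemplar = current_exemplar()

In an instrumented request handler the span would already be current, so only the call to the
helper would be needed.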
| 60 | + |
| 61 | +Then make an observation using a `Prometheus Histogram |
| 62 | +<https://prometheus.io/docs/concepts/metric_types/#histogram>`_ as shown below. Google Cloud |
| 63 | +Monitoring can only display exemplars attached to Histograms. |
| 64 | + |
| 65 | +.. literalinclude:: server.py |
| 66 | + :language: python |
| 67 | + :dedent: |
| 68 | + :start-after: [START opentelemetry_prom_exemplars_observe] |
| 69 | + :end-before: [END opentelemetry_prom_exemplars_observe] |
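
A minimal sketch of such an observation, assuming the ``prometheus_client`` library. The metric
name is inferred from the full metric name given in the "Checking Output" section. The help
text, observed value, separate registry, and hard-coded IDs are illustrative assumptions, not
what ``server.py`` does. ``prometheus_client`` emits exemplars only in the OpenMetrics
exposition format, so the sketch renders the registry with the OpenMetrics ``generate_latest``.

.. code-block:: python

    from prometheus_client import CollectorRegistry, Histogram
    from prometheus_client.openmetrics.exposition import generate_latest

    # A dedicated registry keeps this sketch isolated from the global default one.
    registry = CollectorRegistry()
    histogram = Histogram(
        "my_prom_hist_seconds",
        "Duration of simulated work (illustrative help text)",
        registry=registry,
    )

    # Illustrative IDs; normally these come from the current OpenTelemetry span.
    exemplar = {
        "span_id": "00f067aa0ba902b7",
        "trace_id": "0af7651916cd43dd8448eb211c80319c",
    }

    # Record a 0.3 second observation and attach the exemplar to it.
    histogram.observe(0.3, exemplar=exemplar)

    # Render in OpenMetrics format; the exemplar is appended to a bucket sample.
    openmetrics_text = generate_latest(registry).decode("utf-8")

The exemplar labels must stay short: ``prometheus_client`` rejects exemplars whose combined
label names and values exceed 128 characters, which the two hexadecimal IDs fit within.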
| 70 | + |
| 71 | +Run |
| 72 | +--- |
| 73 | + |
| 74 | +Check out the example code if you don't already have the repository cloned: |
| 75 | + |
| 76 | +.. code-block:: sh |
| 77 | + |
| 78 | + git clone https://github.com/GoogleCloudPlatform/opentelemetry-operations-python.git |
| 79 | + cd opentelemetry-operations-python/docs/examples/prometheus_exemplars |
| 80 | + |
| 81 | +First, set the environment variables needed to provide authentication to the Collector when it |
| 82 | +runs in Docker. |
| 83 | + |
| 84 | +.. code-block:: sh |
| 85 | + |
| 86 | + export USERID=$(id -u) |
| 87 | + export PROJECT_ID=<your-gcp-project> |
| 88 | + export GOOGLE_APPLICATION_CREDENTIALS="${HOME}/.config/gcloud/application_default_credentials.json" |
| 89 | + |
| 90 | +Build and start the example containers using ``docker-compose``: |
| 91 | + |
| 92 | +.. code-block:: sh |
| 93 | + |
| 94 | + docker-compose up --build --abort-on-container-exit |
| 95 | + |
| 96 | +This starts three containers: |
| 97 | + |
| 98 | +#. The Flask server written in ``server.py``. It receives requests and simulates some work by |
| 99 | + waiting for a random amount of time. |
| 100 | +#. The OpenTelemetry Collector which receives traces from the Flask server by OTLP and scrapes |
| 101 | + Prometheus metrics from the Flask server's ``/metrics`` endpoint. |
| 102 | +#. A load generator that continuously sends requests to the Flask server. |
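
To make the third container's role concrete, here is a hypothetical load generator sketch that
uses only the Python standard library. The function name ``generate_load``, its parameters, and
the target URL are assumptions for illustration. The load generator shipped with the example
may be implemented differently.

.. code-block:: python

    import time
    import urllib.request


    def generate_load(url, requests_to_send, delay_seconds=0.1):
        """Send a fixed number of GET requests and return their HTTP status codes."""
        statuses = []
        for _ in range(requests_to_send):
            with urllib.request.urlopen(url, timeout=5) as response:
                statuses.append(response.status)
            # A short pause keeps the request rate steady rather than bursty.
            time.sleep(delay_seconds)
        return statuses

A call such as ``generate_load("http://localhost:5000/", 100)`` would then drive traffic to a
locally running server. The host, port, and path here are placeholders.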
| 103 | + |
| 104 | +Checking Output |
| 105 | +--------------- |
| 106 | + |
| 107 | +While running the example, you can go to the `Cloud Monitoring Metrics Explorer page |
| 108 | +<https://console.cloud.google.com/monitoring/metrics-explorer>`_ to see the results. Click on |
| 109 | +the "Metric" dropdown, type ``my_prom_hist``, and select the metric from under "Prometheus |
| 110 | +Target". The full metric name is ``prometheus.googleapis.com/my_prom_hist_seconds/histogram``. |
| 111 | + |
| 112 | +.. image:: select_metric.png |
| 113 | + :alt: Select the metric |
| 114 | + |
| 115 | +After selecting the metric, you should see something like the image below, with a heatmap |
| 116 | +showing the distribution of request durations in the Python server. |
| 117 | + |
| 118 | +.. image:: heatmap.png |
| 119 | + :alt: Metrics explorer heatmap |
| 120 | + |
| 121 | +The circles on the heatmap are called "exemplars". Each one links to an example span that fell |
| 122 | +within the given bucket on the heatmap. Exemplars are plotted at the time they occurred |
| 123 | +(x axis) and the duration they took (y axis). Clicking on an exemplar opens |
| 124 | +the Trace details flyout focused on the linked span. |
| 125 | + |
| 126 | +.. image:: trace_details.png |
| 127 | + :alt: Trace details flyout |