clouds, containers, functions, applications and their management
source: prometheus.io
$ git clone https://github.com/prometheus/client_golang.git
$ cd client_golang/examples/random
$ go get -d
$ go build
Fetch and compile the client library code example.
Start example targets in separate terminals.
$ ./random -listen-address=:8080
$ ./random -listen-address=:8081
$ ./random -listen-address=:8082
Create and run random sample targets.
./prometheus --web.enable-admin-api --web.enable-lifecycle
2.x adds a new flag, --web.enable-admin-api, that controls access to the administrative HTTP API.
look at prometheus.yml and deploy prometheus
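For reference, a minimal prometheus.yml that scrapes the three example targets above might look like the following sketch (the job name is illustrative, loosely following the upstream getting-started guide):

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'example-random'
    static_configs:
      - targets: ['localhost:8080', 'localhost:8081', 'localhost:8082']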
a simple approach
series
^
│ . . . . . . . . . . . . . . . . . . . . . . request_total{path="/status",method="GET"}
│ . . . . . . . . . . . . . . . . . . . . . . request_total{path="/",method="POST"}
│ . . . . . . .
│ . . . . . . . . . . . . . . . . . . . ...
│ . . . . . . . . . . . . . . . . . . . . .
│ . . . . . . . . . . . . . . . . . . . . . errors_total{path="/status",method="POST"}
│ . . . . . . . . . . . . . . . . . errors_total{path="/health",method="GET"}
│ . . . . . . . . . . . . . .
│ . . . . . . . . . . . . . . . . . . . ...
│ . . . . . . . . . . . . . . . . . . . .
v
<-------------------- time --------------------->
one sample is 16 bytes = timestamp (8 bytes) + value (8 bytes)
metric name
labels
time
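To make the model concrete: a series is identified by its metric name plus label set, and each sample attached to it is a (timestamp, value) pair. In the text exposition format a scraped sample looks roughly like this (values are illustrative; the trailing millisecond timestamp is optional):

request_total{path="/status",method="GET"} 1027 1395066363000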
series
^
│ . . . . . .
│ . . . . . .
│ . . . . . .
│ . . . . . . .
│ . . . . . . .
│ . . . . . . .
│ . . . . . .
│ . . . . . .
│ . . . . .
│ . . . . .
│ . . . . .
v
<-------------------- time --------------------->
Prometheus 2.x has a new storage layer designed to make it easier to run Prometheus in environments like Kubernetes, and to prepare Prometheus for the proliferating workloads of the future.
The storage engine is sufficiently separate from Prometheus that it is generally reusable.
t0             t1             t2             t3             now
 ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐
 │           │  │           │  │           │  │  mutable  │ <─── write ──── ┌────────────┐
 │           │  │           │  │           │  │           │                 │ Prometheus │
 └───────────┘  └───────────┘  └───────────┘  └───────────┘                 └────────────┘
       └──────────────┴──────┬───────┴──────────────┘                              ^
                             │                                                     │
                             │ query                                               │
                             │                                                     │
                           merge ──────────────────────────────────────────────────┘
512MB per block
to limit overhead from PromQL query and graph generation
1.x: encoding based on the previous sample; 16 bytes of storage per sample.
2.x: batch-compresses chunks of many samples of a series in memory, squeezing each data point down to an average of 1.37 bytes of storage.
Blocks are immutable.
Snapshot by hard linking all the blocks.
1.x was not sharded on time; it had to cycle through all old chunks to determine whether samples were past the retention time.
The 2.x layout makes it trivially easy to enforce retention: whole blocks past the retention boundary are simply dropped.
                     |
┌───────────┐  ┌─────┼─────┐  ┌───────────┐  ┌───────────┐
│ 1         │  │ 2   |     │  │ 3         │  │ 4         │   . . .
└───────────┘  └─────┼─────┘  └───────────┘  └───────────┘
                     |
                     |
             retention boundary
see fabxc.org/tsdb/
curl -XPOST http://localhost:9090/api/v2/admin/tsdb/snapshot
./prometheus --web.listen-address=0.0.0.0:9091 --web.enable-admin-api --storage.tsdb.path=./data/snapshots/
./prometheus --web.enable-admin-api --web.enable-lifecycle
Remember 2.x's new security model.
Take a snapshot.
Start a new Prometheus and point to snapshot.
backing up prometheus
Your Prometheus 1.8 instance should be started with the following flags and a config file containing only the external_labels setting (if any):
remote_read:
- url: "http://x.x.x.x:9094/api/v1/read"
./prometheus-1.8.1.linux-amd64/prometheus -web.listen-address ":9094" -config.file old.yml
Prometheus 2.0 can then be started (on the same machine):
./prometheus-2.0.0.linux-amd64/prometheus --config.file prometheus.yml
Add this stanza to your prometheus.yml
upgrading from 1.x
an ephemeral sliding window of recent data
Prometheus stores sample data on disk in a highly optimized custom format based on flat files.
efficient
tunable (see the flags below)
NOT scalable
NOT durable
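"Tunable" refers mainly to the storage flags; for example (values are illustrative):

./prometheus --storage.tsdb.path=./data --storage.tsdb.retention=15d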
./data/01BKGV7JBM69T2G1BGBGM6KB12
./data/01BKGV7JBM69T2G1BGBGM6KB12/meta.json
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000002
./data/01BKGV7JBM69T2G1BGBGM6KB12/wal/000001
./data/01BKGTZQ1SYQJTR4PB43C8PD98
./data/01BKGTZQ1SYQJTR4PB43C8PD98/meta.json
./data/01BKGTZQ1SYQJTR4PB43C8PD98/index
./data/01BKGTZQ1SYQJTR4PB43C8PD98/chunks
./data/01BKGTZQ1SYQJTR4PB43C8PD98/chunks/000001
./data/01BKGTZQ1SYQJTR4PB43C8PD98/tombstones
./data/01BKGTZQ1HHWHV8FBJXW1Y3W0K
./data/01BKGTZQ1HHWHV8FBJXW1Y3W0K/meta.json
./data/01BKGTZQ1HHWHV8FBJXW1Y3W0K/wal
./data/01BKGTZQ1HHWHV8FBJXW1Y3W0K/wal/000001
./data/01BKGV7JC0RY8A6MACW02A2PJD
./data/01BKGV7JC0RY8A6MACW02A2PJD/meta.json
./data/01BKGV7JC0RY8A6MACW02A2PJD/index
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks
./data/01BKGV7JC0RY8A6MACW02A2PJD/chunks/000001
./data/01BKGV7JC0RY8A6MACW02A2PJD/tombstones
a long-standing issue
removed -storage.local.* flags
The solution is a remote storage adapter.
Prometheus ── custom protocol ── Adapter ── 3rd Party Storage
write: samples
read: request matchers + time ranges; receive series + samples
Adapter: an application that can receive batches of samples from Prometheus over HTTP and send them to some backend.
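As a sketch of the write path, the following minimal adapter accepts remote-write batches from Prometheus and just logs them; it loosely follows the remote storage examples in the Prometheus repository (the :4567 port and /receive path mirror the prometheus2appoptics setup shown below, and forwarding to a real backend is omitted):

package main

import (
	"io/ioutil"
	"log"
	"net/http"

	"github.com/gogo/protobuf/proto"
	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	http.HandleFunc("/receive", func(w http.ResponseWriter, r *http.Request) {
		// Remote-write bodies are snappy-compressed protobuf WriteRequests.
		compressed, err := ioutil.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		data, err := snappy.Decode(nil, compressed)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		var req prompb.WriteRequest
		if err := proto.Unmarshal(data, &req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real adapter would convert these samples and send them to the backend.
		for _, ts := range req.Timeseries {
			log.Printf("received series with %d labels and %d samples", len(ts.Labels), len(ts.Samples))
		}
	})
	log.Fatal(http.ListenAndServe(":4567", nil))
}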
# The URL of the endpoint to send samples to.
url: <string>
# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]
# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]
# Sets the `Authorization` header on every remote write request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]
# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]
# Optional proxy URL.
[ proxy_url: <string> ]
prometheus2appoptics
Prometheus ── POST /receive ──> prometheus2appoptics ── POST /v1/measurements ──> AppOptics
1000 samples per POST
remote_write:
- url: "http://localhost:4567/receive"
docker run --env ACCESS_TOKEN=1d7c3ced4caa7f4939bd16c9c1b6578ae93d7b3b597a39633a6656fe198e796d \
--env SEND_STATS=true -p 4567:4567 solarwinds/prometheus2appoptics
Deploy adapter
Config remote write
deploy remote storage adapter
curl -X POST http://localhost:9090/-/reload
Reload config
Verify remote storage adapter
curl -XPOST -g \
'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=go_memstats_alloc_bytes'
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
Delete time series 'go_memstats_alloc_bytes'
Delete tombstones
Verify long-term storage
delete local storage and see long-term storage retained
Verify local storage
IRC: #prometheus on irc.freenode.net
Mailing lists:
prometheus-users – discussing Prometheus usage and community support
prometheus-developers – contributing to Prometheus development
Prometheus repositories to file bugs and feature requests
Alertmanager is an alert...
ingester
grouper
de-duplicator
silencer
throttler
notifier
a brief Prometheus AlertManager construct review
match alerts to their receiver and how often to notify
where and how to send alerts
a brief Prometheus AlertManager construct review
Muting
Suppressing
Correlating
group_wait: 30s
group_by: ['alertname', 'cluster']
group_interval: 5m
per route
global
via ui / api
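For orientation, these grouping and throttling settings live on a route in alertmanager.yml; a minimal sketch (the receiver name is illustrative):

route:
  receiver: ops-team-all
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h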
ALERT <alert name>
IF <PromQL vector expression>
FOR <duration>
LABELS { ... }
ANNOTATIONS { ... }
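Note that Prometheus 2.x moves rule files to YAML; the same kind of alert expressed in the new format looks roughly like this (alert name and expression are illustrative):

groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "{{ $labels.instance }} has been down for 5 minutes"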
Supports clients other than Prometheus
is notified when alerts transition state
a shared construct
alert states: inactive → pending → firing (and back to inactive when resolved)
firing alerts result in notifications
Use a receiver that goes to both destinations.
route:
  receiver: email_webhook
receivers:
- name: email_webhook
  email_configs:
  - to: 'lee@example.io'
  webhook_configs:
  - url: <webhook url here>
Use continue to advance to the next receiver.
route:
  receiver: ops-team-all   # default
  routes:
  - match:
      severity: page
    receiver: ops-team-b
    continue: true
  - match:
      severity: critical
    receiver: ops-team-a
receivers:
- name: ops-team-all
  email_configs:
  - to: ops-team-all@example.io
- name: ops-team-a
  email_configs:
  - to: ops-team-a@example.io
- name: ops-team-b
  email_configs:
  - to: ops-team-b@example.io
Alertmanager internals (architecture overview):
Alerts arrive via the API and are stored by the Alert Provider; silences come in via the UI/API and are stored by the Silence Provider, with a maintenance script periodically cleaning up stored state.
The Dispatcher subscribes to the Alert Provider and sorts incoming alerts into aggregation groups, assigning the correct notifiers to each.
Batched alerts then pass through the notification pipeline: Inhibitor, Silencer, Router, and de-duplication via the Notify Provider, which checks for previously sent notifications; failed notifications are retried.
High availability is being introduced in 0.5, built atop Weave Mesh gossip protocols.
With HA, you no longer have to monitor the monitor.
Designed so that every alert is sent to all instances in the cluster.
All Prometheus instances send alerts to all Alertmanager instances (see the sketch below).
Guarantees that notifications are sent at least once.
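On the Prometheus side, pointing every instance at all Alertmanager replicas is a matter of listing them in the alerting section of prometheus.yml; a sketch (hostnames are illustrative):

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager-1:9093
      - alertmanager-2:9093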