SSL Certificate Monitoring with Prometheus and Grafana
Ask any engineer who’s dealt with an expired certificate in production and they’ll tell you the same thing: the alert that matters is the one that fires before everything breaks, not the PagerDuty that wakes you up at 3 AM. SSL certificate expiration is completely predictable — you know the exact moment it expires from day one — yet it remains one of the most common causes of avoidable outages.
If you’re already running Prometheus and Grafana for infrastructure observability, adding certificate monitoring is a natural extension. This guide walks through the full setup: exporters, alert rules, Grafana dashboards, and Alertmanager routing, so certificates become just another metric in your observability stack.
For the Kubernetes side of certificate management, cert-manager automates the entire issuance and renewal workflow and plays nicely with Prometheus metrics — which we’ll cover later in this guide.
Monitoring Architecture
[Endpoints] → [Exporters] → [Prometheus] → [Grafana]
                                 ↓
                          [Alertmanager]
SSL/TLS Exporters for Prometheus
1. ssl_exporter
The most widely used dedicated SSL/TLS exporter; it probes endpoints over TLS and exposes certificate details as Prometheus metrics.
Installation:
# Binary release
wget https://github.com/ribbybibby/ssl_exporter/releases/download/v2.4.2/ssl_exporter-2.4.2.linux-amd64.tar.gz
tar xvf ssl_exporter-2.4.2.linux-amd64.tar.gz
# The archive unpacks into a versioned directory
sudo mv ssl_exporter-2.4.2.linux-amd64/ssl_exporter /usr/local/bin/

# Systemd service. tee is needed here: with `sudo cat > file` the redirection
# runs in the unprivileged shell and fails. Quoting 'EOF' writes the unit literally.
sudo tee /etc/systemd/system/ssl_exporter.service > /dev/null <<'EOF'
[Unit]
Description=SSL Exporter
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/ssl_exporter \
  --web.listen-address=:9219 \
  --web.metrics-path=/metrics

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ssl_exporter
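Once the unit is running, a quick check confirms the exporter is up before wiring it into Prometheus (the /probe endpoint itself is exercised in the Troubleshooting section below):
# Service status
systemctl status ssl_exporter
# The exporter's own metrics endpoint should respond
curl -s http://localhost:9219/metrics | head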
Prometheus Configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'ssl'
    metrics_path: /probe
    static_configs:
      - targets:
          - example.com:443
          - api.example.com:443
          - www.example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9219
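Before reloading Prometheus, it's cheap to validate the file first; promtool ships with Prometheus, and the reload endpoint only works if Prometheus was started with --web.enable-lifecycle:
# Validate configuration and any referenced rule files
promtool check config /etc/prometheus/prometheus.yml
# Hot-reload without a restart (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload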
2. blackbox_exporter
A general-purpose probing exporter (HTTP, TCP, ICMP, DNS) that can also check TLS certificates.
Configuration:
# blackbox.yml
modules:
  ssl_expiry:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
      tls_config:
        insecure_skip_verify: false
Prometheus config:
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [ssl_expiry]
    static_configs:
      - targets:
          # the tcp prober dials host:port, so drop the https:// scheme
          - example.com:443
          - api.example.com:443
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
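Note that blackbox_exporter uses its own metric names: expiry comes out as probe_ssl_earliest_cert_expiry rather than ssl_cert_not_after, so the PromQL later in this guide needs adjusting if you go this route:
# Days to expiration, blackbox_exporter flavor
(probe_ssl_earliest_cert_expiry - time()) / 86400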
Key Metrics
ssl_exporter Metrics
# Expiration date (Unix timestamp, seconds)
ssl_cert_not_after
# Issue date
ssl_cert_not_before
# Certificate details (subject CN, issuer CN, serial) arrive as labels
# on the cert metrics, e.g. ssl_cert_not_after{cn="...", issuer_cn="..."}
# Verification status
ssl_tls_connect_success
ssl_probe_success
# TLS version (exposed via the "version" label)
ssl_tls_version_info
Calculating Days Until Expiration
# Days to expiration
(ssl_cert_not_after - time()) / 86400
# Hours to expiration
(ssl_cert_not_after - time()) / 3600
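These expressions are easy to sanity-check: openssl can read the same expiry date straight off the wire, which is handy when you want to confirm what the exporter is reporting:
# Read the certificate's notAfter date directly
echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null \
  | openssl x509 -noout -enddate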
Prometheus Alerts
Alert Rules
# prometheus-rules.yml
groups:
  - name: ssl_certificates
    rules:
      # Alert 30 days before expiration
      - alert: SSLCertExpiringSoon
        expr: (ssl_cert_not_after - time()) / 86400 < 30
        for: 24h
        labels:
          severity: warning
        annotations:
          summary: "SSL certificate expiring soon for {{ $labels.instance }}"
          description: "Certificate expires in {{ $value | printf \"%.0f\" }} days"

      # Alert 7 days before expiration
      - alert: SSLCertExpiringCritical
        expr: (ssl_cert_not_after - time()) / 86400 < 7
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate expiring CRITICAL for {{ $labels.instance }}"
          description: "Certificate expires in {{ $value | printf \"%.0f\" }} days!"

      # Certificate expired
      - alert: SSLCertExpired
        expr: ssl_cert_not_after - time() < 0
        labels:
          severity: critical
        annotations:
          summary: "SSL certificate EXPIRED for {{ $labels.instance }}"
          description: "Certificate has expired!"

      # Probe failed
      - alert: SSLProbeFailed
        expr: ssl_probe_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "SSL probe failed for {{ $labels.instance }}"
          description: "Cannot verify SSL certificate"
Alertmanager Configuration
# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'team-ssl'

receivers:
  - name: 'team-ssl'
    email_configs:
      - to: 'ssl-team@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#ssl-alerts'
        title: 'SSL Certificate Alert'
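Alertmanager ships with amtool, which can lint this file and, in recent versions, dry-run the routing tree. That makes it easy to verify where a given label set lands before an alert actually fires:
# Lint the configuration
amtool check-config /etc/alertmanager/alertmanager.yml
# See which receiver a label set would hit
amtool config routes test --config.file=/etc/alertmanager/alertmanager.yml severity=critical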
Grafana Dashboards
Import Dashboard
- Grafana → Dashboards → Import
- ID: 14662 (SSL/TLS Certificate Dashboard)
- Select Prometheus datasource
- Import
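If you manage Grafana as code, the same dashboard can be provisioned from disk instead of clicked through the UI. A minimal provider sketch (paths here are assumptions; adjust them to your install, and drop the exported dashboard JSON into the referenced directory):
# /etc/grafana/provisioning/dashboards/ssl.yml (path is an assumption)
apiVersion: 1
providers:
  - name: 'ssl-dashboards'
    folder: 'SSL'
    type: file
    options:
      path: /var/lib/grafana/dashboards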
Custom Panels
Panel 1: Certificate Expiry Table
sort_desc((ssl_cert_not_after - time()) / 86400)
Panel 2: Certificate Age
(time() - ssl_cert_not_before) / 86400
Panel 3: TLS Version Distribution
count by (version) (ssl_tls_version_info)
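A useful fourth panel is a single stat showing the soonest expiry across every monitored endpoint, the one number worth glancing at daily:
Panel 4: Soonest Expiry (Stat)
min((ssl_cert_not_after - time()) / 86400)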
Docker Compose Stack
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./rules.yml:/etc/prometheus/rules.yml
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"

  ssl_exporter:
    image: ribbybibby/ssl-exporter:latest
    ports:
      - "9219:9219"

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"

volumes:
  prometheus-data:
  grafana-data:
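One wrinkle with the compose stack: inside the compose network, containers reach each other by service name, not localhost. A minimal prometheus.yml for this stack might look like the following, pointing the relabeling at ssl_exporter:9219 and registering Alertmanager (service names match the compose file above):
# prometheus.yml for the compose stack
rule_files:
  - /etc/prometheus/rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: 'ssl'
    metrics_path: /probe
    static_configs:
      - targets: ['example.com:443']
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ssl_exporter:9219  # compose service name, not localhost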
Best Practices
Wondering what this looks like at scale? Our case study on certificate automation shows how a team managing 200+ certificates across three cloud providers built exactly this kind of monitoring stack — and what happened before they did.
1. Scrape Frequency
# Not too often - certificates don't change every minute
scrape_interval: 5m # 5 minutes is enough
scrape_timeout: 10s
2. Alert Grouping
# Group alerts by domain
route:
  group_by: ['instance', 'alertname']
  group_wait: 30s
  group_interval: 5m
3. Retention
# Prometheus
--storage.tsdb.retention.time=90d # 3 months of history
Integrations
Slack
slack_configs:
  - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK'
    channel: '#ssl-alerts'
    title: '{{ .GroupLabels.alertname }}'
    text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
PagerDuty
pagerduty_configs:
  - service_key: 'YOUR_KEY'
    description: '{{ .GroupLabels.alertname }}'
Troubleshooting
Exporter not working
# Check logs
journalctl -u ssl_exporter -f
# Test a probe manually (quote the URL so the shell doesn't touch the ?)
curl 'http://localhost:9219/probe?target=example.com:443'
No metrics in Prometheus
# Check targets
curl http://localhost:9090/api/v1/targets
# Query metrics
curl 'http://localhost:9090/api/v1/query?query=ssl_cert_not_after'
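When a target is missing, the full targets payload is noisy; a jq filter (assuming jq is installed) narrows it to health and last error per target:
curl -s http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health, lastError}'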
Integrating SSL/TLS monitoring with Prometheus and Grafana gives you visibility (all certificates in one place), proactivity (alerts before anything expires), history (track changes and renewals over time), and easy CI/CD integration. It’s the foundation every mature infrastructure needs.
One more thing to keep in mind: certificate validity periods are getting shorter. The industry is moving toward 47-day certificates over the next few years — which means monitoring and automation become even more critical, not less. Get your stack set up now, while you have the breathing room.
Combine this with tools like CrtMgr for additional external monitoring and you have a complete observability stack for certificates. Internal metrics tell you what your infrastructure sees; external monitoring tells you what your users see.
You can’t improve what you don’t measure — and you can’t sleep well without monitoring your certificates.