Case Study: Certificate Automation at an E-commerce Company
An e-commerce company running 200+ applications and microservices was struggling with manual certificate management. This is the story of how it moved from recurring outages to full automation.
Challenge: Certificate Chaos
Initial State
Company: Mid-sized e-commerce firm
- 200+ cloud applications (AWS, Azure, GCP)
- Kubernetes clusters (15 clusters, 500+ pods)
- Legacy VMs with traditional applications
- Multi-cloud architecture
Problems:
- ✗ 3-4 incidents with expired certificates per year
- ✗ No central certificate visibility
- ✗ Manual renewal processes
- ✗ 2-3 hours per month per admin on certificate management
- ✗ Different practices in different teams
- ✗ No alerts before expiration
Critical Incident
July 2023: an expired certificate took down the main checkout flow:
- 2 hours of downtime during peak sales
- $47,000 in lost sales
- 23% of customers abandoned carts
- Negative SEO impact
- Loss of customer trust
Root Cause: The certificate had been renewed but was never installed on the load balancer.
Solution: Three-Phase Transformation
Phase 1: Inventory (2 weeks)
Goal: Understand what we have
Actions:
- Automatic scanning
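#!/bin/bash
# Discovery script: record subject, validity dates, and issuer for each domain
while read -r domain; do
  echo "Checking $domain..."
  # -servername sends SNI so hosts serving several sites return the right certificate
  echo | openssl s_client -connect "$domain:443" -servername "$domain" 2>/dev/null | \
    openssl x509 -noout -subject -dates -issuer >> inventory.txt
done < domains.txt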
- Import to spreadsheet, with one column per attribute:
  - Domain
  - Expiration date
  - Issuer (CA)
  - Owner/team
  - Environment (prod/staging/dev)
  - Platform (K8s/VM/Serverless)
Result: 237 certificates identified!
- 18 certificates expired (!)
- 42 certificates with < 30 days to expiration
- 89 different domains
- 12 different CAs used
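To show how the raw scan output can be turned into counts like these, here is a minimal sketch that groups the `openssl` output into one record per certificate and classifies each by expiry. It assumes every record in inventory.txt starts with a `subject=` line; the function names and the sample data are made up for illustration and are not our production tooling.

```python
import ssl
from datetime import datetime, timezone

def parse_inventory(text):
    """Group openssl's key=value lines into one dict per certificate."""
    records, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip anything that is not a key=value line
        if key == "subject" and current:
            records.append(current)  # a new subject line starts a new record
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records

def classify(record, now, warn_days=30):
    """Return 'expired', 'expiring', or 'ok' for one parsed record."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(record["notAfter"]), tz=timezone.utc)
    days_left = (expires - now).days
    if days_left < 0:
        return "expired"
    return "expiring" if days_left < warn_days else "ok"

# Illustrative sample in the format written by the discovery script
sample = """subject=CN = shop.example.com
notBefore=Mar 17 00:00:00 2023 GMT
notAfter=Jun 15 00:00:00 2023 GMT
issuer=C = US, O = Let's Encrypt, CN = R3
subject=CN = api.example.com
notBefore=Apr 21 00:00:00 2023 GMT
notAfter=Jul 20 00:00:00 2023 GMT
issuer=C = US, O = Let's Encrypt, CN = R3
"""
now = datetime(2023, 7, 1, tzinfo=timezone.utc)
for rec in parse_inventory(sample):
    print(rec["subject"], classify(rec, now))
```

The owner, environment, and platform columns still have to be filled in by hand or joined from another source, since the certificate itself does not carry them.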
Phase 2: Quick Wins (1 month)
Goal: Stabilization and basic monitoring
2.1 CA Consolidation
Before:
- DigiCert: $8,500/year
- Sectigo: $3,200/year
- GoDaddy: $2,100/year
- RapidSSL: $1,500/year
- Total: $15,300/year
After (Let’s Encrypt):
- Total: $0/year ✓
Savings: $15,300/year
2.2 Monitoring Setup
Implemented CrtMgr plus our own Prometheus monitoring:
# Prometheus alerts
- alert: CertificateExpiringIn30Days
  expr: (ssl_cert_not_after - time()) / 86400 < 30
  annotations:
    summary: "Cert expiring for {{ $labels.domain }}"
Slack integration for alerts.
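For an ad-hoc check outside Prometheus, the same threshold arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `fetch_not_after` relies on the default verifying TLS context, so it raises an error for a certificate that has already expired instead of returning its date.

```python
import socket
import ssl

SECONDS_PER_DAY = 86400

def days_until_expiry(not_after_epoch, now_epoch):
    """Same arithmetic as the alert expression: (not_after - now) / 86400."""
    return (not_after_epoch - now_epoch) / SECONDS_PER_DAY

def should_alert(not_after_epoch, now_epoch, threshold_days=30):
    """True when fewer than threshold_days remain before expiry."""
    return days_until_expiry(not_after_epoch, now_epoch) < threshold_days

def fetch_not_after(host, port=443, timeout=5.0):
    """Return the served certificate's notAfter as seconds since the epoch."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return ssl.cert_time_to_seconds(cert["notAfter"])

# Usage (performs a network call):
#   import time
#   should_alert(fetch_not_after("example.com"), time.time())
```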
2.3 Procedures and Documentation
- Renewal playbook
- Contact list
- Escalation procedures
- Runbooks
Results after 1 month:
- 0 expired certificates
- All teams receive alerts
- Standard procedures in Confluence
Phase 3: Full Automation (3 months)
Goal: Zero-touch certificate management
3.1 Kubernetes (70% of infrastructure)
Implemented cert-manager:
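# Installation (the chart repository must be added first; CRDs are installed with the chart)
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.0 \
  --set installCRDs=true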
Migration script for all Ingress resources (assumes a ClusterIssuer named letsencrypt-prod already exists):
#!/usr/bin/env python3
import json
import subprocess

# Get all ingresses across all namespaces; check=True fails fast if kubectl errors
result = subprocess.run(
    ['kubectl', 'get', 'ingress', '-A', '-o', 'json'],
    capture_output=True, text=True, check=True)
ingresses = json.loads(result.stdout)

for ing in ingresses['items']:
    namespace = ing['metadata']['namespace']
    name = ing['metadata']['name']
    # Add cert-manager annotation
    subprocess.run([
        'kubectl', 'annotate', 'ingress', name, '-n', namespace,
        'cert-manager.io/cluster-issuer=letsencrypt-prod',
        '--overwrite'
    ], check=True)
    print(f"✓ Migrated {namespace}/{name}")
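One caveat worth sketching: cert-manager only issues a certificate for an annotated Ingress that also declares a `tls` entry with a `secretName`, so annotating an Ingress without one has no effect. A dry-run filter along the following lines can list which Ingresses are actually worth annotating. The helper names are hypothetical and the input is the parsed output of `kubectl get ingress -A -o json`.

```python
ISSUER_ANNOTATION = "cert-manager.io/cluster-issuer"

def has_tls_secret(ingress):
    """True if the Ingress declares at least one TLS entry with a secretName."""
    tls_entries = ingress.get("spec", {}).get("tls") or []
    return any(entry.get("secretName") for entry in tls_entries)

def already_annotated(ingress):
    """True if the cluster-issuer annotation is already present."""
    annotations = ingress["metadata"].get("annotations") or {}
    return ISSUER_ANNOTATION in annotations

def select_for_migration(ingresses):
    """Return namespace/name for each Ingress that still needs the annotation."""
    return [
        f"{ing['metadata']['namespace']}/{ing['metadata']['name']}"
        for ing in ingresses["items"]
        if has_tls_secret(ing) and not already_annotated(ing)
    ]
```

Ingresses that are filtered out because they have no `tls` block are good candidates for manual review, since they may be serving plain HTTP or terminating TLS elsewhere.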
3.2 Load Balancers (AWS/Azure/GCP)
AWS ALB with ACM:
# Request a certificate; ACM issues it only after the DNS validation records exist
aws acm request-certificate \
  --domain-name example.com \
  --validation-method DNS \
  --subject-alternative-names "*.example.com"
# Once issued and attached to the ALB listener, AWS renews it automatically
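With DNS validation, the request stays pending until the CNAME records ACM asks for are published. As a rough sketch, the records can be pulled out of a `describe_certificate` response like this. The function name is hypothetical, and fetching the response (for example with boto3) and writing the records to your DNS provider are left out.

```python
def dns_validation_records(describe_response):
    """Collect the validation records ACM asks for, de-duplicated by name."""
    options = describe_response["Certificate"].get("DomainValidationOptions", [])
    records = {}
    for option in options:
        record = option.get("ResourceRecord")
        if record:  # missing until ACM has generated the validation record
            records[record["Name"]] = (record["Type"], record["Value"])
    return records

# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   response = boto3.client("acm").describe_certificate(CertificateArn=arn)
#   for name, (rtype, value) in dns_validation_records(response).items():
#       print(name, rtype, value)
```

De-duplicating by record name matters here because a domain and its wildcard typically share one validation record.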
3.3 Legacy VMs
Ansible playbook for automation:
# renew-certificates.yml
---
- hosts: web_servers
  become: true
  tasks:
    - name: Install certbot
      apt:
        name: certbot
        state: present
    # A plain command task always reports "changed", so a conditional reload
    # based on it would fire on every run. The deploy hook runs only after
    # a certificate was actually renewed.
    - name: Renew certificates and reload nginx on renewal
      command: certbot renew --quiet --deploy-hook "systemctl reload nginx"
    - name: Setup cron job
      cron:
        name: "Certbot renewal"
        minute: "0"
        hour: "3"
        job: 'certbot renew --quiet --deploy-hook "systemctl reload nginx"'
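Since the July 2023 incident came from a certificate that was renewed but never deployed, a post-renewal check that compares the certificate on disk with the one actually served is a cheap safeguard. The following is a minimal sketch with hypothetical function names. It assumes the file read from disk contains only the leaf certificate (certbot's cert.pem, not fullchain.pem).

```python
import hashlib
import ssl

def fingerprint(pem_cert):
    """SHA-256 fingerprint of a single PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()

def is_deployed(renewed_pem, served_pem):
    """True when the endpoint serves exactly the renewed certificate."""
    return fingerprint(renewed_pem) == fingerprint(served_pem)

def fetch_served_pem(host, port=443):
    """Fetch the certificate currently served by host:port as PEM text."""
    return ssl.get_server_certificate((host, port))

# Usage (reads a local file and performs a network call):
#   with open("/etc/letsencrypt/live/example.com/cert.pem") as f:
#       ok = is_deployed(f.read(), fetch_served_pem("example.com"))
```

Running a check like this against the load balancer address, not just the origin server, would have caught the mismatch behind our outage.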
Results After 6 Months
Technical Metrics
| Metric | Before | After | Change |
|---|---|---|---|
| Incidents/year | 3-4 | 0 | -100% |
| Management time/month | 40h | 2h | -95% |
| Automated certificates | 0% | 98% | +98 pp |
| Average renewal time | 45 min | 0 min | -100% |
| Certificates covered by expiration alerts | 0 | 237 | +237 |
Business Metrics
Savings:
- Certificate costs: -$15,300/year
- Team time (40h → 2h): $6,080/month (avg $160/h * 38h)
- Total savings: $88,260/year
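For readers who want to check the arithmetic, the savings figures follow directly from the numbers reported in this case study:

```python
HOURLY_RATE = 160                    # average cost per engineer hour, in USD
HOURS_BEFORE, HOURS_AFTER = 40, 2    # management time per month
CERT_COSTS = 8_500 + 3_200 + 2_100 + 1_500  # previous yearly CA spend

hours_saved = HOURS_BEFORE - HOURS_AFTER
monthly_time_savings = hours_saved * HOURLY_RATE
annual_savings = monthly_time_savings * 12 + CERT_COSTS

print(hours_saved)            # 38
print(monthly_time_savings)   # 6080
print(annual_savings)         # 88260
```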
Avoided losses:
- No certificate downtime, versus the 2-hour outage in 2023: $47,000+ in lost sales avoided
Quality:
- 100% SSL/TLS uptime
- 0 customer complaints about certificates
- Better SEO position (no cert errors)
Team Feedback
DevOps Team:
“We stopped worrying about certificates. Everything works automatically.”
Security Team:
“Finally we have full visibility. We know exactly what’s happening.”
Business:
“No SSL incidents is a huge change. Customers are happier.”
Lessons Learned
What Worked
- Inventory first - can’t automate what you don’t know
- Quick wins - monitoring gave immediate ROI
- Phased rollout - K8s → Cloud LB → Legacy VMs
- GitOps mindset - infrastructure as code for certs
- 24/7 monitoring - own + external (CrtMgr)
What We’d Do Differently
- Start earlier - cost of delay was high
- More staging tests - a few edge cases only surfaced in prod
- Better documentation - to make onboarding new team members easier
- Earlier communication - some teams were surprised by the changes
Final Architecture
┌─────────────────────────────────────┐
│            Applications             │
│     K8s      │  Cloud  │ Legacy VMs │
└──────────────┴─────────┴────────────┘
       │            │          │
       ▼            ▼          ▼
┌─────────────────────────────────────┐
│       Certificate Management        │
│ cert-manager │   ACM   │  certbot   │
└─────────────────────────────────────┘
       │            │          │
       ▼            ▼          ▼
┌─────────────────────────────────────┐
│            Let's Encrypt            │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│        Monitoring & Alerting        │
│  Prometheus  │ Grafana │   CrtMgr   │
└─────────────────────────────────────┘
Timeline & Budget
Timeline: 4.5 months
- Month 1: Inventory + Quick wins
- Month 2-4: Kubernetes, Cloud, VMs automation
- Month 4.5: Testing, documentation, training
Budget: $12,000
- Tools/licenses: $2,000
- DevOps time (3 eng * 1.5 months): $10,000
- Payback period: about 2 months (from the savings reported under Business Metrics)
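The two-month figure can be reproduced from the budget and the annual savings reported earlier. A quick sketch of the calculation:

```python
import math

budget = 2_000 + 10_000          # tools/licenses + DevOps time, in USD
annual_savings = 88_260          # total yearly savings from Business Metrics

monthly_savings = annual_savings / 12
payback_months = budget / monthly_savings

print(round(monthly_savings))       # 7355
print(round(payback_months, 1))     # 1.6
print(math.ceil(payback_months))    # 2
```

In other words, the project pays for itself during the second month after rollout, before counting any avoided outage losses.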
Summary
SSL/TLS certificate automation:
- Eliminates downtime - 0 incidents in 6 months
- Saves time - 95% reduction in management time
- Saves money - $88K+ annually
- Increases security - better visibility and control
- Improves morale - teams can focus on business value
Starting a similar transformation?
- Start with inventory using CrtMgr
- Set up monitoring and alerts
- Plan the migration (K8s → Cloud → Legacy)
- Test, test, test
- Deploy and monitor
Questions? Need help? Contact us - we’re happy to share our experience.
From chaos to automation - possible in 4.5 months.