Principal Site Reliability Engineer | mailkube - Modern Email Infrastructure for Developers

About the Role

The Principal Site Reliability Engineer owns the reliability posture of mailkube’s entire production platform. This is a senior individual-contributor role with broad organisational influence — you’ll define how we build, operate, and scale infrastructure across the company.

You’ll work directly with engineering leadership to set the technical direction for observability, incident management, and capacity planning. You’re the person who sees the system as a whole and keeps it healthy as we grow.

What You’ll Do

Own the SLO/SLI framework across all production services and drive adoption with product engineering teams
Design and operate multi-region, high-availability infrastructure for email delivery at scale
Lead the incident response programme: on-call rotations, blameless postmortems, runbook culture
Drive infrastructure-as-code practices with Terraform across AWS and GCP
Define the observability stack — metrics, logs, traces — and ensure every service is instrumented correctly
Mentor and grow a team of SREs; set technical standards and review architecture decisions
Partner with security on hardening, compliance, and threat modelling

What We’re Looking For

8+ years of infrastructure or SRE experience, with at least 3 in a senior or staff-level role
Deep expertise with Kubernetes in production (multi-cluster, multi-region)
Strong IaC background — Terraform required, Pulumi a bonus
Hands-on experience with a major cloud provider (AWS preferred, GCP experience valued)
Solid observability experience: Prometheus, Grafana, OpenTelemetry, distributed tracing
Strong scripting and automation skills in Python and/or Go
Proven incident response leadership — you’ve owned major incidents end-to-end
Strong grasp of networking fundamentals: DNS, TLS, BGP, CDN/edge routing

Nice to Have

Experience with email infrastructure (SMTP, deliverability, SPF/DKIM/DMARC)
Familiarity with eBPF-based networking or observability tooling
Security certifications or formal training in threat modelling

What We Offer

Fully remote, async-first — work from anywhere
Competitive base salary + equity
Annual learning & conference budget
Home office stipend
Flexible working hours with no required core hours