Infrastructure & Reliability
Principal Site Reliability Engineer
About the Role
The Principal Site Reliability Engineer owns the reliability posture of Mail Tactic’s entire production platform. This is a senior individual-contributor role with broad organisational influence — you’ll define how we build, operate, and scale infrastructure across the company.
You’ll work directly with engineering leadership to set the technical direction for observability, incident management, and capacity planning. You’re the person who sees the system as a whole and keeps it healthy as we grow.
What You’ll Do
- Own the SLO/SLI framework across all production services and drive adoption with product engineering teams
- Design and operate multi-region, high-availability infrastructure for email delivery at scale
- Lead the incident response programme: on-call rotations, blameless postmortems, runbook culture
- Drive infrastructure-as-code practices with Terraform across AWS and GCP
- Define the observability stack — metrics, logs, traces — and ensure every service is instrumented correctly
- Mentor and grow a team of SREs; set technical standards and review architecture decisions
- Partner with security on hardening, compliance, and threat modelling
What We’re Looking For
- 8+ years of infrastructure or SRE experience, with at least 3 in a senior or staff-level role
- Deep expertise with Kubernetes in production (multi-cluster, multi-region)
- Strong IaC background — Terraform required, Pulumi a bonus
- Hands-on experience with a major cloud provider (AWS preferred, GCP experience valued)
- Solid observability experience: Prometheus, Grafana, OpenTelemetry, distributed tracing
- Strong scripting and automation skills in Python and/or Go
- Proven incident response leadership — you’ve owned major incidents end-to-end
- Network fundamentals: DNS, TLS, BGP, CDN/edge routing
Nice to Have
- Experience with email infrastructure (SMTP, deliverability, SPF/DKIM/DMARC)
- Familiarity with eBPF-based networking or observability tooling
- Security certifications or formal training in threat modelling
What We Offer
- Fully remote, async-first — work from anywhere
- Competitive base salary + equity
- Annual learning & conference budget
- Home office stipend
- Flexible working hours, no core hours required
Department: Infrastructure & Reliability
Location: Fully remote
Employment: Full-time
Experience: 8+ years
Ready to apply?
Send us your application via our contact page. Include your LinkedIn profile or a link to relevant work.