Site Reliability Engineer | mailkube - Modern Email Infrastructure for Developers

About the Role

As a Site Reliability Engineer, you’ll join mailkube’s Infrastructure & Reliability team to help build and operate the production systems that developers and businesses depend on. You’ll work across our cloud infrastructure, observability stack, and on-call rotation to keep those systems running and to improve them continuously.

This is a hands-on role. You’ll spend your time automating toil, improving reliability metrics, and shipping infrastructure improvements alongside the product engineering teams.

What You’ll Do

Operate and improve Kubernetes clusters across multiple cloud environments
Participate in the on-call rotation and own incident response from detection to resolution
Build and maintain automated alerting, runbooks, and postmortem processes
Reduce toil through automation using Python, Bash, and infrastructure-as-code (IaC) tools
Monitor SLOs and work with product teams to address reliability risks before they become incidents
Improve CI/CD pipelines and deployment reliability
Collaborate with the security team to harden infrastructure and apply security best practices

What We’re Looking For

3–5 years of infrastructure engineering or SRE experience
Solid Kubernetes experience — you’ve operated clusters in production
Comfortable with Linux administration and cloud environments (AWS preferred)
Experience with monitoring and alerting tools (Prometheus, Grafana, PagerDuty or similar)
Good scripting skills in Python or Bash
A strong on-call mindset: calm under pressure and thorough in postmortems
Working knowledge of Terraform or similar IaC tools

Nice to Have

Experience with OpenTelemetry and distributed tracing
Familiarity with email infrastructure or high-throughput messaging systems
Contributions to open-source infrastructure projects

What We Offer

Fully remote, async-first — work from anywhere
Competitive base salary + equity
Annual learning & conference budget
Home office stipend
Flexible working hours, no core hours required