Site Reliability Engineer

We’re hiring a Site Reliability Engineer (SRE) to help ensure the reliability and performance of the Targon platform around the clock. You’ll work at the intersection of systems engineering and DevOps to keep our infrastructure scalable, observable, and resilient. You’ll focus on:

  • Ensuring our services stay online and performant, including during off-hours
  • Optimizing our Kubernetes clusters, including service mesh, metrics, and logging
  • Benchmarking services and identifying bottlenecks in our current infrastructure
  • Improving observability and alerting systems to catch issues before they impact users
  • Scaling services to minimize downtime under load
  • Developing CI/CD pipelines for new and existing services

Ideal Experience

  • Hands-on experience with Kubernetes in production environments
  • Proficiency with Golang for systems and infrastructure tooling
  • Familiarity with confidential virtual machines (CVMs)
  • Experience with Prometheus, Loki, and Grafana for monitoring and observability

Bonus Skills

  • Experience with CI/CD tools and best practices
  • Familiarity with tools like GitHub, Discord, Notion, and Linear for modern team collaboration

MANIFOLD LABS

© 2025 Manifold Labs, Inc. All Rights Reserved