SRE (Kafka) – Kubernetes – Billion Dollar Startup – $75K – $100K
Full Job Description
We are seeking a Site Reliability Engineer specializing in Kafka to join a high-performing team. In this role, you will be responsible for operating, scaling, and automating Kafka and streaming infrastructure, ensuring top performance, availability, and service levels while keeping operations cost-efficient.
Responsibilities:
- Monitor and uphold the health of Kafka clusters and streaming platforms.
- Oversee the availability of Kafka clusters and streaming platforms.
- Identify, diagnose, and resolve incidents.
- Collaborate with architects, team leads, and developers to conceptualize and implement solutions.
- Contribute to the development of tooling and automation frameworks for infrastructure provisioning and scaling.
- Share on-call responsibilities.
Requirements:
- Minimum of 5 years of experience as a Software Engineer, DevOps Engineer, Platform Engineer, or Site Reliability Engineer.
- Proficiency in distributed systems and streaming technologies, including Apache Kafka.
- Competence in managing services on Linux systems.
- Familiarity with monitoring solutions such as New Relic, Prometheus, and Grafana.
- Experience in administering and deploying on cloud-based platforms.
- Proficiency in infrastructure as code (IaC) and configuration management tools.
- Strong written and verbal communication skills in English.
Nice to Have:
- Experience with AWS MSK.
- Previous work with Kubernetes and custom operator development.
- In-depth expertise in Kafka configuration and performance optimization.
If you are a seasoned Site Reliability Engineer experienced in operating and scaling Kafka and streaming infrastructure, write to us at email@example.com!