Company
Mistral AI
Location
London
Company Size
201–500 employees
Salary
Competitive salary dependent on experience
About the job
Mistral AI is hiring experienced Site Reliability Engineers to shape the reliability, scalability, and performance of its platform and customer-facing applications. The role balances day-to-day operations with long-term engineering improvements that reduce operational toil while keeping systems reliable and available.

Key responsibilities include designing and maintaining scalable, fault-tolerant infrastructure; ensuring high availability of inference and training environments across HPC clusters; troubleshooting production systems; and improving monitoring, alerting, and incident response. Engineers will also automate infrastructure deployment and orchestration using Kubernetes, Flux, and Terraform, build cloud-agnostic platforms, collaborate with AI/ML researchers to enable reproducible experiments, and contribute to open-source projects and publications.

The ideal candidate holds a Master’s degree in Computer Science or a related field and has 7+ years of experience in DevOps or SRE roles, with strong expertise in cloud computing and distributed systems, CI/CD, containerization, observability tools, infrastructure-as-code, and scripting in Python, Go, or Bash. Additional experience in AI/ML, HPC systems, or modern AI infrastructure (Fluidstack, CoreWeave, Vast) is a plus.

The position is primarily based in Paris or London. Remote work from select European countries is possible under specific conditions and requires onboarding in Paris and regular monthly visits. Mistral offers a competitive salary, equity, health insurance, transportation and sport allowances, meal vouchers, a pension plan, parental leave, visa sponsorship, and the chance to join a collaborative, fast-paced team shaping the future of AI.