AI Inference Engineer
We’re hiring an AI Inference Engineer to help us build reliable,
high-performance production systems. You will focus on:
- Optimizing the latency and throughput of model inference
- Building reliable production serving systems
- Accelerating research on scaling test-time compute
Ideal Experience
- Worked on system optimizations for model serving, such as batching, caching,
load balancing, and model parallelism
- Worked on low-level optimizations for inference, such as GPU kernels and code
generation
- Worked on algorithmic optimizations for inference, such as quantization,
distillation, and speculative decoding
- Worked on large-scale, highly concurrent production serving
- Worked on testing, benchmarking, and reliability of inference services
Bonus Skills
- Experience with verifiable inference or proof systems
- Familiarity with Bittensor or token-incentivized networks
- Deployment experience on H100s or large-scale GPU clusters using Kubernetes