AI Inference Engineer
We’re hiring an AI Inference Engineer to help us build reliable,
high-performance production systems. You will focus on:
- Optimizing the latency and throughput of model inference
- Building reliable production serving systems
- Accelerating research on scaling test-time compute
Ideal Experience
- Worked on system optimizations for model serving, such as batching, caching,
load balancing, and model parallelism
- Worked on low-level optimizations for inference, such as GPU kernels and code
generation
- Worked on algorithmic optimizations for inference, such as quantization,
distillation, and speculative decoding
- Worked on large-scale, highly concurrent production serving
- Worked on testing, benchmarking, and reliability of inference services
Bonus Skills
- Experience with verifiable inference or proof systems
- Familiarity with Bittensor or token-incentivized networks
- Deployment experience on H100s or large-scale GPU clusters using Kubernetes