Inference Engineer
We're partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.
Their platform rethinks how inference runs at scale, intelligently orchestrating workloads across heterogeneous hardware to improve performance and efficiency while reducing cost. The team is solving some of the hardest problems in modern AI infrastructure: inference scheduling, KV cache management, runtime optimization, memory efficiency, and low-latency serving across distributed systems.
They're looking for engineers who care deeply about how models execute in production: not just training models, but making them fast, scalable, and reliable under real-world load.
What You'll Work On
What They're Looking For
Why Join