Inference Engineer

2026-05-24 | Acceler8 Talent | Santa Rosa, CA
Description:

We're partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.

Their platform rethinks how inference runs at scale, intelligently orchestrating workloads across heterogeneous hardware to deliver major gains in performance and efficiency while reducing cost. The team is solving some of the hardest problems in modern AI infrastructure: inference scheduling, KV cache management, runtime optimization, memory efficiency, and low-latency serving across distributed systems.

They're looking for engineers who care deeply about how models execute in production: not just training models, but making them fast, scalable, and reliable under real-world load.

What You'll Work On

  • Designing and optimizing large-scale inference pipelines
  • Improving latency, throughput, and concurrency under production workloads
  • Building inference runtimes and serving infrastructure
  • Optimizing batching, scheduling, and request orchestration
  • Managing KV cache allocation, reuse, placement, and eviction strategies
  • Improving prefill/decode performance and memory efficiency
  • Profiling bottlenecks across model, runtime, and distributed system layers
  • Collaborating closely with compiler, kernel, and systems engineers

What They're Looking For

  • Strong systems engineering fundamentals
  • Experience building or scaling ML inference or model-serving systems
  • Deep understanding of performance optimization and memory behavior
  • Experience with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure
  • Strong understanding of transformer architectures and attention mechanisms
  • Familiarity with batching, scheduling, concurrency, and cache management
  • Strong Python and/or C++ engineering skills

Why Join

  • Work on cutting-edge inference infrastructure and AI systems problems
  • Build systems designed for next-generation AI scale
  • Small, highly technical engineering team
  • Significant ownership and technical impact
  • Opportunity to shape foundational infrastructure for future AI workloads
