Job Details

Inference Engineer

2026-05-24 Acceler8 Talent Santa Rosa, CA

Description:

We're partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads.

Their platform is rethinking how inference runs at scale - intelligently orchestrating workloads across heterogeneous hardware to unlock major gains in performance, efficiency, and cost. The team is solving some of the hardest problems in modern AI infrastructure: inference scheduling, KV cache management, runtime optimization, memory efficiency, and low-latency serving across distributed systems.

They're looking for engineers who care deeply about how models execute in production — not just training models, but making them fast, scalable, and reliable under real-world load.

What You'll Work On

Designing and optimizing large-scale inference pipelines
Improving latency, throughput, and concurrency under production workloads
Building inference runtimes and serving infrastructure
Optimizing batching, scheduling, and request orchestration
Managing KV cache allocation, reuse, placement, and eviction strategies
Improving prefill/decode performance and memory efficiency
Profiling bottlenecks across model, runtime, and distributed system layers
Collaborating closely with compiler, kernel, and systems engineers

What They're Looking For

Strong systems engineering fundamentals
Experience building or scaling ML inference / model serving systems
Deep understanding of performance optimization and memory behavior
Experience with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure
Strong understanding of transformer architectures and attention mechanisms
Familiarity with batching, scheduling, concurrency, and cache management
Strong Python and/or C++ engineering skills

Why Join

Work on cutting-edge inference infrastructure and AI systems problems
Build systems designed for next-generation AI scale
Small, highly technical engineering team
Significant ownership and technical impact
Opportunity to shape foundational infrastructure for future AI workloads
