Job Details

AI Inference Software Engineer

  2025-03-18     Signify Technology     Sonoma, CA
Description:

Job Title: AI Inference Engineer – Real-Time Systems

Location: Hybrid, San Francisco

Base Salary Range: $200,000-$400,000


Role Overview

We are looking for a talented AI Inference Engineer to help build and optimize real-time inference systems for advanced AI models, specifically multimodal workloads such as text and audio processing and text-to-3D or text-to-video generation. The role involves designing and implementing highly efficient, scalable inference engines that deliver low-latency, high-performance AI services. You will work with cutting-edge technologies in AI, backend engineering, and cloud infrastructure to enable seamless experiences across platforms.


Key Responsibilities

  • Inference Engine Development: Design and optimize real-time AI inference engines that support multimodal data processing, including both audio and text inputs.
  • Real-Time Systems: Develop high-throughput, low-latency pipelines for handling AI model inference, ensuring that performance and scalability meet the needs of production systems.
  • Technology Integration: Leverage technologies such as WebRTC, FastAPI, and cloud-native infrastructure to support real-time AI inference and communication.
  • Cross-Platform Support: Ensure the inference engine integrates efficiently with various platforms (iOS, Android, desktop), enabling smooth user experiences.
  • API Development: Build and maintain APIs to support scalable, real-time AI interactions, ensuring seamless communication between AI models and frontend applications.
  • Performance Optimization: Focus on optimizing AI inference systems to minimize latency, maximize throughput, and enhance overall system performance.
  • Collaborative Development: Work closely with product teams, engineers, and data scientists to ensure that the inference engine aligns with product goals and delivers optimal user experiences.
  • Infrastructure Management: Contribute to the management of cloud infrastructure, including GPU server clusters, CI/CD pipelines, and containerized environments (Docker, Kubernetes).
  • Fault Handling & Scalability: Implement effective fault tolerance strategies and design scalable systems to meet the demands of real-time AI inference at scale.


Required Skills & Qualifications

  • AI Inference Expertise: Proven experience building and optimizing inference engines for multimodal AI systems, particularly in real-time applications involving both audio and text.
  • Real-Time System Design: Strong knowledge of designing low-latency, high-performance systems capable of handling complex inference tasks in production environments.
  • Cloud Infrastructure: Experience working with AWS, GCP, or other cloud platforms, including managing server clusters and leveraging cloud-native technologies for scaling AI inference systems.
  • Backend Development: Proficiency in languages such as Python, Go, or similar for developing backend services. Experience with frameworks like FastAPI for building efficient APIs is a plus.
  • Performance Optimization: Expertise in optimizing inference pipelines, improving computation efficiency, and reducing system latency.
  • Cross-Platform Development: Familiarity with supporting mobile (iOS, Android) and desktop platforms for AI-powered applications.
  • Collaboration Skills: Excellent communication skills and the ability to collaborate effectively with cross-functional teams (product, AI, and engineering).
  • Continuous Integration/Deployment: Experience working with CI/CD tools (e.g., Jenkins, GitHub Actions) and managing software deployments in a production environment.

Preferred Qualifications

  • 4-5 years of experience in AI inference engine development or related fields.
  • Hands-on experience with WebRTC, LiveKit, or similar technologies for real-time communication.
  • Familiarity with containerization technologies like Docker and orchestration tools like Kubernetes for managing AI inference services.
  • Experience with GPU server management and optimizing workloads for AI models.
  • Ability to write clear technical documentation and maintain high coding standards.


Apply for this Job

Please use the APPLY HERE link below to view additional details and application instructions.

Apply Here