Full Time

Software Engineer - LLM Systems, Generative AI Infrastructure & Agentic Platforms - Apple - Cupertino, CA

Apple

Cupertino, CA
$147K–$272K a year
Posted today

The Intelligence Platform team builds scalable, production-grade systems that power high-quality, user-centric intelligence across Apple’s operating systems. We focus on designing and operating large-scale ML systems leveraging Generative AI, Large Language Models, RAG architectures, and emerging agentic AI patterns. Our goal is to deliver reliable, low-latency, and privacy-preserving AI capabilities at scale.

Description

We are looking for a Software Engineer with strong systems and engineering expertise to build and scale LLM-powered systems in production. This role focuses on designing robust infrastructure for LLM serving, tool-use orchestration, and agentic workflows.

You will work at the intersection of ML and systems engineering, translating advanced AI capabilities into efficient, scalable, and reliable systems. You will play a key role in shaping system architecture, optimizing performance, and ensuring production readiness of LLM-driven features across Apple platforms.

","responsibilities":"* Design and build scalable systems for LLM inference, orchestration, and agentic workflows (e.g., tool-use pipelines, multi-step reasoning systems).
• Productionize LLM-based solutions with a focus on latency, throughput, reliability, and scalability.
• Architect and maintain infrastructure for model serving, batching, caching, and context management.
• Develop and optimize pipelines for RAG systems, retrieval infrastructure, and data flow across components.
• Partner with modeling teams to integrate models into production systems, ensuring alignment with performance and product requirements.
• Build monitoring, evaluation, and feedback systems to ensure high-quality and robust model behavior in production.
• Drive system-level optimizations across the stack, including distributed systems, concurrency, and resource management.

Preferred Qualifications

Experience with LLM serving systems, inference optimization, batching strategies, or caching (KV/prefix).

Experie