Research Engineer, LLM Post-Training - Lila Sciences - Massachusetts
Your Impact at Lila
We're seeking a Machine Learning Research Engineer specializing in LLM post-training. You'll design and maintain large-scale training systems, optimize performance for massive models, and integrate cutting-edge techniques to improve efficiency and throughput.
What You'll Be Building
Ray-based distributed training infrastructure for LLMs and multi-modal models.
Performance optimizations for large-scale model training, including training and optimization workflows (SFT, MoE, long-context scaling).
Orchestration of frontier and open-source LLMs, along with complex, compute-intensive tool use.
Scalable pipelines for data preprocessing and experiment orchestration, including tools for efficient data loading, pipeline parallelism, and optimizer tuning.
System-level performance benchmarks and debugging utilities.
What You'll Need to Succeed
Proven experience with distributed ML training frameworks (Megatron-LM, TorchTitan, DeepSpeed, Ray).
Strong software engineering skills (Python; C++ kernel contributions are a plus).
Understanding of large-scale model training techniques.
Experience with cloud or HPC environments.
Bonus Points For
Prior work with scientific datasets or domain-specific modeling.
Contributions to open-source ML frameworks.
Location
San Francisco, CA or Cambridge, MA (hybrid and on-site available, depending on team needs).
About Lila
Lila Sciences is the world's first scientific superintelligence platform and autonomous lab for life, chemistry, and materials science. We are pioneering a new age of boundless discovery by building the capabilities to apply AI to every aspect of the scientific method. We are introducing scientific superintelligence to solve humankind's greatest challenges, enabling scientists to bring forth solutions in human health, climate, and sustainability at a pace and scale never experienced before. Learn more about this mission at www.lila.ai.
If this sounds like a