Mellis

ML Engineer, Inference & Optimization

Mellis · Palo Alto, CA · Posted 2026-06-23
Salary: $185K–$350K
Type: Full-time
Experience: 5+ yr

ABOUT THE ROLE

We are seeking Senior/Staff-level Inference Engineers to accelerate the performance of Pika's AI-driven products. In this highly technical role, you will operate at the intersection of inference acceleration, GPU parallelism, advanced model deployment, and video generation technologies. Your expertise will drive significant improvements to model speed and efficiency, ensuring our creative AI systems deliver industry-leading user experiences at scale.

You will design and optimize inference pipelines, implement state-of-the-art acceleration techniques, and work closely with researchers and engineers across the team to push the boundaries of what’s possible in real-time AI deployment. Your efforts will play a foundational role in powering the next generation of Pika’s video and language models.

WHAT YOU’LL DO

  • Accelerate Inference: Lead and implement advanced inference acceleration techniques, including attention optimization and quantization for efficient model serving.
  • Maximize GPU Parallelism: Engineer and optimize GPU strategies across tensor, sequence, and pipeline parallelism (TP, SP, PP) for maximal efficiency and scalability.
  • Programming for Performance: Develop and optimize high-performance computing kernels and distributed workloads using CUDA and NCCL.
  • Advance AI Deployment: Collaborate with research and engineering teams to bring state-of-the-art videogen and large language models into production.
  • Improve Training Efficiency: (Bonus) Contribute to improvements in model training speed, stability, and resource utilization as part of our deployment lifecycle.
  • Technical Excellence: Drive rigorous code reviews, participate in technical discussions, and mentor fellow engineers on best practices in inference and GPU programming.

WHAT WE’RE LOOKING FOR

  • Experience: 5+ years of engineering experience, with a strong track record in inference acceleration and model deployment at scale.
  • Inference Mastery: Proven expertise in inference optimization, including quantization, attention acceleration, and deep learning compiler stacks.
  • GPU & Parallelism: Deep knowledge of GPU programming (CUDA, NCCL) and experience with SP, TP, PP, and other forms of parallelism for distributed inference.
  • AI Domain Knowledge: Familiarity with video generation (videogen) models and large language models (LLMs).
  • Collaboration: Strong cross-discipline communication skills; able to drive shared goals across research and engineering functions.
  • Ownership Mindset: Self-driven, solutions-oriented, and capable of managing ambiguity in a fast-paced startup environment.
  • Bonus: Experience in enhancing training efficiency, stability, or resource optimization for large models.

NICE TO HAVE

  • Experience with high-throughput video or real-time streaming model deployment
  • Familiarity with distributed training and optimization toolkits
  • Contributions to open source projects in AI infrastructure or deep learning compilers
  • Startup or rapid prototyping experience

WHAT WE OFFER

  • Competitive salary in the AI industry
  • Equity in a fast-growing startup shaping the future of AI
  • Comprehensive health benefits, monthly stipends, company retreats
  • A supportive and collaborative office culture—we’re all building and launching together

ABOUT PIKA

At Pika, we're crafting a future where video creation is seamless, intuitive, and universally accessible. Our mission is to empower creativity by breaking down technical barriers using the transformative power of AI. We’re a tight-knit, energetic team based in Palo Alto, CA, valuing efficiency, curiosity, and the ambition to make a meaningful impact on the world.

We work from our Palo Alto office 3–5 days a week and welcome applicants who are eager to contribute onsite.

Market comparison (NewJob, based on 2,000+ Data & ML roles with disclosed salary ranges): 10th percentile $105K, median $180K, 90th percentile $265K. This role's midpoint of $267K is about 50% above the median.