About this role
Why RoboForce
RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
We are looking for a Senior / Staff AI Research Engineer, Data Infrastructure to build the data and learning engine behind RoboForce's Physical AI stack. In this role, you will own the full pipeline — from raw teleoperation and UMI device data collection through curation, annotation, and storage, to post-training infrastructure that scores demonstrations, identifies failure patterns, and closes the loop back into model retraining.
Responsibilities
• Design and maintain end-to-end data collection pipelines ingesting multimodal demonstration data from teleoperation devices and UMI hardware, including synchronization, versioning, and distributed storage at scale.
• Build annotation tooling and data curation workflows (quality filtering, deduplication, episode scoring, and domain reweighting) to produce high-quality training datasets for robot policy learning.
• Develop post-SFT reinforcement learning infrastructure: implement reward scoring on demonstrations, mine and categorize failure patterns, and feed curated failure data back into the retraining loop.
• Build evaluation and test infrastructure to log policy rollouts on-robot, capture structured results, and surface actionable diagnostics for the research team.
• Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces that support rapid iteration on VLA and manipulation policy training.
• Architect scalable storage and retrieval systems for heterogeneous robot data (vision, proprioception, action, language) across both cloud and on-prem environments.
Requirements
• Bachelor's or Master's degree in Computer Science, Robotics, or a related field, with 5+ years of experience.
• Strong proficiency in Python and experience building production-grade data pipelines and ETL systems.
• Hands-on experience with large-scale dataset management, including versioning, deduplication, quality filtering, and distributed storage (e.g., S3, GCS, HDF5, WebDataset, Zarr).
• Experience building or working with post-training infrastructure: SFT pipelines, reward modeling, or RL training loops (e.g., PPO, DPO, rejection sampling).
• Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows sufficient to collaborate tightly with research teams.
• This role requires in-office collaboration with the team 5 days per week.
Bonus Qualifications
• Experience with robotics data collection hardware (teleoperation devices, UMI, GELLO, or similar) and the synchronization and preprocessing challenges they introduce.
• Familiarity with robot learning pipelines: imitation learning, behavior cloning, or VLA/VLM fine-tuning workflows.
• Experience building evaluation or experiment tracking infrastructure (e.g., Weights & Biases, MLflow, custom rollout loggers).
• Proven ability to design annotation tooling or human-in-the-loop labeling systems for structured or multimodal data.
Benefits
• Competitive stock options/equity programs.
• Health, dental, and vision insurance, plus a 401(k) plan.
• Visa sponsorship and green card support for qualified candidates.
• Lunches and dinners, a fully stocked kitchen, and regular team-building events.
Tech stack
Python, PyTorch