Hyphen Connect Limited

Synthetic Data Engineer

Location: Boston, MA
Posted: 2026-04-24
Type: Full-time
Source: Greenhouse
We are seeking a Synthetic Data Engineer. In this role, you will design and implement domain-specific synthetic data generation pipelines and keep the data that feeds our training loops at consistently high quality. Your work will directly shape data processing and model training across the organization.
 
Responsibilities:


• Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.

• Implement automated quality scoring and de-duplication systems.

• Manage data pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Qualifications:


• Proven experience building large-scale data pipelines (Airflow, Spark, Ray).

• Deep knowledge of prompt engineering for data generation.

• Familiarity with dataset distillation and bias mitigation.