Dataset Engineer | Careers at AI Champions

About This Role

We were looking for a Dataset Engineer to build high-quality training and evaluation datasets for our AI systems. This role was crucial to improving model performance through better data, because in AI, data quality often matters more than model architecture.

You would have worked at the foundation of our AI systems, creating the datasets that power everything from customer service agents to content generation tools for the world's largest airlines.

What You Would Do

Design and create datasets for AI model training, fine-tuning, and evaluation
Develop systematic data collection pipelines from various sources
Build and manage annotation workflows, including guidelines and quality control
Clean, validate, and preprocess data to meet high quality standards
Create synthetic data generation pipelines for scenarios with limited real data
Document dataset schemas, biases, limitations, and usage guidelines
Collaborate with ML engineers to understand data requirements and iterate on datasets
Implement data versioning and lineage tracking
Analyze dataset characteristics and identify gaps or biases
Stay current with best practices in dataset creation for LLMs

Requirements

2+ years of experience in data engineering, data science, or a related field
Strong understanding of data quality principles and validation techniques
Proficiency in Python and data processing tools (pandas, SQL, etc.)
Experience with text data and NLP preprocessing
Excellent attention to detail and a systematic approach to work
Good documentation and communication skills
Understanding of how training data affects model behavior

Nice to Have

Experience with annotation tools (Label Studio, Prodigy, Scale AI)
Background in linguistics, content creation, or domain expertise in travel
Knowledge of synthetic data generation techniques
Experience with data labeling workforce management
Understanding of LLM fine-tuning and RLHF data requirements
Familiarity with data privacy and PII handling

What We Offered

Competitive hourly rate in USD
Flexible remote work with async communication
3-6 month initial contract with extension potential
Exposure to cutting-edge AI projects and methodologies
Potential path to full-time employment
Work directly with senior AI engineers and researchers