Data

Dataset Engineer

Remote (LATAM) · Contract · Remote OK

Position Filled · Closed July 15, 2025

About This Role

We were looking for a Dataset Engineer to build high-quality training and evaluation datasets for our AI systems. This role was crucial to improving model performance through better data, because in AI, data quality often matters more than model architecture.

You would have worked at the foundation of our AI systems, creating the datasets that power everything from customer service agents to content generation tools for the world's largest airlines.

What You Would Do

  • Design and create datasets for AI model training, fine-tuning, and evaluation
  • Develop systematic data collection pipelines from various sources
  • Build and manage annotation workflows, including guidelines and quality control
  • Clean, validate, and preprocess data to meet high quality standards
  • Create synthetic data generation pipelines for scenarios with limited real data
  • Document dataset schemas, biases, limitations, and usage guidelines
  • Collaborate with ML engineers to understand data requirements and iterate on datasets
  • Implement data versioning and lineage tracking
  • Analyze dataset characteristics and identify gaps or biases
  • Stay current with best practices in dataset creation for LLMs

Requirements

  • 2+ years of experience in data engineering, data science, or a related field
  • Strong understanding of data quality principles and validation techniques
  • Proficiency in Python and data processing tools (pandas, SQL, etc.)
  • Experience with text data and NLP preprocessing
  • Excellent attention to detail and a systematic approach to work
  • Good documentation and communication skills
  • Understanding of how training data affects model behavior

Nice to Have

  • Experience with annotation tools (Label Studio, Prodigy, Scale AI)
  • Background in linguistics, content creation, or domain expertise in travel
  • Knowledge of synthetic data generation techniques
  • Experience with data labeling workforce management
  • Understanding of LLM fine-tuning and RLHF data requirements
  • Familiarity with data privacy and PII handling

What We Offered

  • Competitive hourly rate in USD
  • Flexible remote work with async communication
  • 3-6 month initial contract with extension potential
  • Exposure to cutting-edge AI projects and methodologies
  • Potential path to full-time employment
  • Work directly with senior AI engineers and researchers

This position has been filled. Check our careers page for current openings.