We are looking for an RL Environment Data Engineer / Researcher to design, build, and refine reinforcement learning training environments across different domains. This role will focus on data collection, task definition, reward design, evaluation criteria, anti-reward-hacking mechanisms, and post-training validation of environment data effectiveness.
Responsibilities
- Design and improve RL training environments across various task domains.
- Collect, clean, structure, and evaluate data used for RL environment construction and model post-training.
- Define task objectives, reward functions, and evaluation standards to ensure reliable and reproducible training signals.
- Develop technical approaches to prevent reward hacking and identify loopholes in reward design.
- Build validation environments to assess the effectiveness of post-training data and RL environment design.
- Collaborate with research, engineering, and data teams to improve environment coverage, task difficulty, and evaluation reliability.
- Follow research progress in RL environments, data evaluation, AI agents, and post-training methods, and apply relevant findings to production workflows.
Requirements
- Strong coding skills, especially in Python, with the ability to independently build data pipelines, environments, and evaluation tools.
- Proficiency with AI coding tools for code generation, debugging, refactoring, and rapid experimentation.
- Solid understanding of reinforcement learning, post-training, reward function design, environment design, and data evaluation.
- Ability to translate real-world tasks into trainable and measurable RL environments.
- Experience with data scraping, data cleaning, annotation, or data quality assessment is preferred.
- Experience with LLM agents, RLHF/RLAIF, coding agents, automated evaluation, or benchmark construction is a strong plus.
- Strong experimental mindset and engineering execution, with the ability to continuously improve systems based on data and evaluation results.