A network of vetted domain experts working with frontier AI labs.
Subject matter experts across STEM, programming, and AI safety — calibrated against project-specific rubrics and matched to work that fits their depth.
A growing pool of people who actually do the work.
Quality-first process
Every annotator clears a domain-specific probe before they touch a real project. Continuous calibration on hidden gold tasks follows them across engagements.
Rapid delivery pipeline
Tight feedback loops with your research team, daily batches in flight, and project-specific QA tiers keep iteration cycles short.
Domain expert network
STEM PhDs, working engineers, and safety researchers — sourced for depth, not headcount. Rates and rhythm set per discipline.
Cost-effective at scale
Tiered staffing and reusable rubric infrastructure keep per-example cost predictable as the program scales.
Where the network goes deep.
Four core specialties, each led by a senior in the field who writes the probes and owns quality end-to-end.
LLM data trainers
Generalists across RLHF, instruction tuning, and preference labeling. Strong English, prompt-engineering literate, attentive to nuance.
- RLHF
- Preference data
- Prompt design
Code data specialists
Senior engineers writing clean, well-documented code across languages. Algorithms, code review, software testing.
- Python
- JavaScript
- Algorithms
- Code review
STEM domain annotators
MS/PhD-level experts in math, physics, chemistry, biology, and CS. Multi-step problem authoring, verifiable answers.
- Mathematics
- Physics
- Chemistry
- Biology
- CS
Safety & alignment
Researchers with backgrounds in AI safety, ethics, or trust & safety. Adversarial thinking calibrated against helpfulness.
- Red-teaming
- Refusal training
- Content moderation
How we vet.
Vetting runs in three stages. Every annotator clears the first two before they touch real data; the third, continuous calibration, follows them across projects.
Credential review
Documents, publications, and prior work checked by the domain lead before the candidate sees any real data.
Domain probe
A paid probe authored by a senior in the field, scored blind by two reviewers. Arbitration on disagreement.
Continuous calibration
Hidden gold tasks score every active annotator across projects. Reliability ratings inform routing and pay tier.
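As a rough illustration of the last two stages, the sketch below shows one way blind dual scoring with an arbitration flag, and a gold-task reliability rating, could be computed. Everything in it is an assumption made for illustration: the names `score_probe` and `reliability`, the numeric score scale, the disagreement tolerance, and the use of a simple average and a simple hit rate are hypothetical, not a description of the actual scoring system.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class ProbeResult:
    score: float             # average of the two blind reviewer scores
    needs_arbitration: bool  # True when the reviewers disagree too much


def score_probe(reviewer_a: float, reviewer_b: float,
                tolerance: float = 1.0) -> ProbeResult:
    """Combine two blind scores (hypothetical scale and tolerance).

    The probe is flagged for arbitration when the two scores differ
    by more than `tolerance`.
    """
    disagreement = abs(reviewer_a - reviewer_b) > tolerance
    return ProbeResult(score=mean([reviewer_a, reviewer_b]),
                       needs_arbitration=disagreement)


def reliability(gold_outcomes: list[bool]) -> float:
    """Fraction of hidden gold tasks an annotator got right.

    Returns 0.0 when no gold tasks have been seen yet (an assumed default).
    """
    if not gold_outcomes:
        return 0.0
    return sum(gold_outcomes) / len(gold_outcomes)


# Two reviewers roughly agree: no arbitration needed.
close = score_probe(4.0, 4.5)
# Two reviewers disagree by more than the tolerance: flag for arbitration.
split = score_probe(2.0, 5.0)
# Three of four gold tasks correct.
rating = reliability([True, True, False, True])
```

A real system would likely weight gold tasks by difficulty and recency before a rating feeds into routing or pay tier; the flat hit rate here is only the simplest possible stand-in.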
What the work looks like.
- Top-of-market rates
- Flexible hours, set your own schedule
- Remote-first: work from anywhere
- Full-time or part-time (10+ hrs/week)
- Real impact on how frontier models learn
Apply once.
We'll match you to projects whose rate and rhythm fit your depth. Set your own schedule, with a minimum commitment of 10–20 hours/week.