Contract — AI Agent Evaluation

Aaru · Remote · $100k - $150k
Contract · Mid-level

About this role

Evaluate and benchmark Aaru's synthetic research agents: design evaluation protocols, run A/B tests, and measure agent accuracy against human researchers. This is a 3-month contract with the possibility of extension. Remote-friendly.

Requirements

Experience with LLM evaluation or user research. Statistical analysis skills. Proficiency in Python. Available to start immediately.