Contract — AI Agent Evaluation
Contract · Mid-level
About this role
Evaluate and benchmark Aaru's synthetic research agents. Design evaluation protocols, run A/B tests, and measure agent accuracy against human researchers.
Three-month contract with the possibility of extension. Remote-friendly.
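As one illustration of the kind of analysis this role involves (a sketch only, not Aaru's actual evaluation protocol — the sample counts below are hypothetical), an A/B comparison of agent vs. human accuracy can be run as a two-proportion z-test:

```python
# Illustrative sketch: two-proportion z-test comparing agent accuracy
# against a human-researcher baseline on a shared question set.
# Numbers are hypothetical, not real benchmark results.
from math import sqrt, erf

def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Return (z, two-sided p) for H0: both groups have equal accuracy."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)   # pooled accuracy under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return z, p

# Hypothetical example: agents answer 172/200 items correctly, humans 158/200.
z, p = two_proportion_z(172, 200, 158, 200)
```

In practice a library routine (e.g. from statsmodels or SciPy) would replace the hand-rolled test; the stdlib-only version is shown here just to keep the sketch self-contained.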
Requirements
Experience with LLM evaluation or user research. Strong statistical analysis skills. Proficiency in Python. Available to start immediately.