Data Scientist — LLM Evaluation
Full-time · Mid-level
About this role
Design and implement evaluation frameworks for large language models. Build benchmarks, run experiments, and measure model quality across multiple dimensions.
Your work determines which models ship and which don't.
Requirements
- Strong statistics background, including experience with statistical testing
- Experience with LLM evaluation or NLP benchmarking
- Proficiency in Python