LLM Output Evaluation Platform

Welcome 👋

How it works

  1. You'll see a question and two anonymized answers (Answer A and Answer B), each from a different model. You won't be told which model produced which answer.
  2. Pick the better answer, or mark both as good (equally good) or both as bad.
  3. Optionally add a comment explaining your choice.
  4. Click Next › to move on. Your progress saves automatically, so you can stop and resume at any time by entering the same name.
  5. When you're done, you can optionally leave overall feedback at the bottom.

Choose what to compare

Pick the two answer sets you want to evaluate against each other.
