OIAI Generative Text Evaluation

Welcome, and thank you for helping us test the OIAI Generative Models 👋

You are looking at a platform that generates automated responses to questions about fraud risks in humanitarian contexts. We would like your honest reaction to what it produces.

What we are asking you to do

You will see a question and two anonymized responses, labelled Answer A and Answer B. Please read both answers and decide which one you find more useful.

A useful answer is one that helps the user understand the issue or take appropriate action. It should answer the question directly, provide accurate and relevant information, include enough detail to satisfy the request, be clear and easy to understand, and avoid unsupported or misleading claims.

For each question, please consider:

Which response do you think is better, and why?
Which is weaker, and why?
How could the better response be improved?

For each comparison, pick the better response, or mark both as equally good or both as bad, and add a few words explaining your choice. Written comments are optional, but they are the most valuable part. Even a short note helps us a lot.

Do not reward unnecessary length. Do not penalize concise answers if they are useful and fully answer the question. Penalize unsupported claims, evasiveness, unnecessary verbosity, and failure to follow instructions.

Your progress saves automatically, so you can stop and resume at any time using the same username. When you're done, you may optionally leave overall feedback at the bottom.

To keep judgments reliable, the session will pause automatically after every 10 questions. Your progress is saved, so you can take a break and pick up where you left off the next day.

A quick note before you start

This is not a test of your knowledge, and there are no right or wrong answers. We are testing the platform, not you. We want to learn whether the people who will use this platform find the responses helpful, clear, and trustworthy.

Your feedback is anonymized and used only to improve the responses going forward.

Enter a username to begin

Experience level *

Role title (optional)
