Retrieval Ground Truth

Relevance rankings of articles for each prompt, scored by multiple judge models.
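As a rough illustration of how scores from several judges might be combined into one ranking per prompt, here is a minimal sketch. Everything in it is an assumption made for illustration: the function name rank_articles, the (prompt, article, judge, score) tuple layout, and the choice to average judge scores are hypothetical and are not taken from eval.run_ground_truth.

```python
from collections import defaultdict
from statistics import mean


def rank_articles(judgments):
    """Rank articles per prompt by mean judge score (hypothetical scheme).

    judgments: iterable of (prompt, article, judge, score) tuples.
    Returns a dict mapping each prompt to a list of articles,
    ordered from most to least relevant.
    """
    # Collect every judge's score for each (prompt, article) pair.
    scores = defaultdict(lambda: defaultdict(list))
    for prompt, article, _judge, score in judgments:
        scores[prompt][article].append(score)

    rankings = {}
    for prompt, per_article in scores.items():
        # Assumption: judges are combined with an unweighted mean.
        averaged = {article: mean(vals) for article, vals in per_article.items()}
        # Sort by descending mean score; break ties by article name
        # so the ordering is deterministic.
        rankings[prompt] = sorted(averaged, key=lambda a: (-averaged[a], a))
    return rankings


if __name__ == "__main__":
    example = [
        ("q1", "article_a", "judge_1", 3),
        ("q1", "article_a", "judge_2", 1),
        ("q1", "article_b", "judge_1", 2),
        ("q1", "article_b", "judge_2", 3),
    ]
    print(rank_articles(example))
```

Averaging is only one option; a median or a rank-based aggregate would be less sensitive to a single outlying judge.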

No retrieval ground truth files found. Run python -m eval.run_ground_truth to generate them.