E2E Response Evaluation

Response quality scores across synthesis models and judge panels.

No E2E response ground-truth files were found. Run python -m eval.run_ground_truth to generate them.