Existing RAG systems are largely built around a single-answer assumption and are primarily optimized for correctness. However, many real-world information-seeking queries are open-ended and admit multiple plausible answers. In such settings, standard RAG often collapses to homogeneous outputs—even when retrieved contexts contain diverse evidence, as illustrated below.
DIVERGE is a plug-and-play, agentic Retrieval-Augmented Generation (RAG) framework designed to enhance output diversity for open-ended information-seeking queries while maintaining high answer quality. Unlike standard RAG systems that are optimized for a single correct answer, DIVERGE explicitly models and explores multiple viewpoints through iterative retrieval and generation. The overall architecture is shown below.
This repository contains the reference implementation and evaluation code for the paper
“DIVERGE: Diversity-Enhanced Retrieval-Augmented Generation for Open-Ended Questions.”
📄 Paper: https://arxiv.org/pdf/2602.00238
🗂️ Dataset: https://huggingface.co/datasets/au-clan/Diverge
DIVERGE improves diversity through three core mechanisms:
- **Reflection-Guided Viewpoint Generation**
  The model reflects on previously generated answers to extract salient viewpoints and explicitly proposes new, insufficiently covered viewpoints to guide future retrieval and generation.
- **Diversity-Aware Retrieval**
  Retrieved documents are reranked by jointly considering:
  - relevance to the current query
  - diversity with respect to previously retrieved contexts (memory)
  - diversity among candidates selected in the current iteration
- **Viewpoint-Conditioned Generation with Memory**
  Generation is conditioned on both the original query and a target viewpoint, while a lightweight memory prevents repetition and supports long-horizon diversity.
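To make the diversity-aware reranking concrete, here is a minimal greedy MMR-style sketch. It is illustrative only: the actual scoring function lives in the `divrag` package, and the token-overlap similarity, the `lam` weight, and all function names below are our own assumptions, not the paper's implementation.

```python
# Illustrative sketch of diversity-aware reranking; the similarity measure
# (token overlap) and the trade-off weight are placeholders, not divrag's.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two texts (stand-in for an embedding model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def diversity_aware_rerank(query, candidates, memory, k=2, lam=0.7):
    """Greedily select k documents: reward relevance to the query, penalize
    similarity to previously retrieved contexts (memory) and to documents
    already selected in this iteration."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(doc):
            relevance = jaccard(query, doc)
            seen = memory + selected
            redundancy = max((jaccard(doc, s) for s in seen), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

With `lam = 0.7`, a near-duplicate of an already-selected document is penalized enough that a less relevant but novel document can win the second slot.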
Importantly, DIVERGE operates entirely at the retrieval and prompting level and does not rely on token-level logits or decoding hyperparameters, making it compatible with any frontier or closed-source LLM.
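Because DIVERGE works purely at the retrieval and prompting level, the outer loop can be sketched without any access to model internals. The sketch below is hypothetical: `llm`, `retrieve`, the prompt wording, and `n_rounds` are stand-ins we introduce for illustration, not the `divrag` API.

```python
# Hypothetical outer loop; function names and prompt templates are
# illustrative assumptions, not the actual divrag implementation.

def diverge_loop(query, llm, retrieve, n_rounds=3):
    memory = []  # (viewpoint, answer) pairs from earlier iterations
    for _ in range(n_rounds):
        # 1. Reflection-guided viewpoint generation: reflect on covered
        #    viewpoints and propose a new, insufficiently covered one.
        covered = "; ".join(v for v, _ in memory) or "none yet"
        viewpoint = llm(
            f"Question: {query}\nViewpoints already covered: {covered}\n"
            "Propose one new, insufficiently covered viewpoint."
        )
        # 2. Diversity-aware retrieval conditioned on the new viewpoint.
        docs = retrieve(query, viewpoint, [a for _, a in memory])
        # 3. Viewpoint-conditioned generation with memory to avoid repetition.
        answer = llm(
            f"Question: {query}\nViewpoint: {viewpoint}\n"
            f"Context: {' '.join(docs)}\n"
            "Answer from this viewpoint without repeating earlier answers."
        )
        memory.append((viewpoint, answer))
    return memory
```

Since the loop only composes prompts and retrieval calls, any chat-completion endpoint (including closed-source models) can be plugged in as `llm`.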
Follow the steps below to set up a virtual environment and install the dependencies used in our experiments.
Create and activate a virtual environment.

**macOS/Linux:**

```bash
python -m venv .venv
source .venv/bin/activate
```

**Windows:**

```bash
python -m venv .venv
.venv\Scripts\activate
```

Install the project in editable mode:

```bash
pip install -e .
```

This step installs the local `divrag` package defined in `pyproject.toml`.

Then install the pinned dependencies:

```bash
pip install -r requirements.txt
```

This step installs the exact dependency versions used in our experiments.
To use GPT-based models, you need to provide your OpenAI API key as an environment variable.
**macOS/Linux:**

```bash
export OPENAI_API_KEY="your_api_key_here"
```

**Windows:**

```bash
setx OPENAI_API_KEY "your_api_key_here"
```

After setting the key, restart your terminal if necessary.

You can verify the key is available in Python:

```bash
python -c "import os; print(os.getenv('OPENAI_API_KEY') is not None)"
```

If the output is `True`, the key has been successfully configured.
- Both installation steps are required:
  - `pyproject.toml` makes the local package (`divrag`) importable.
  - `requirements.txt` ensures full reproducibility of the experimental environment.
Run the example script:

```bash
python ./src/example.py
```

Run the full experiments:

```bash
python ./src/main.py --output_dir "path/to/output"
```

Evaluate the outputs:

```bash
python ./src/evaluate.py --input_file "path/to/input.json" --output_dir "path/to/output"
```

The repository is organized as follows:
```text
.
├── src/              # Source code for DIVERGE framework
├── results/          # Experiment output logs
├── data/             # Data and human annotation directory
├── assets/           # Figures for README (framework, examples)
│
├── pyproject.toml    # Package configuration
├── requirements.txt  # Dependencies
├── example.py        # Example script to run DIVERGE
├── main.py           # Example script to run DIVERGE for full experiments
│
└── README.md         # Project documentation
```
If you use this dataset or code, please cite:
@article{hu2026diverge,
title={DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking},
author={Hu, Tianyi and Tandon, Niket and Arora, Akhil},
journal={arXiv preprint arXiv:2602.00238},
year={2026}
}
