Diverge

🎯 Overview

Existing RAG systems are largely built around a single-answer assumption and are primarily optimized for correctness. However, many real-world information-seeking queries are open-ended and admit multiple plausible answers. In such settings, standard RAG often collapses to homogeneous outputs—even when retrieved contexts contain diverse evidence, as illustrated below.

DIVERGE is a plug-and-play, agentic Retrieval-Augmented Generation (RAG) framework designed to enhance output diversity for open-ended information-seeking queries while maintaining high answer quality. Unlike standard RAG systems, which are optimized for a single correct answer, DIVERGE explicitly models and explores multiple viewpoints through iterative retrieval and generation. The overall architecture is shown below.

This repository contains the reference implementation and evaluation code for the paper
“DIVERGE: Diversity-Enhanced Retrieval-Augmented Generation for Open-Ended Questions.”

📄 Paper: https://arxiv.org/pdf/2602.00238

🗂️ Dataset: https://huggingface.co/datasets/au-clan/Diverge

🧠 Key Ideas

DIVERGE improves diversity through three core mechanisms:

1. Reflection-Guided Viewpoint Generation
   The model reflects on previously generated answers to extract salient viewpoints and explicitly proposes new, insufficiently covered viewpoints to guide future retrieval and generation.
2. Diversity-Aware Retrieval
   Retrieved documents are reranked by jointly considering:
   - relevance to the current query
   - diversity with respect to previously retrieved contexts (memory)
   - diversity among candidates selected in the current iteration
3. Viewpoint-Conditioned Generation with Memory
   Generation is conditioned on both the original query and a target viewpoint, while a lightweight memory prevents repetition and supports long-horizon diversity.
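
The three reranking criteria for Diversity-Aware Retrieval can be pictured as a greedy selection in the style of maximal marginal relevance. The sketch below is an illustration only, not the scoring used in this repository: the function name `diversity_aware_rerank`, the weights `w_rel`, `w_mem`, and `w_batch`, and the use of cosine similarity over embedding vectors are all assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_similarity(vec: Vector, others: Sequence[Vector]) -> float:
    """Highest similarity between vec and any vector in others (0.0 if empty)."""
    return max((cosine(vec, other) for other in others), default=0.0)


def diversity_aware_rerank(
    query_vec: Vector,
    candidate_vecs: Sequence[Vector],
    memory_vecs: Sequence[Vector],
    k: int = 2,
    w_rel: float = 1.0,    # hypothetical weight: relevance to the query
    w_mem: float = 0.5,    # hypothetical weight: redundancy with memory
    w_batch: float = 0.5,  # hypothetical weight: redundancy within this batch
) -> List[int]:
    """Greedily pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidate_vecs)))

    def score(i: int) -> float:
        vec = candidate_vecs[i]
        relevance = cosine(query_vec, vec)
        memory_redundancy = max_similarity(vec, memory_vecs)
        batch_redundancy = max_similarity(vec, [candidate_vecs[j] for j in selected])
        return w_rel * relevance - w_mem * memory_redundancy - w_batch * batch_redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-D "embeddings": candidates 0 and 1 are near-duplicates, 2 is different.
query = [1.0, 1.0]
candidates = [[1.0, 0.1], [1.0, 0.2], [0.1, 1.0]]

print(diversity_aware_rerank(query, candidates, memory_vecs=[]))            # [1, 2]
print(diversity_aware_rerank(query, candidates, memory_vecs=[[0.1, 1.0]]))  # [1, 0]
```

With an empty memory, the second pick skips the near-duplicate candidate 0 in favor of candidate 2. Once a vector close to candidate 2 is already in memory, that candidate is penalized and candidate 0 is chosen instead.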

Importantly, DIVERGE operates entirely at the retrieval and prompting level and does not rely on token-level logits or decoding hyperparameters, making it compatible with any frontier or closed-source LLM.
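
Because everything happens at the retrieval and prompting level, the overall loop can be sketched with plain callables standing in for the LLM and the retriever. This is a minimal illustration under assumptions: `diverge_loop`, `propose_viewpoint`, `retrieve`, and `generate` are hypothetical names rather than the `divrag` API, and the stub functions replace real model and retriever calls.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures for the three pluggable components.
ProposeFn = Callable[[str, List[str]], str]                  # (query, memory) -> viewpoint
RetrieveFn = Callable[[str, str], List[str]]                 # (query, viewpoint) -> contexts
GenerateFn = Callable[[str, str, List[str], List[str]], str]  # -> answer


def diverge_loop(
    query: str,
    propose_viewpoint: ProposeFn,
    retrieve: RetrieveFn,
    generate: GenerateFn,
    num_answers: int = 3,
) -> List[Tuple[str, str]]:
    """Iteratively produce (viewpoint, answer) pairs, keeping past answers as memory."""
    memory: List[str] = []  # previously generated answers
    results: List[Tuple[str, str]] = []
    for _ in range(num_answers):
        # Reflect on earlier answers and pick a viewpoint not yet covered.
        viewpoint = propose_viewpoint(query, memory)
        # Retrieve evidence for the query under that viewpoint.
        contexts = retrieve(query, viewpoint)
        # Generate an answer conditioned on query, viewpoint, and memory.
        answer = generate(query, viewpoint, contexts, memory)
        memory.append(answer)
        results.append((viewpoint, answer))
    return results


# --- Stubs for demonstration only; a real setup would call an LLM and a retriever. ---
VIEWPOINT_POOL = ["cost", "health", "environment"]


def stub_propose(query: str, memory: List[str]) -> str:
    covered = " ".join(memory)
    for viewpoint in VIEWPOINT_POOL:
        if viewpoint not in covered:
            return viewpoint
    return VIEWPOINT_POOL[-1]


def stub_retrieve(query: str, viewpoint: str) -> List[str]:
    return [f"doc about {viewpoint}"]


def stub_generate(query: str, viewpoint: str, contexts: List[str], memory: List[str]) -> str:
    return f"From the {viewpoint} angle: {contexts[0]}"


results = diverge_loop("Why cycle to work?", stub_propose, stub_retrieve, stub_generate)
for viewpoint, answer in results:
    print(f"[{viewpoint}] {answer}")
```

Since the loop only passes strings between the three callables, swapping in a different model means replacing `propose_viewpoint` and `generate` with functions that call that model's text API.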

🛠️ Installation

We provide an installation process that sets up a virtual environment and installs the dependencies needed for our experiments. Follow the steps below to get started.

1. Create a Virtual Environment

macOS / Linux

python -m venv .venv
source .venv/bin/activate

Windows

python -m venv .venv
.venv\Scripts\activate

2. Install the Local Package

Install the project in editable mode:

pip install -e .

This step installs the local divrag package defined in pyproject.toml.

3. Install Experiment Dependencies

pip install -r requirements.txt

This step installs the exact dependency versions used in our experiments.

4. Set Up LLM API Key

To use GPT-based models, you need to provide your OpenAI API key as an environment variable.

macOS / Linux

export OPENAI_API_KEY="your_api_key_here"

Windows (PowerShell)

setx OPENAI_API_KEY "your_api_key_here"

On Windows, setx only affects new sessions, so open a new terminal after setting the key.

You can verify the key is available in Python:

python -c "import os; print(os.getenv('OPENAI_API_KEY') is not None)"

If the output is True, the key has been successfully configured.

Notes

Both installation steps (2 and 3) are required:
- pyproject.toml makes the local package (divrag) importable.
- requirements.txt pins the dependency versions used in our experiments, so the environment can be reproduced.
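
To confirm the setup in one go, a small check script along the following lines can help. It is a sketch, not part of the repository: the function name `check_environment` is hypothetical, and it only tests that the `divrag` package can be located and that `OPENAI_API_KEY` is non-empty, not that the key is valid.

```python
import importlib.util
import os
from typing import Dict


def check_environment(package: str = "divrag", key_var: str = "OPENAI_API_KEY") -> Dict[str, bool]:
    """Report whether the local package is importable and the API key is set."""
    return {
        # find_spec returns None when a top-level package cannot be located.
        "package_importable": importlib.util.find_spec(package) is not None,
        # An empty string counts as "not set".
        "api_key_set": bool(os.getenv(key_var)),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If `package_importable` is reported as missing, rerun `pip install -e .` inside the activated virtual environment.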

🚀 How to Start

Run Demo Example

python example.py

Run Main Pipeline

python main.py --output_dir "path/to/output"

Run Evaluation

python ./src/evaluate.py --input_file "path/to/input.json" --output_dir "path/to/output"

📁 Project Structure

The repository is organized as follows:

.
├── src/                     # Source code for DIVERGE framework
├── results/                 # Experiment output logs
├── data/                    # Data and human annotation directory
├── assets/                  # Figures for README (framework, examples)
│
├── pyproject.toml           # Package configuration
├── requirements.txt         # Dependencies
├── example.py               # Example script to run DIVERGE
├── main.py                  # Main script to run DIVERGE for the full experiments
│
└── README.md                # Project documentation

Citation

If you use this code or dataset, please cite:

@article{hu2026diverge,
  title={DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking},
  author={Hu, Tianyi and Tandon, Niket and Arora, Akhil},
  journal={arXiv preprint arXiv:2602.00238},
  year={2026}
}
