🔬 GIANT
Gigapixel Image Agent for Navigating Tissue
The Problem: Whole-slide pathology images contain billions of pixels—10,000× more than an LLM can see at once. Previous approaches used blurry thumbnails or random patches, severely underestimating what frontier models can do.
The Solution: GIANT lets LLMs navigate gigapixel images like pathologists do—iteratively pan, zoom, and reason across the slide until they can answer a diagnostic question.
"GPT-5 with GIANT achieves 62.5% accuracy on pathologist-authored questions, outperforming specialist pathology models such as TITAN (43.8%) and SlideChat (37.5%)." — Buckley et al., 2025
1. LOAD → Open gigapixel WSI, generate thumbnail with coordinate guides
2. OBSERVE → LLM sees current view + conversation history
3. REASON → "I see suspicious tissue at (45000, 32000). Let me zoom in..."
4. ACT → Crop high-resolution region OR provide final answer
5. REPEAT → Continue until confident diagnosis (max 20 steps)
The agent accumulates evidence across multiple zoom levels—just like a pathologist scanning a slide.
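In code, the loop looks roughly like the sketch below. This is a conceptual illustration rather than GIANT's actual API: slide access uses openslide-python, and `ask_llm` stands in for the configured provider's multimodal chat call, which is assumed to return either a final answer or coordinates to zoom into.

```python
# Conceptual sketch of the navigation loop, not GIANT's actual API.
# Slide I/O uses openslide-python; `ask_llm` is a placeholder for whatever
# multimodal chat call the configured provider exposes.
from typing import Callable, Optional

import openslide
from PIL import Image

MAX_STEPS = 20    # step budget described above
PATCH = 1024      # edge length of each high-resolution crop, in pixels


def navigate(slide_path: str, question: str,
             ask_llm: Callable[[list[Image.Image], str], dict]) -> Optional[str]:
    slide = openslide.OpenSlide(slide_path)          # LOAD the gigapixel WSI
    views = [slide.get_thumbnail((1024, 1024))]      # low-resolution overview
    for _ in range(MAX_STEPS):
        # OBSERVE + REASON: the model sees every view gathered so far
        reply = ask_llm(views, question)
        if reply.get("answer") is not None:          # ACT: give the final answer
            return reply["answer"]
        # ACT: otherwise crop the requested full-resolution region and REPEAT
        x, y = reply["zoom_to"]                      # level-0 pixel coordinates
        views.append(slide.read_region((x, y), 0, (PATCH, PATCH)).convert("RGB"))
    return None                                      # step budget exhausted
```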
# Install
uv sync && source .venv/bin/activate
# Configure API
export OPENAI_API_KEY=sk-...
# Run on a slide
giant run /path/to/slide.svs -q "What type of tissue is this?"
# Run benchmark (requires MultiPathQA CSV + WSI files; see docs/data/data-acquisition.md)
giant benchmark gtex --provider openai -v

Evaluated on MultiPathQA—934 questions across 862 unique whole-slide images.
| Benchmark | Task | Our Result | Paper (GIANT) | Paper (GIANT x5) | Thumbnail Baseline |
|---|---|---|---|---|---|
| GTEx | Organ Classification (20-way) | 70.3%† | 53.7% ± 3.4% | 60.7% ± 3.2% | 36.5% ± 3.4% |
| ExpertVQA | Pathologist-Authored (128 Q) | 60.1% | 57.0% ± 4.5% | 62.5% ± 4.4% | 50.0% ± 4.4% |
| SlideBench | Visual QA (197 Q) | 51.8% | 58.9% ± 3.5% | 59.4% ± 3.4% | 54.8% ± 3.5% |
| TCGA | Cancer Diagnosis (30-way) | 26.2% | 32.3% ± 3.5% | 29.3% ± 3.3% | 9.2% ± 1.9% |
| PANDA | Prostate Grading (6-way) | 20.3% | 23.2% ± 2.3% | 25.4% ± 2.0% | 12.2% ± 2.2% |
†GTEx: 70.3% on the 185/191 items that yielded a parsable answer; 67.6% ± 3.1% under paper-faithful scoring (the 6 parse errors counted as incorrect). Both figures exceed the paper's result.
All 5 MultiPathQA benchmarks complete. See docs/results/benchmark-results.md for detailed analysis.
Key findings:
- GTEx (70.3%) and ExpertVQA (60.1%) exceed the paper's single-run GIANT results
- Agent navigation provides up to ~3× improvement over thumbnail baselines
- Total benchmark cost: $124.64 across 934 questions
| Provider | Model | Status |
|---|---|---|
| OpenAI | gpt-5.2 | ✅ Default |
| Anthropic | claude-sonnet-4-5-20250929 | ✅ Supported |
| Google | gemini-3-pro-preview | 🔜 Planned |
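For example, switching to Claude should only require exporting the Anthropic key and passing the provider flag. Note that `ANTHROPIC_API_KEY` as the variable name and `--provider` on `giant run` are assumptions here; the flag is only shown above for `giant benchmark`.

```bash
# Assumption: GIANT reads the conventional ANTHROPIC_API_KEY variable and
# `giant run` accepts the same --provider flag shown for `giant benchmark`.
export ANTHROPIC_API_KEY=sk-ant-...
giant run /path/to/slide.svs -q "What type of tissue is this?" --provider anthropic
```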
| Section | Description |
|---|---|
| Installation | Environment setup |
| Quickstart | First inference in 5 minutes |
| Architecture | System design and components |
| Algorithm | Navigation loop explained |
| Running Benchmarks | Reproduce paper results |
| Configuring Providers | API key setup |
| Data Acquisition | Download WSI files (~500 GiB) |
| CLI Reference | Command-line options |
For Clinicians: Frontier LLMs can now reason over full pathology slides—not just patches. This opens doors for AI-assisted diagnosis, second opinions, and education.
For Researchers: A reproducible benchmark (MultiPathQA) and framework for evaluating LLM capabilities on gigapixel medical images. Proves that how you test matters as much as what you test.
For Developers: Production-ready implementation with 90% test coverage, strict typing, resumable benchmarks, cost tracking, and trajectory visualization.
@article{buckley2025navigating,
title={Navigating Gigapixel Pathology Images with Large Multimodal Models},
author={Buckley, Thomas A. and Weihrauch, Kian R. and Latham, Katherine and
Zhou, Andrew Z. and Manrai, Padmini A. and Manrai, Arjun K.},
journal={arXiv preprint arXiv:2511.19652},
year={2025}
}

- Paper: arXiv:2511.19652
- Dataset: MultiPathQA on HuggingFace
- Documentation: Full Docs
Built for reproducible research in computational pathology.