TMAP2

Tree-based visualization for high-dimensional data. Organizes similar items into interactive tree structures. Ideal for chemical space, protein embeddings, single-cell data, or any high-dimensional dataset.

Why Trees?

Most dimensionality reduction tools (UMAP, t-SNE) produce point clouds. TMAP produces a tree, a connected structure where every point is linked to its neighbors through branches. This makes the layout itself explorable: you can follow branches, trace paths between any two points, and discover how regions connect.

For example, in a TMAP of pet breed images, following the branch from terriers toward cats reveals that the bridge between the two groups runs through chihuahuas and sphynx cats (the bald ones). That is both hilarious and logical: both are small, short-haired, and big-eyed. The tree doesn't just cluster similar things; it also shows you how dissimilar things are connected.

Because the layout is a tree, you get operations that point clouds can't support:

path = model.path(idx_a, idx_b) # nodes along the tree path
d = model.distance(idx_a, idx_b) # sum of edge weights along the path
pseudotime = model.distances_from(idx) # tree distance from one point to all others
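
To make these operations concrete, here is a minimal sketch of what a path, a path distance, and distances from one node mean on a small weighted tree. It uses plain Python and a toy edge list; the helper names (`build_adjacency`, `distances_from`, `tree_path`) are illustrative assumptions and say nothing about how the library implements its own methods.

```python
from collections import deque

# Toy weighted tree as an edge list: (node_a, node_b, edge_weight).
edges = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (3, 4, 1.5)]


def build_adjacency(edges):
    """Undirected adjacency list: node -> [(neighbor, weight), ...]."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj


def distances_from(adj, source):
    """Tree distance (sum of edge weights) from source to every node.

    In a tree there is exactly one path between two nodes, so a single
    breadth-first traversal is enough. Also returns each node's parent.
    """
    dist = {source: 0.0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt, w in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + w
                parent[nxt] = node
                queue.append(nxt)
    return dist, parent


def tree_path(adj, a, b):
    """Nodes along the unique tree path from a to b."""
    _, parent = distances_from(adj, a)
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])
    return path[::-1]


adj = build_adjacency(edges)
dist, _ = distances_from(adj, 2)
path_2_to_4 = tree_path(adj, 2, 4)
print(path_2_to_4, dist[4])
```

A point cloud has no edges, so none of these quantities is defined there. On a tree they are unambiguous because exactly one path connects any two nodes.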

Installation

pip install tmap2

Optional extras:

pip install rdkit # chemistry helpers (fingerprints_from_smiles, molecular_properties)
pip install jupyter-scatter # notebook interactive widgets

Note: The import name is tmap, not tmap2.

Quick Start

import numpy as np
from tmap import TMAP

# Binary fingerprints (Jaccard)
X = np.random.randint(0, 2, (1000, 2048), dtype=np.uint8)
model = TMAP(metric="jaccard", n_neighbors=20, seed=42).fit(X)
model.to_html("map.html")

# Dense embeddings (cosine / euclidean)
X = np.random.random((1000, 128)).astype(np.float32)
model = TMAP(metric="cosine", n_neighbors=20).fit(X)
new_coords = model.transform(X[:10])

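# Interactive notebook widget
# (df is assumed to be a pandas DataFrame with one row per point
#  and "label", "name", and "score" columns)
model.plot(color_by="label", data=df, tooltip_properties=["name", "score"])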

Key Features

Tree structure: follow branches, trace paths, compute pseudotime
Deterministic: same input + seed = same output
Multiple metrics: jaccard, cosine, euclidean, precomputed
Incremental: add_points() and transform() for new data
Model persistence: save() / load()
Three viz backends: interactive HTML, jupyter-scatter, matplotlib
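
As a rough guide to the metrics listed above, the sketch below computes Jaccard distance on two binary fingerprints and cosine distance on two dense vectors in plain Python. The function names are illustrative, and treating two all-zero fingerprints as distance 0 is an assumption of this sketch rather than documented library behavior.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two empty fingerprints are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def cosine_distance(u, v):
    """1 - cos(angle between u and v) for two non-zero dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


fp_a = [1, 1, 0, 1, 0, 0]
fp_b = [1, 0, 0, 1, 1, 0]
jd = jaccard_distance(fp_a, fp_b)   # 2 shared bits out of 4 set bits overall

vec_a = [1.0, 0.0]
vec_b = [0.0, 2.0]
cd = cosine_distance(vec_a, vec_b)  # orthogonal vectors
print(jd, cd)
```

Jaccard only looks at which bits are set, which suits binary fingerprints. Cosine ignores vector length, which suits learned embeddings.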

Visualization

Interactive HTML: lasso selection, light/dark theme, filter and search panels, pinned metadata cards, binary mode for large datasets.

Notebook widgets: color switching, categorical filtering, and lasso selection with pandas-backed metadata:

viz = model.to_tmapviz()
viz.add_color_layout("Molecular Weight", mw.tolist(), categorical=False)
viz.add_color_layout("Scaffold", scaffolds, categorical=True, color="tab10")
viz.add_label("SMILES", smiles_list)
viz.show(width=1000, height=620, controls=True)

Static plots: matplotlib for publication figures, via model.plot_static(color_by=labels)

Domain Utilities

Built-in helpers for common scientific workflows:

from tmap.utils.chemistry import fingerprints_from_smiles, molecular_properties
from tmap.utils.proteins import fetch_uniprot, sequence_properties
from tmap.utils.singlecell import from_anndata

Domain	Metric	Utilities
Cheminformatics	`jaccard`	`fingerprints_from_smiles`, `molecular_properties`, `murcko_scaffolds`
Proteins	`cosine` / `euclidean`	`fetch_uniprot`, `fetch_alphafold`, `read_fasta`, `sequence_properties`
Single-cell	`cosine` / `euclidean`	`from_anndata`, `cell_metadata`, `marker_scores`
Generic embeddings	`cosine` / `euclidean` / `precomputed`	No domain utils needed

Notebooks

Notebook	Topic
01 Quick Start	End-to-end walkthrough
02 MinHash Deep Dive	Encoding methods and when to use each
03 Legacy LSH Pipeline	Lower-level MinHash + LSHForest + layout workflow
04 Notebook Widgets	Selection, filtering, zoom, export
05 Single-Cell	RNA-seq with PBMC 3k, pseudotime, UMAP comparison
06 Metric Guide	Choosing the right metric
07 FAQ	Troubleshooting and common questions
08 Cheminformatics	Molecules, fingerprints, SAR
09 Protein Analysis	FASTA, ESM embeddings, AlphaFold
10 Card Configuration	Pinned card layout, fields, and links
11 Default Params Benchmark	Defaults across dataset sizes and types
12 USearch Jaccard	Binary Jaccard with USearch backend

Lower-Level Pipeline

For direct control over indexing, hashing, and layout, see the legacy pipeline notebook. The main building blocks:

from tmap.index import USearchIndex           # dense / binary kNN
from tmap import MinHash, LSHForest           # Jaccard on sets / strings
from tmap.layout import LayoutConfig, layout_from_lsh_forest
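
For readers new to MinHash, the following is a generic sketch of the idea behind it: the fraction of matching per-hash minima between two signatures estimates the Jaccard similarity of the underlying sets. It uses only `hashlib` from the standard library. The helper names, the seeded-hash construction, and the signature length of 256 are assumptions for illustration and do not describe the `MinHash` class imported above.

```python
import hashlib


def seeded_hash(seed, item):
    """64-bit hash of an item under a given seed (one 'hash function' per seed)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """Signature = the minimum hash value of the set under each seeded hash."""
    return [min(seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Two toy sets of string tokens that share 4 of 6 distinct items.
set_a = {"C", "CC", "CCO", "O", "N"}
set_b = {"C", "CC", "CCO", "O", "S"}

exact = len(set_a & set_b) / len(set_a | set_b)
estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print(f"exact={exact:.3f} estimate={estimate:.3f}")
```

The estimate gets tighter as the signature grows. Fixed-length signatures are also what makes an LSH-style index practical for approximate neighbor search.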

Your Data
   ├─→ Binary / dense matrix ─→ USearch        (Jaccard / cosine / euclidean)
   └─→ Sets / strings ───────→ MinHash → LSHForest
                ↓
             k-NN Graph → MST → OGDF Tree Layout → Interactive Visualization
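
The "k-NN Graph → MST" step in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: it assumes brute-force Euclidean neighbors and Kruskal's algorithm, and the helper names are hypothetical.

```python
import math


def knn_edges(points, k):
    """Brute-force k-nearest-neighbor edges as sorted (distance, i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            edges.add((d, min(i, j), max(i, j)))  # the set removes duplicate edges
    return sorted(edges)


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm with union-find over edges sorted by weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j, d))
    return tree


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (5.0, 1.0)]
mst = minimum_spanning_tree(len(points), knn_edges(points, k=2))
total_weight = sum(d for _, _, d in mst)
print(mst, total_weight)
```

If the k-NN graph is disconnected, this procedure returns a spanning forest rather than a single tree, which is one reason the choice of `n_neighbors` matters.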

Development

git clone https://github.com/afloresep/tmap2.git
cd tmap2
pip install ".[dev]"
pytest -v

License

MIT License - see LICENSE for details.

Based on the original TMAP by Daniel Probst and Jean-Louis Reymond.
