Protein (language model) Benchmarking Collection - PBC

This repository contains well-established datasets for interpretable and reliable protein language model (pLM) benchmarking.

Datasets

All included datasets are listed below. Details and files can be found in the respective folders.

The following experimental datasets can be found on a separate branch. They are not part of the official release.

To benchmark a new or existing pLM on these datasets, use one of the following methods:

biotrainer: autoeval - Automatic evaluation of pLMs on our supervised benchmark datasets. An example notebook is available, and you can compare your results on our visual dashboard.
