This repository contains well-established datasets for interpretable and reliable protein language model (pLM) benchmarking.
All included datasets are listed below. Details and files can be found in the respective folders.
The following experimental datasets are available on a separate branch and are not part of the official release:
- (Supervised) binding
  - Known limitation: Dataset size
- (Supervised) membrane
  - Known limitation: Data imbalance
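
To get a feel for what a data imbalance limitation means in practice, it can help to compute class fractions before training or evaluating on a dataset. The following is a minimal sketch, not part of this repository: the function name `class_balance` and the example labels are hypothetical and only illustrate the idea.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class fractions and the majority/minority count ratio."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    # A ratio of 1.0 means perfectly balanced classes; larger means more skew.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return fractions, imbalance_ratio


# Hypothetical labels for illustration only (not taken from any dataset here).
example_labels = ["membrane"] * 2 + ["soluble"] * 8
fractions, ratio = class_balance(example_labels)
print(fractions, ratio)
```

A high ratio suggests that plain accuracy may be misleading and that metrics less sensitive to class frequencies are worth reporting alongside it.
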
If you want to benchmark a new or existing pLM on these datasets, please check out one of the following methods:
- biotrainer: autoeval - Automatic evaluation of pLMs on our supervised benchmark datasets. You can find an example notebook here and compare your results on our visual dashboard.