ESMCBA (Evolutionary Scale Modeling Binding Affinity) - a model for binding affinity prediction of the peptide-MHC interaction

[Graphical abstract]

Code: https://github.com/sermare/ESMCBA
Models: https://huggingface.co/smares/ESMCBA

This guide shows a new user how to:

  • get the code
  • download one or more model checkpoints from Hugging Face
  • run predictions and extract hidden-layer embeddings with embeddings.py
  • understand where files are stored and how to point to them

Quick Start (run it in Colab)

You can open this notebook and run it in Google Colab:

Open in Colab


🚀 Quick Start with pip

ESMCBA is now available on PyPI! Install it with a single command:

pip install esmcba

Basic Usage

Once installed, you can run predictions directly from the command line:

esmcba --hla A0201 \
  --peptides KIQEGVVDYGA VLMSNLGMPS DTLRVEAFEYY \
  --encoding epitope \
  --output_dir ./outputs
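
The same call can also be scripted from Python by shelling out to the CLI; a minimal sketch reusing the flags documented in this guide:

import subprocess

# Run the esmcba CLI for HLA-A*02:01 on a few peptides (same flags as above)
subprocess.run(
    [
        "esmcba",
        "--hla", "A0201",
        "--peptides", "KIQEGVVDYGA", "VLMSNLGMPS", "DTLRVEAFEYY",
        "--encoding", "epitope",
        "--output_dir", "./outputs",
    ],
    check=True,
)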

Complete Example

Here's a full example with multiple peptides for HLA-A*02:01:

esmcba --hla A0201 \
  --peptides KIQEGVVDYGA VLMSNLGMPS DTLRVEAFEYY AKKPTETI FKLNIKLLGVG \
             ETSNSFDVLK INVIVFDGKSK VDFCGKGYHLM AYPLTKHPNQ RAMPNMLRI \
             FIASFRLFA YIFFASFYYV SLIDFYLCFL FLTENLLLYI YMPYFFTLL \
             FLLPSLATV FLAFLLFLV YFIASFRLFA FFFLYENAFL FLIGCNYLG \
             YLATALLTL FLHFLPRV YLCFLAFLLF YLKLTDNVYI KLMGHFAWWT \
             TLMNVLTLV YLTNDVSFL FLPFAMGI LLADKFPV SMWSFNPET \
             LLMPILTLT LVAEWFLAYI FLYLYALVYF LMSFTVL MWLSYFIA \
             FLNGSCGSV LVLSVNPYV GLCVDIPGI \
  --encoding epitope \
  --output_dir ./outputs

Output Files

After running, you will find the following files in your output directory:

  • A0201-ESMCBA_embeddings.npy - Raw ESM embeddings
  • A0201-ESMCBA_umap.csv - UMAP visualization coordinates
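
To use these files downstream, here is a minimal loading sketch (the exact array shape and CSV columns depend on the run, so inspect them first):

import numpy as np
import pandas as pd

# Raw ESM embeddings saved by the CLI
embeddings = np.load("./outputs/A0201-ESMCBA_embeddings.npy")
print("Embeddings shape:", embeddings.shape)

# UMAP coordinates for visualization
umap_df = pd.read_csv("./outputs/A0201-ESMCBA_umap.csv")
print(umap_df.head())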

Available Options

esmcba --help

Key parameters:

  • --hla: HLA allele (e.g., A0201, B1402, C0501)
  • --peptides: Space-separated list of peptide sequences
  • --encoding: Encoding type (epitope or hla, default: epitope)
  • --output_dir: Directory for output files (default: ./outputs)
  • --batch_size: Batch size for inference (default: 10)
  • --umap_dims: UMAP dimensions, 2 or 3 (default: 2)

1. Requirements

  • Python 3.9 or newer
  • PyTorch 1.13+ or 2.x
  • huggingface_hub for downloads

Install the basics:

# Install core PyTorch and Transformers ecosystem
pip install torch
pip install transformers
pip install esm

# Install Hugging Face Hub utilities
pip install "huggingface-hub<1.0"

# Optional: Install hf_transfer for faster large file downloads
pip install hf_transfer

# Analysis and plotting dependencies
pip install biopython umap-learn scikit-learn seaborn pandas matplotlib
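
A quick sanity check that the core dependencies import and that a GPU is visible (if you have one):

import torch
import transformers
import huggingface_hub

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("huggingface_hub:", huggingface_hub.__version__)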

2. Get the code

git clone https://github.com/sermare/ESMCBA

Inside the repo you should have embeddings.py available. If your file is named embeddings_generation.py, use that name instead in the commands below.

3. Pick a model checkpoint (the full list is in section 9 below)

All checkpoints live in the model repo: smares/ESMCBA.

Examples of filenames:

  • ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth
  • ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB1503_2_0.0001_1e-05__2_B1503_0404_Hubber_B1503_final.pth
  • ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth

You can browse all files here: https://huggingface.co/smares/ESMCBA
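
If you prefer to list the checkpoints programmatically instead of browsing the Hub page, a minimal sketch with huggingface_hub:

from huggingface_hub import list_repo_files

# List every file in the model repo and keep the checkpoint files
files = list_repo_files("smares/ESMCBA", repo_type="model")
checkpoints = [f for f in files if f.endswith(".pth")]

# Example: find the checkpoints trained for a given HLA allele
hla = "B5101"
print([f for f in checkpoints if f"HLA{hla}" in f])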

[Figure: comparison with SOTA models]

4. Download a checkpoint

Option A β€” download to a folder next to the code

# download everything to ./models
hf download smares/ESMCBA --repo-type model --local-dir ./models

# ...or download just one checkpoint
hf download smares/ESMCBA \
  "ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth" \
  --repo-type model \
  --local-dir ./models
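
The same single-file download can be done from Python; a minimal sketch using hf_hub_download:

from huggingface_hub import hf_hub_download

# Downloads the checkpoint into ./models and returns the local path
ckpt_path = hf_hub_download(
    repo_id="smares/ESMCBA",
    filename="ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth",
    repo_type="model",
    local_dir="./models",
)
print(ckpt_path)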

Option B β€” rely on the Hugging Face cache

If you omit --local-dir, files go into your HF cache, for example:

~/.cache/huggingface/hub/

To move the cache:

export HF_HOME=/path/to/cache
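
To see what is already stored in the cache, huggingface_hub can scan it; a small sketch:

from huggingface_hub import scan_cache_dir

# Summarize every repo currently in the local HF cache
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.repo_type, f"{repo.size_on_disk / 1e6:.1f} MB")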

5. Run embeddings.py

Below are example invocations. Replace the checkpoint filename and HLA tag with one that matches your use case.

Example 1 β€” using a file you downloaded to ./models

python3 embeddings_generation.py \
  --model_path ./models/ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth \
  --name B5101-ESMCBA \
  --hla B5101 \
  --encoding epitope \
  --output_dir ./outputs \
  --peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH

Example 2 β€” let the script pull from the Hub automatically

If embeddings_generation.py supports resolving from the Hub, you can pass either a file name or an hf:// path and let the script download to cache.

cd ESMCBA/ESMCBA

python3 embeddings_generation.py \
  --model_path "ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth" \
  --name B1402-ESMCBA \
  --hla B1402 \
  --encoding epitope \
  --output_dir ./outputs \
  --peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH

or

python3 embeddings_generation.py \
  --model_path "hf://smares/ESMCBA/ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth" \
  --name B1402-ESMCBA \
  --hla B1402 \
  --encoding epitope \
  --output_dir ./outputs \
  --peptides ASCQQQRAGHS ASCQQQRAGH ASCQQQRAG DVRLSAHHHR DVRLSAHHHRM GHSDVRLSAHH

GPU vs CPU

  • By default the script runs on the GPU when one is available.
  • To force CPU, set CUDA_VISIBLE_DEVICES="" before running, or modify embeddings.py to pass map_location="cpu" to torch.load (see the sketch below).
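
A minimal device-selection sketch, assuming the checkpoint is loaded with torch.load as in section 6:

import torch

# Path to a downloaded checkpoint (any file from section 9)
ckpt_path = "./models/ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth"

# Pick the GPU when present, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
state = torch.load(ckpt_path, map_location=device)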

6. Minimal pattern inside embeddings.py to resolve paths

If you want the script to accept local files, simple names, or hf://:

import os
from huggingface_hub import hf_hub_download

def resolve_model_path(path_or_name, default_repo="smares/ESMCBA"):
    if os.path.isfile(path_or_name):
        return path_or_name
    if path_or_name.startswith("hf://"):
        # format: hf://<org>/<repo>/<filename>
        org, repo, filename = path_or_name[len("hf://"):].split("/", 2)
        return hf_hub_download(repo_id=f"{org}/{repo}", filename=filename, repo_type="model")
    # Bare filename: resolve it against the default repo
    return hf_hub_download(repo_id=default_repo, filename=path_or_name, repo_type="model")

Then in your loader:

import torch

ckpt_path = resolve_model_path(args.model_path)
state = torch.load(ckpt_path, map_location="cpu")  # change to cuda as needed

7. Common tasks

  • List files available in the model repo:

    git lfs ls-files   # if you cloned the HF repo
    

    or browse the Hub page.

  • Download all files for offline use:

    hf download smares/ESMCBA --repo-type model --local-dir ./models
    
  • Keep outputs tidy:

    mkdir -p ./outputs
    

8. Troubleshooting

  • huggingface-cli download is deprecated
    Use hf download instead.

  • Permission or quota errors when downloading
    Public models do not require login. For private models, run hf login.

  • Slow transfers
    Install hf_transfer and export HF_HUB_ENABLE_HF_TRANSFER=1.

  • File not found
    Double check the exact filename on the Hub. Filenames are long. Copy and paste.

9. Models

HLA Model checkpoint
B5101 ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_6_HLAB5101_5_0.001_1e-06__3_B5101_Hubber_B5101_final.pth
A0206 ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_25_0.0001_1e-06_AUG_1_HLAA0206_2_0.001_1e-06__1_A0206_Hubber_A0206_final.pth
B3701 ESMCBA_epitope_0.5_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB3701_1_0.0001_1e-05__1_B3701_0404_Hubber_B3701_final.pth
B5301 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAB5301_1_0.0001_1e-05__1_B5301_0404_Hubber_B5301_final.pth
A2402 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAA2402_1_0.0001_1e-06__2_A2402_0404_Hubber_A2402_final.pth
C0802 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_1_HLAC0802_2_0.0001_1e-05__2_C0802_0404_Hubber_C0802_final.pth
A0301 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_0.001_AUG_1_HLAA0301_1_0.001_1e-06__1_A0301_Hubber_A0301_final.pth
B3501 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB3501_2_0.001_0.001__4_B3501_Hubber_B3501_final.pth
C1502 ESMCBA_epitope_0.5_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAC1502_2_0.0001_1e-06__1_C1502_0404_Hubber_C1502_final.pth
B4601 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAB4601_1_0.0001_1e-05__2_B4601_0404_Hubber_B4601_final.pth
C0501 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_1e-06_AUG_6_HLAC0501_2_0.0001_1e-06__2_C0501_0404_Hubber_C0501_final.pth
A3201 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAA3201_2_0.0001_1e-06__1_A3201_0404_Hubber_A3201_final.pth
A0205 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_3_HLAA0205_2_0.0001_1e-06__2_A0205_0404_Hubber_A0205_final.pth
A3001 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.0001_1e-06_AUG_3_HLAA3001_4_0.0001_0.001__3_A3001_Hubber_A3001_final.pth
A0101 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_6_HLAA0101_2_0.001_0.001__3_A0101_Hubber_A0101_final.pth
C1203 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_1_HLAC1203_1_0.0001_1e-05__2_C1203_0404_Hubber_C1203_final.pth
A0207 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAA0207_1_0.0001_1e-06__2_A0207_0404_Hubber_A0207_final.pth
A0211 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAA0211_2_0.0001_1e-06__1_A0211_0404_Hubber_A0211_final.pth
B5801 ESMCBA_epitope_0.8_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB5801_2_0.0001_1e-06__2_B5801_0404_Hubber_B5801_final.pth
B0702 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_15_0.0001_0.001_AUG_6_HLAB0702_3_0.001_1e-06__4_B0702_Hubber_B0702_final.pth
C0701 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0701_2_0.0001_1e-05__1_C0701_0404_Hubber_C0701_final.pth
B3801 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_3_HLAB3801_2_0.0001_1e-06__1_B3801_0404_Hubber_B3801_final.pth
C0303 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_3_HLAC0303_1_0.0001_1e-05__2_C0303_0404_Hubber_C0303_final.pth
B4501 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_1_HLAB4501_2_0.0001_1e-05__2_B4501_0404_Hubber_B4501_final.pth
B4001 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB4001_1_0.0001_1e-06__2_B4001_0404_Hubber_B4001_final.pth
A0201 ESMCBA_epitope_0.8_30_ESMMASK_epitope_FT_5_0.001_1e-06_AUG_6_HLAA0201_2_0.001_1e-06__2_A0201_Hubber_A0201_final.pth
C0602 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0602_2_0.0001_1e-06__1_C0602_0404_Hubber_C0602_final.pth
A2501 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAA2501_1_0.0001_1e-06__1_A2501_0404_Hubber_A2501_final.pth
B5401 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_1_HLAB5401_2_0.0001_1e-06__2_B5401_0404_Hubber_B5401_final.pth
A1101 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.0001_1e-05_AUG_3_HLAA1101_5_0.001_1e-06__2_A1101_Hubber_A1101_final.pth
B1801 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.0001_1e-05_AUG_6_HLAB1801_1_0.001_1e-06__4_B1801_Hubber_B1801_final.pth
B1501 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_0.001_AUG_3_HLAB1501_2_0.001_0.001__2_B1501_Hubber_B1501_final.pth
A6801 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_1_HLAA6801_2_0.0001_1e-06__4_A6801_Hubber_A6801_final.pth
B2705 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB2705_2_0.0001_1e-06__2_B2705_0404_Hubber_B2705_final.pth
C0401 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAC0401_2_0.0001_1e-06__1_C0401_0404_Hubber_C0401_final.pth
B1502 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1502_1_1e-05_1e-05__1_B1502_0404_Hubber_B1502_final.pth
A0202 ESMCBA_epitope_0.95_20_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAA0202_1_0.0001_1e-05__2_A0202_0404_Hubber_A0202_final.pth
A2601 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_15_0.0001_1e-05_AUG_1_HLAA2601_5_0.001_0.001__4_A2601_Hubber_A2601_final.pth
C0702 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_15_0.001_5e-05_AUG_1_HLAC0702_1_0.0001_1e-05__1_C0702_0404_Hubber_C0702_final.pth
A3301 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_0.001_AUG_1_HLAA3301_5_0.001_1e-06__4_A3301_Hubber_A3301_final.pth
B0801 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_1e-06_AUG_1_HLAB0801_1_0.0001_1e-06__1_B0801_0404_Hubber_B0801_final.pth
B1517 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_20_0.001_5e-05_AUG_3_HLAB1517_1_0.0001_1e-05__2_B1517_0404_Hubber_B1517_final.pth
A0203 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_0.001_AUG_6_HLAA0203_2_0.001_0.001__2_A0203_Hubber_A0203_final.pth
B5701 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_1_HLAB5701_2_0.0001_1e-05__1_B5701_Hubber_B5701_final.pth
B4402 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-05_AUG_3_HLAB4402_1_0.001_0.001__2_B4402_Hubber_B4402_final.pth
A6802 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAA6802_2_0.001_1e-06__4_A6802_Hubber_A6802_final.pth
B4403 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAB4403_1_0.0001_1e-06__1_B4403_0404_Hubber_B4403_final.pth
C1402 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_3_HLAC1402_1_0.0001_1e-06__1_C1402_0404_Hubber_C1402_final.pth
B4002 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_1e-06_AUG_6_HLAB4002_2_0.0001_1e-05__1_B4002_0404_Hubber_B4002_final.pth
A3101 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAA3101_2_0.0001_1e-06__2_A3101_0404_Hubber_A3101_final.pth
B1402 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_3_HLAB1402_2_1e-05_1e-06__1_B1402_0404_Hubber_B1402_final.pth
B1503 ESMCBA_epitope_0.95_30_ESMMASK_epitope_FT_25_0.001_5e-05_AUG_6_HLAB1503_2_0.0001_1e-05__2_B1503_0404_Hubber_B1503_final.pth

10. Repro tip for papers and reviews

Record the exact commit of the code and the model snapshot. Example:

Code commit: <git SHA from ESMCBA repo>
Model snapshot: <commit SHA shown in HF snapshots path>
HLA: B5101
Encoding: epitope
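
A small helper along these lines can record both automatically; the local repo path and allele below are examples:

import subprocess
from huggingface_hub import HfApi

# Git SHA of the local ESMCBA checkout (adjust the path)
code_commit = subprocess.run(
    ["git", "-C", "./ESMCBA", "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Commit SHA of the current model repo snapshot on the Hub
model_snapshot = HfApi().model_info("smares/ESMCBA").sha

print(f"Code commit: {code_commit}")
print(f"Model snapshot: {model_snapshot}")
print("HLA: B5101")
print("Encoding: epitope")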

11. License and citation

https://arxiv.org/abs/2507.13077

Follow the license in the GitHub repo for code and the model card in the Hub repo for weights. If you use ESMCBA in research, please cite the associated manuscript or submission.
