🧬 AbLang2: Transformer-based Antibody Language Model

This repository provides a HuggingFace-compatible 🤗 implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the Oxford Protein Informatics Group (OPIG) and is available at:

AbLang2: https://github.com/TobiasHeOl/AbLang2

🎯 Model Available

ablang2: AbLang2 model for antibody sequences

📦 Installation

Install the required dependencies:

# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch

# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci

Note: ANARCI is required for antibody sequence numbering and alignment features. It must be installed from the bioconda channel.
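
Before loading the model, it can help to confirm that the dependencies are importable. The following is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and it assumes that the bioconda ANARCI package exposes a Python module named `anarci` and that `rotary-embedding-torch` is imported as `rotary_embedding_torch`.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages listed in the installation step.
required = ["transformers", "numpy", "pandas", "rotary_embedding_torch", "anarci"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

If `anarci` is reported missing, rerun the conda command above in the same environment that runs Python.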

🚀 Loading Model from Hugging Face Hub

Method 1: Load Model and Tokenizer, then Import Adapter

import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)

# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

Method 2: Using importlib (Alternative)

import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)

# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
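
To see what Method 2 does in isolation, the sketch below loads a module from an arbitrary file path with the same three `importlib.util` calls, using a throwaway file in place of the downloaded `adapter.py`. The file contents, the `greet` function, and the helper name `load_module_from_path` are made up for illustration.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a Python source file as a module without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the adapter.py path returned by hf_hub_download.
    path = os.path.join(tmp_dir, "demo_adapter.py")
    with open(path, "w") as handle:
        handle.write("def greet():\n    return 'loaded without touching sys.path'\n")

    demo = load_module_from_path("demo_adapter", path)
    message = demo.greet()

print(message)
```

Unlike Method 1, this approach leaves `sys.path` unchanged, so the cache directory is never searched by later, unrelated imports.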

Note: The model automatically uses the GPU when one is available and otherwise falls back to the CPU.

⚙️ Available Utilities

This wrapper translates between HuggingFace's model format and AbLang2's expected input/output structure, so AbLang2's antibody analysis utilities can be used with a model loaded from HuggingFace. The following utilities are available:

seqcoding: Sequence-level representations (averaged across residues)
rescoding: Residue-level representations (per-residue embeddings)
likelihood: Raw logits for amino acid prediction at each position
probability: Normalized probabilities for amino acid prediction
pseudo_log_likelihood: Uncertainty scoring with stepwise masking (each residue is masked in turn)
confidence: Fast uncertainty scoring (single forward pass, no masking)
restore: Restore masked residues (*) with predicted amino acids

All these utilities work seamlessly with the HuggingFace-loaded model, maintaining the same API as the original AbLang2 implementation.
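
How some of these outputs relate to one another can be sketched with NumPy on made-up arrays. This is an illustration under stated assumptions, not the adapter's implementation: the array shapes, the vocabulary size of 20, the embedding width of 8, and the function names are hypothetical. It assumes that `probability` corresponds to a softmax over the `likelihood` logits and that `seqcoding` is the mean of the `rescoding` vectors over residues.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Convert raw logits to probabilities that sum to 1 along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mean_pool(residue_embeddings):
    """Average per-residue embeddings into one sequence-level vector."""
    return residue_embeddings.mean(axis=0)


rng = np.random.default_rng(0)

# Synthetic stand-ins: 5 residue positions, 20 amino acids, 8 embedding dimensions.
logits = rng.normal(size=(5, 20))
residue_embeddings = rng.normal(size=(5, 8))

probs = softmax(logits)
sequence_embedding = mean_pool(residue_embeddings)

print(probs.sum(axis=-1))        # each row sums to 1 (up to rounding)
print(sequence_embedding.shape)  # (8,)
```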

The AbLang2PairedHuggingFaceAdapter class is a wrapper that lets you use AbLang2 model utilities after loading the model from HuggingFace. This class enables you to:

Access all AbLang2 utilities (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation
Work with antibody sequences (heavy and light chains) seamlessly
Maintain compatibility with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities

💡 Examples

🔗 AbLang2 (Paired Sequences) - Restore Example

import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter

# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

# 4. Restore masked sequences
masked_seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
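
Masked inputs like those above can also be generated programmatically. The helper below is a hypothetical convenience function, not part of AbLang2 or the adapter; it only assumes that `*` marks a residue to restore, as in the example. The short sequence fragment is made up for illustration.

```python
def mask_positions(sequence, positions, mask_char="*"):
    """Return `sequence` with the residues at the given 0-based positions masked."""
    residues = list(sequence)
    for pos in positions:
        if not 0 <= pos < len(residues):
            raise IndexError(
                f"position {pos} is outside a sequence of length {len(residues)}"
            )
        residues[pos] = mask_char
    return "".join(residues)


heavy_fragment = "EVQLVQSGGE"  # illustrative fragment, not a full chain
masked = mask_positions(heavy_fragment, [3, 4, 5])
print(masked)  # EVQ***SGGE
```

The masked heavy and light chains can then be passed as a `[heavy, light]` pair, in the same form as `masked_seqs`.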

📚 Detailed Usage

For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:

test_ablang2_HF_implementation.ipynb - Complete notebook with all utilities and advanced usage patterns

📖 Citation

If you use these models in your research, please cite the original AbLang2 paper:

AbLang2:

@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}
