AbLang2: Transformer-based Antibody Language Model
This repository provides a HuggingFace-compatible implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the Oxford Protein Informatics Group (OPIG) and is available at:
Available Model
- ablang2: AbLang2 model for antibody sequences
Installation
Install the required dependencies:
# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch
# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci
Note: ANARCI is required for antibody sequence numbering and alignment features. It must be installed from the bioconda channel.
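Before loading the model, it can help to confirm that the dependencies above are visible to the current Python environment. The following is a minimal sketch rather than part of this repository: `check_dependencies` is a hypothetical helper, and the import names `rotary_embedding_torch` and `anarci` and the `ANARCI` executable name are assumptions about how those packages install.

```python
import importlib.util
import shutil


def check_dependencies(modules, executables=()):
    """Return {name: bool} for importable modules and executables found on PATH."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        status[exe] = shutil.which(exe) is not None
    return status


if __name__ == "__main__":
    # Module and executable names below are assumptions, not taken from this README
    report = check_dependencies(
        ["transformers", "numpy", "pandas", "rotary_embedding_torch", "anarci"],
        executables=["ANARCI"],
    )
    for name, ok in report.items():
        print(f"{name:<24} {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING points to the corresponding `pip` or `conda` command above.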
Loading the Model from the Hugging Face Hub
Method 1: Load Model and Tokenizer, then Import Adapter
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
Method 2: Using importlib (Alternative)
import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)
# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
Note: The model automatically uses a GPU when one is available and otherwise falls back to the CPU.
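To see which device that automatic selection is likely to pick on a given machine, the usual PyTorch check can be run separately. This is a sketch under the assumption that the model follows the standard `torch.cuda.is_available()` test; `pick_device` is a hypothetical helper and is not part of the adapter.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

If the result is "cpu" on a machine with a GPU, the installed PyTorch build probably lacks CUDA support.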
Available Utilities
This wrapper translates between HuggingFace's model format and AbLang2's expected input/output structure, so that AbLang2's antibody analysis tools can be used with a model loaded from HuggingFace.
- seqcoding: Sequence-level representations (averaged across residues)
- rescoding: Residue-level representations (per-residue embeddings)
- likelihood: Raw logits for amino acid prediction at each position
- probability: Normalized probabilities for amino acid prediction
- pseudo_log_likelihood: Uncertainty scoring with stepwise masking (masks each residue)
- confidence: Fast uncertainty scoring (single forward pass, no masking)
- restore: Restore masked residues (*) with predicted amino acids
All these utilities work seamlessly with the HuggingFace-loaded model, maintaining the same API as the original AbLang2 implementation.
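The relationships between several of these utilities can be illustrated with toy numbers. The sketch below does not call the model: it assumes that probability is a softmax over the likelihood logits, that seqcoding is a plain mean over the rescoding vectors, and that stepwise masking replaces one residue at a time with `*`. The actual implementation may differ in details such as how special tokens are handled, and all function names here are hypothetical.

```python
import math


def softmax(logits):
    """Normalize raw logits into probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(residue_vectors):
    """Average per-residue vectors into one sequence-level vector."""
    n = len(residue_vectors)
    dim = len(residue_vectors[0])
    return [sum(vec[i] for vec in residue_vectors) / n for i in range(dim)]


def stepwise_masks(sequence, mask="*"):
    """Return one copy of the sequence per position, with that position masked."""
    return [sequence[:i] + mask + sequence[i + 1:] for i in range(len(sequence))]


# Toy values, not real model outputs
probs = softmax([2.0, 1.0, 0.1])                                 # likelihood -> probability
seq_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])    # rescoding -> seqcoding
variants = stepwise_masks("EVQ")                                 # inputs for stepwise masking

print(seq_vector)  # [3.0, 4.0]
print(variants)    # ['*VQ', 'E*Q', 'EV*']
```

The last function also shows why pseudo_log_likelihood is slower than confidence: a sequence of length N yields N masked inputs, whereas confidence uses a single unmasked pass.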
The AbLang2PairedHuggingFaceAdapter class is a wrapper that lets you use the AbLang2 model utilities after loading the model from HuggingFace. This class enables you to:
- Access all AbLang2 utilities (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation
- Work with antibody sequences (heavy and light chains) seamlessly
- Maintain compatibility with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities
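Since the adapter is invoked as `ablang(sequences, mode=...)` in the restore example, the other utilities can presumably be selected through the same `mode` keyword. The sketch below assumes exactly that; `run_modes` is a hypothetical helper, and the return type of each mode is not documented here, so results are collected without further processing.

```python
MODES = ("seqcoding", "rescoding", "likelihood", "probability")


def run_modes(ablang, paired_seqs, modes=MODES):
    """Call the adapter once per mode and collect the results by mode name.

    `ablang` is any callable accepting (sequences, mode=...), such as the
    adapter instance created in the loading section.
    """
    return {mode: ablang(paired_seqs, mode=mode) for mode in modes}
```

With `ablang` created as in Method 1, a call such as `run_modes(ablang, [[heavy_seq, light_seq]])` would return one entry per mode, where `heavy_seq` and `light_seq` stand for your own chain sequences.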
Examples
AbLang2 (Paired Sequences) - Restore Example
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter
# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
# 4. Restore masked sequences
masked_seqs = [
['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
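Masked inputs like those above can be prepared programmatically. The helpers below are a small sketch and not part of the adapter: `mask_positions` and `masked_indices` are hypothetical names, positions are 0-based, and the short sequence is an illustrative fragment rather than a real antibody chain.

```python
def mask_positions(sequence, positions, mask="*"):
    """Return a copy of `sequence` with the given 0-based positions masked."""
    chars = list(sequence)
    for pos in positions:
        chars[pos] = mask
    return "".join(chars)


def masked_indices(sequence, mask="*"):
    """List the 0-based positions that carry the mask character."""
    return [i for i, ch in enumerate(sequence) if ch == mask]


fragment = "EVQLVESGG"  # illustrative fragment only
masked = mask_positions(fragment, [3, 4, 5])
print(masked)                  # EVQ***SGG
print(masked_indices(masked))  # [3, 4, 5]
```

Keeping the masked indices around makes it easy to compare the restored residues against the originals when the true sequence is known.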
Detailed Usage
For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:
test_ablang2_HF_implementation.ipynb
- Complete notebook with all utilities and advanced usage patterns
Citation
If you use this model in your research, please cite the original AbLang2 paper:
AbLang2:
@article{Olsen2024,
title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
author={Tobias H. Olsen and Iain H. Moal and Charlotte M. Deane},
journal={bioRxiv},
doi={10.1101/2024.02.02.578678},
year={2024}
}