---
license: mit
tags:
- biology
- protein
- antibody
- ablang
- transformers
- pytorch
- chemistry
- oas
- cdr
- ablang2 hf implementation
- roberta
- ESM
- ablang2
- antibody-design
metrics:
- sequence modeling
- protein language model
library_name: transformers
pipeline_tag: fill-mask
---

# 🧬 AbLang2: Transformer-based Antibody Language Model

This repository provides a Hugging Face-compatible implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the [Oxford Protein Informatics Group (OPIG)](https://opig.stats.ox.ac.uk/) and is available at:

- **AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)

## 🎯 Model Available

- **ablang2**: AbLang2 model for paired (heavy and light chain) antibody sequences

## 📦 Installation

Install the required dependencies:

```bash
# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch

# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci
```

**Note**: ANARCI is required for antibody sequence numbering and alignment features and must be installed from the bioconda channel.

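
Before you load the model, you can confirm that each dependency is importable. The snippet below is a minimal sketch that uses only the standard library's `importlib.util.find_spec`. The helper name `missing_dependencies` is hypothetical. The import names `rotary_embedding_torch` and `anarci` are assumptions about how those packages expose themselves, so adjust them if your environment differs.

```python
import importlib.util

# Assumed import names mapped to the install command from the section above.
# Package and import names differ (rotary-embedding-torch -> rotary_embedding_torch).
REQUIRED_MODULES = {
    "transformers": "pip install transformers",
    "numpy": "pip install numpy",
    "pandas": "pip install pandas",
    "rotary_embedding_torch": "pip install rotary-embedding-torch",
    "anarci": "conda install -c bioconda anarci",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return {module_name: install_hint} for every top-level module that cannot be found."""
    return {
        name: hint
        for name, hint in modules.items()
        if importlib.util.find_spec(name) is None
    }


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        for name, hint in missing.items():
            print(f"Missing '{name}': try `{hint}`")
    else:
        print("All dependencies found.")
```

When run as a script, it prints an install hint for each module it cannot locate. Note that `find_spec` only checks whether a module can be found. It does not confirm that the module imports cleanly.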

## Loading Model from Hugging Face Hub

### Method 1: Load Model and Tokenizer, then Import Adapter

```python
import os
import sys

from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)

# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter

ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```

### Method 2: Using importlib (Alternative)

```python
import importlib.util

from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer

# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)

# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```

**Note**: The model automatically uses the GPU when one is available and otherwise falls back to the CPU.

## ⚙️ Available Utilities

The `AbLang2PairedHuggingFaceAdapter` wrapper translates between the Hugging Face model format and the input/output structure AbLang2 expects. This lets you use AbLang2's antibody analysis tools with a model loaded from Hugging Face. You select each utility with the `mode` argument (for example, `mode='seqcoding'`):

- **seqcoding**: Sequence-level representations (averaged across residues)
- **rescoding**: Residue-level representations (per-residue embeddings)
- **likelihood**: Raw logits for amino acid prediction at each position
- **probability**: Normalized probabilities for amino acid prediction
- **pseudo_log_likelihood**: Uncertainty scoring with stepwise masking (each residue is masked in turn)
- **confidence**: Fast uncertainty scoring (single forward pass, no masking)
- **restore**: Restores masked residues (marked with `*`) with predicted amino acids

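
The list draws two distinctions. First, `seqcoding` averages residue-level vectors (the `rescoding` output) into one vector per sequence. Second, `probability` normalizes the raw logits that `likelihood` returns. The plain-Python sketch below illustrates both relationships on toy numbers, assuming softmax as the normalization. It is not the adapter's code; the real implementation may, for example, exclude special or padding tokens before averaging. `mean_pool` and `softmax` are hypothetical helper names.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors (rescoding-style) into one sequence-level vector."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]


def softmax(logits):
    """Turn one position's raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy numbers, not real AbLang2 outputs: 3 residues with 2-dimensional embeddings.
residues = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(residues))  # [3.0, 4.0]
print(softmax([2.0, 1.0, 0.1]))
```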

All of these utilities work with the Hugging Face-loaded model and keep the same API as the original AbLang2 implementation.

The `AbLang2PairedHuggingFaceAdapter` class lets you:

- **Access all AbLang2 utilities** (seqcoding, rescoding, likelihood, probability, etc.) through the same interface as the original implementation
- **Work with paired antibody sequences** (heavy and light chains)
- **Maintain compatibility** with the original AbLang2 API while using Hugging Face's model loading and caching

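
The main practical difference between `pseudo_log_likelihood` and `confidence` is cost. Stepwise masking needs one forward pass per residue, whereas a single unmasked pass scores every position at once. The sketch below illustrates that trade-off. It uses a hypothetical `log_probs` callable in place of the model and a toy scoring function. It also assumes that both scores are the mean log-probability of the true residue, which may differ from the exact definitions AbLang2 uses.

```python
import math
from typing import Callable, Dict, List

# Hypothetical model interface: given a list of residues (with "*" as mask),
# return a {amino_acid: log_p} table for every position.
LogProbFn = Callable[[List[str]], List[Dict[str, float]]]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudo_log_likelihood(seq: str, log_probs: LogProbFn) -> float:
    """Mask each position in turn and average log p(true residue | rest)."""
    total = 0.0
    for i, true_aa in enumerate(seq):
        masked = list(seq)
        masked[i] = "*"  # one forward pass per position
        total += log_probs(masked)[i][true_aa]
    return total / len(seq)


def single_pass_score(seq: str, log_probs: LogProbFn) -> float:
    """One unmasked forward pass; average log p(true residue) over positions."""
    table = log_probs(list(seq))
    return sum(table[i][aa] for i, aa in enumerate(seq)) / len(seq)


def toy_log_probs(tokens: List[str]) -> List[Dict[str, float]]:
    """Toy stand-in: uniform when masked, 0.9 on the visible residue otherwise."""
    out = []
    for tok in tokens:
        if tok == "*":
            out.append({aa: math.log(1 / 20) for aa in AMINO_ACIDS})
        else:
            rest = 0.1 / 19
            out.append({aa: math.log(0.9 if aa == tok else rest) for aa in AMINO_ACIDS})
    return out


print(pseudo_log_likelihood("EVQL", toy_log_probs))  # about -2.996 (= log(1/20))
print(single_pass_score("EVQL", toy_log_probs))      # about -0.105 (= log(0.9))
```

In this toy, the unmasked pass can see each true residue, so its score is higher. The masked variant avoids that leakage at the cost of one pass per position.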

## 💡 Examples

### AbLang2 (Paired Sequences): Restore Example

```python
import os
import sys

from huggingface_hub import hf_hub_download
from transformers import AutoModel, AutoTokenizer

# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter

# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

# 4. Restore masked sequences: each entry is a [heavy_chain, light_chain] pair,
#    and '*' marks the residues to predict
masked_seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
```

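
To try `restore` on your own sequences, you need inputs in the same `[heavy, light]` layout, with `*` at each position to predict. The helper below is a hypothetical convenience function, not part of the adapter. It replaces chosen 0-based positions with the mask character. The sequences are short toy fragments, not full variable domains.

```python
from typing import Iterable


def mask_positions(seq: str, positions: Iterable[int], mask_token: str = "*") -> str:
    """Return `seq` with the given 0-based positions replaced by `mask_token`."""
    chars = list(seq)
    for pos in positions:
        if not 0 <= pos < len(chars):
            raise IndexError(f"position {pos} is outside a sequence of length {len(chars)}")
        chars[pos] = mask_token
    return "".join(chars)


# Hypothetical usage: mask two heavy-chain residues and one light-chain residue,
# producing one [heavy, light] pair in the same layout as `masked_seqs`.
heavy = "EVQLVQSGG"
light = "DIQLTQSPL"
pair = [mask_positions(heavy, [3, 4]), mask_positions(light, [0])]
print(pair)  # ['EVQ**QSGG', '*IQLTQSPL']
```

With real heavy and light chain sequences in place of the toy fragments, you can wrap the resulting pair in a list and pass it to `ablang([pair], mode='restore')`, just like `masked_seqs`.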

## Detailed Usage

For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:

- **[`test_ablang2_HF_implementation.ipynb`](test_ablang2_HF_implementation.ipynb)**: Complete notebook with all utilities and advanced usage patterns

## Citation

If you use this model in your research, please cite the original AbLang2 paper:

**AbLang2:**

```bibtex
@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Olsen, Tobias H. and Moal, Iain H. and Deane, Charlotte M.},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}
```