---
language:
- en
license: mit
tags:
- biology
- protein
- antibody
- ablang
- transformers
- pytorch
- chemistry
- oas
- cdr
- ablang2 hf implementation
- roberta
- ESM
- ablang2
- antibody-design
# datasets:
# - oas
metrics:
- sequence modeling
- protein language model
library_name: transformers
pipeline_tag: fill-mask
---
# 🧬 AbLang2: Transformer-based Antibody Language Model
This repository provides a Hugging Face-compatible 🤗 implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the [Oxford Protein Informatics Group (OPIG)](https://opig.stats.ox.ac.uk/) and is available at:
- **AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)
## 🎯 Available Model
- **ablang2**: AbLang2 model for paired antibody sequences (heavy and light chains)
## 📦 Installation
Install the required dependencies:
```bash
# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch
# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci
```
**Note**: ANARCI is required for the antibody sequence numbering and alignment features and must be installed from the bioconda channel.
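As a quick sanity check after installation (a minimal sketch; it assumes the bioconda package exposes the `anarci` Python module):
```python
# Verify that the core dependencies and ANARCI are importable.
import transformers
import rotary_embedding_torch

try:
    import anarci  # provided by the bioconda `anarci` package
    print("ANARCI is available.")
except ImportError:
    print("ANARCI not found; install it with: conda install -c bioconda anarci")
```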
## 🚀 Loading the Model from the Hugging Face Hub
### Method 1: Load Model and Tokenizer, then Import Adapter
```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```
### Method 2: Using importlib (Alternative)
```python
import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)
# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```
**Note**: The model automatically uses a GPU when one is available and falls back to the CPU otherwise.
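To confirm which device the model ended up on (a small sketch using standard PyTorch attributes; `model` is the object loaded above):
```python
import torch

# Report the device holding the model's parameters, e.g. cuda:0 or cpu.
print(f"Model device: {next(model.parameters()).device}")
print(f"CUDA available: {torch.cuda.is_available()}")
```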
## ⚙️ Available Utilities
This wrapper translates between the Hugging Face model format and AbLang2's expected input/output structure, making it easy to use AbLang2's antibody analysis tools with a model loaded from the Hugging Face Hub:
- **seqcoding**: Sequence-level representations (averaged across residues)
- **rescoding**: Residue-level representations (per-residue embeddings)
- **likelihood**: Raw logits for amino acid prediction at each position
- **probability**: Normalized probabilities for amino acid prediction
- **pseudo_log_likelihood**: Uncertainty scoring with stepwise masking (masks each residue)
- **confidence**: Fast uncertainty scoring (single forward pass, no masking)
- **restore**: Restore masked residues (`*`) with predicted amino acids
All of these utilities work with the Hugging Face-loaded model and maintain the same API as the original AbLang2 implementation.
The `AbLang2PairedHuggingFaceAdapter` class is the wrapper that exposes these utilities after the model is loaded from the Hub. It enables you to:
- **Access all AbLang2 utilities** (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation
- **Work with antibody sequences** (heavy and light chains) seamlessly
- **Maintain compatibility** with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities
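For example, embeddings can be obtained with the same call pattern as the restore example below. This is a sketch that assumes the `[heavy, light]` pair input format and `mode` argument of the original AbLang2 interface; the sequences are illustrative:
```python
# Each input is a [heavy_chain, light_chain] pair of amino-acid strings.
seqs = [
    ['EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCAKDYWGQGTLVTVSS',
     'DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPLTFGQGTKVEIK']
]

seq_emb = ablang(seqs, mode='seqcoding')  # one embedding per antibody (averaged over residues)
res_emb = ablang(seqs, mode='rescoding')  # one embedding per residue
```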
## 💡 Examples
### 🔗 AbLang2 (Paired Sequences) - Restore Example
```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter
# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
# 4. Restore masked sequences
masked_seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
```
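The scoring modes follow the same pattern. The sketch below reuses `masked_seqs` and the adapter from the steps above; the mode names come from the utilities list:
```python
# Per-position raw logits and normalized probabilities.
logits = ablang(masked_seqs, mode='likelihood')
probs = ablang(masked_seqs, mode='probability')

# Uncertainty scores: pseudo_log_likelihood masks one residue at a time
# (slower but thorough); confidence uses a single forward pass (faster).
pll = ablang(masked_seqs, mode='pseudo_log_likelihood')
conf = ablang(masked_seqs, mode='confidence')
```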
## 📚 Detailed Usage
For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:
- **[`test_ablang2_HF_implementation.ipynb`](test_ablang2_HF_implementation.ipynb)** - Complete notebook with all utilities and advanced usage patterns
## 📖 Citation
If you use this model in your research, please cite the original AbLang2 paper:
**AbLang2:**
```bibtex
@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Olsen, Tobias H. and Moal, Iain H. and Deane, Charlotte M.},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}
```