---
#language:
#- en
license: mit
tags:
- biology
- protein
- antibody
- ablang
- transformers
- pytorch
- chemistry
- oas
- cdr
- ablang2-hf-implementation
- roberta
- ESM
- ablang2
- antibody-design
# datasets:
# - oas
metrics:
- sequence modeling
- protein language model
library_name: transformers
pipeline_tag: fill-mask
---
# 🧬 AbLang2: Transformer-based Antibody Language Model
This repository provides a Hugging Face-compatible 🤗 implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the [Oxford Protein Informatics Group (OPIG)](https://opig.stats.ox.ac.uk/) and is available at:
- **AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)
## 🎯 Available Model
- **ablang2**: AbLang2 model for antibody sequences
## 📦 Installation
Install the required dependencies:
```bash
# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch
# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci
```
**Note**: ANARCI is required for antibody sequence numbering and alignment features. It must be installed from the bioconda channel.
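A quick way to confirm the environment is set up (a minimal sketch; `anarci` is the Python module installed by the bioconda ANARCI package):
```python
# Sanity check: all core dependencies should import cleanly.
import transformers
import numpy
import pandas
import rotary_embedding_torch  # module name of the rotary-embedding-torch pip package
import anarci                  # installed by the bioconda ANARCI package

print("All dependencies available")
```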
## 🚀 Loading Model from Hugging Face Hub
### Method 1: Load Model and Tokenizer, then Import Adapter
```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```
### Method 2: Using importlib (Alternative)
```python
import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)
# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
```
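Both methods yield the same adapter object; Method 1 adds the cached model directory to `sys.path` (which affects later imports named `adapter`), while Method 2 loads the module object directly without touching `sys.path`.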
**Note**: The model automatically uses a GPU when one is available and falls back to the CPU otherwise.
## ⚙️ Available Utilities
This wrapper translates between HuggingFace's model format and AbLang2's expected input/output structure, making it easy to use AbLang2's antibody analysis tools with a model loaded from the Hugging Face Hub. The following modes are available:
- **seqcoding**: Sequence-level representations (averaged across residues)
- **rescoding**: Residue-level representations (per-residue embeddings)
- **likelihood**: Raw logits for amino acid prediction at each position
- **probability**: Normalized probabilities for amino acid prediction
- **pseudo_log_likelihood**: Uncertainty scoring with stepwise masking (each residue is masked in turn, so it is slower but more thorough)
- **confidence**: Fast uncertainty scoring (single forward pass, no masking)
- **restore**: Restores masked residues (`*`) with predicted amino acids
All of these utilities work seamlessly with the HuggingFace-loaded model and keep the same API as the original AbLang2 implementation; a short usage sketch follows at the end of this section.
The `AbLang2PairedHuggingFaceAdapter` class is a wrapper that exposes AbLang2's utilities after the model has been loaded from HuggingFace. It enables you to:
- **Access all AbLang2 utilities** (seqcoding, rescoding, likelihood, probability, etc.) through the same interface as the original implementation
- **Work with paired antibody sequences** (heavy and light chains) seamlessly
- **Maintain compatibility** with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities
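As a minimal sketch of the call pattern (the mode names come from the list above; the comments on return values are assumptions based on the original AbLang2 API, so check the linked notebook for exact shapes):
```python
# Sequences are given as [heavy_chain, light_chain] pairs; '*' marks masked
# residues. Here we reuse the masked pair from the restore example below.
seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]

seq_embeddings = ablang(seqs, mode='seqcoding')             # one embedding per sequence pair
res_embeddings = ablang(seqs, mode='rescoding')             # one embedding per residue
logits         = ablang(seqs, mode='likelihood')            # raw logits per position
probs          = ablang(seqs, mode='probability')           # normalized probabilities per position
plls           = ablang(seqs, mode='pseudo_log_likelihood') # stepwise-masked uncertainty score
confidences    = ablang(seqs, mode='confidence')            # fast single-pass uncertainty score
```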
## 💡 Examples
### 🔄 AbLang2 (Paired Sequences) - Restore Example
```python
import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download
# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter
# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
# 4. Restore masked sequences
masked_seqs = [
['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")
```
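Each `*` in the input is replaced with an amino acid predicted by the model, so the restored output mirrors the layout of the input pair. The other modes follow the same call pattern, as sketched in the utilities section above.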
## 📖 Detailed Usage
For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:
- **[`test_ablang2_HF_implementation.ipynb`](test_ablang2_HF_implementation.ipynb)** - Complete notebook with all utilities and advanced usage patterns
## 📝 Citation
If you use this model in your research, please cite the original AbLang2 paper:
**AbLang2:**
```
@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Olsen, Tobias H. and Moal, Iain H. and Deane, Charlotte M.},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}
``` |