
🧬 AbLang2: Transformer-based Antibody Language Model

This repository provides a HuggingFace-compatible πŸ€— implementation of the AbLang2 language model for antibodies. The original AbLang2 model was developed by the Oxford Protein Informatics Group (OPIG) and is available at https://github.com/oxpig/AbLang2.

🎯 Model Available

  • ablang2: AbLang2 model for antibody sequences

πŸ“¦ Installation

Install the required dependencies:

# Install core dependencies
pip install transformers numpy pandas rotary-embedding-torch

# Install ANARCI from bioconda (required for antibody numbering)
conda install -c bioconda anarci

Note: ANARCI is required for antibody sequence numbering and alignment features. It must be installed from the bioconda channel.
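If you want to verify the environment before loading the model, a minimal sanity check is to confirm that each dependency is importable. The module names below (in particular rotary_embedding_torch and anarci) are assumptions based on the package names above; adjust them if your environment differs.

import importlib.util

# Check that each required module can be found; "anarci" is the Python
# package name assumed to be installed by the bioconda ANARCI recipe.
for module in ["transformers", "numpy", "pandas", "rotary_embedding_torch", "anarci"]:
    status = "OK" if importlib.util.find_spec(module) else "MISSING"
    print(f"{module}: {status}")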

πŸš€ Loading Model from Hugging Face Hub

Method 1: Load Model and Tokenizer, then Import Adapter

import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)

# Import and create the adapter
from adapter import AbLang2PairedHuggingFaceAdapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

Method 2: Using importlib (Alternative)

import importlib.util
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Load model and tokenizer
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# Load adapter dynamically
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
spec = importlib.util.spec_from_file_location("adapter", adapter_path)
adapter_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(adapter_module)

# Create the adapter
ablang = adapter_module.AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

Note: The model automatically uses the GPU when available and falls back to the CPU otherwise.
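If you prefer to control device placement explicitly rather than rely on the automatic behavior, a small sketch using standard PyTorch calls (assuming the model object loaded above):

import torch

# Move the model to the GPU only when one is actually visible;
# otherwise keep it on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)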

βš™οΈ Available Utilities

This wrapper translates between HuggingFace's model format and AbLang2's expected input/output structure, making it easy to use AbLang2's antibody analysis tools with a model loaded from the HuggingFace Hub.

  • seqcoding: Sequence-level representations (averaged across residues)
  • rescoding: Residue-level representations (per-residue embeddings)
  • likelihood: Raw logits for amino acid prediction at each position
  • probability: Normalized probabilities for amino acid prediction
  • pseudo_log_likelihood: Uncertainty scoring with stepwise masking (masks each residue)
  • confidence: Fast uncertainty scoring (single forward pass, no masking)
  • restore: Restore masked residues (*) with predicted amino acids

All of these utilities work seamlessly with the HuggingFace-loaded model, maintaining the same API as the original AbLang2 implementation (see the usage sketch after the list below).

The AbLang2PairedHuggingFaceAdapter class is a wrapper that lets you use the AbLang2 utilities after loading the model from HuggingFace. This class enables you to:

  • Access all AbLang2 utilities (seqcoding, rescoding, likelihood, probability, etc.) with the same interface as the original implementation
  • Work with antibody sequences (heavy and light chains) seamlessly
  • Maintain compatibility with the original AbLang2 API while leveraging HuggingFace's model loading and caching capabilities
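As a sketch of the embedding and prediction modes, reusing the masked paired sequences from the restore example below and the mode= call signature shown there (the exact return types follow the original AbLang2 API):

# Paired [heavy, light] input; '*' marks masked residues, as in the
# restore example below.
seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]

seq_embeddings = ablang(seqs, mode='seqcoding')   # one vector per sequence pair
res_embeddings = ablang(seqs, mode='rescoding')   # one embedding per residue
logits = ablang(seqs, mode='likelihood')          # raw per-position logits
probs = ablang(seqs, mode='probability')          # normalized per-position probabilities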

πŸ’‘ Examples

πŸ”— AbLang2 (Paired Sequences) - Restore Example

import sys
import os
from transformers import AutoModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# 1. Load model and tokenizer from Hugging Face Hub
model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)

# 2. Download adapter and add to path
adapter_path = hf_hub_download(repo_id="hemantn/ablang2", filename="adapter.py")
cached_model_dir = os.path.dirname(adapter_path)
sys.path.insert(0, cached_model_dir)
from adapter import AbLang2PairedHuggingFaceAdapter

# 3. Create adapter
ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)

# 4. Restore masked sequences
masked_seqs = [
    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
]
restored = ablang(masked_seqs, mode='restore')
print(f"Restored sequences: {restored}")

πŸ“š Detailed Usage

For comprehensive examples of all utilities (seqcoding, rescoding, likelihood, probability, pseudo_log_likelihood, confidence, and more), see:

πŸ“– Citation

If you use this model in your research, please cite the original AbLang2 paper:

AbLang2:

@article{Olsen2024,
  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
  author={Olsen, Tobias H. and Moal, Iain H. and Deane, Charlotte M.},
  journal={bioRxiv},
  doi={10.1101/2024.02.02.578678},
  year={2024}
}