Overview

This model card describes the BioLinkBERT-based BALI-BERT model pre-trained with a Graph Attention Network (GAT) as the knowledge graph encoder. BALI-BERT is a biomedical language representation model enhanced through knowledge graph and language model alignment. It is based on the BERT architecture (specifically the PubMedBERT and BioLinkBERT variants) and has been pre-trained with the BALI method to incorporate external knowledge from biomedical Knowledge Graphs (KGs) such as UMLS.

The model demonstrates improved performance across several biomedical NLP tasks, including Question Answering (QA), Entity Linking (EL), and Relation Extraction (RE). The BALI approach aligns textual representations with structured biomedical knowledge by jointly training a language model and a KG encoder.
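
As a rough illustration of this idea (a minimal sketch, not the released BALI implementation), the snippet below pairs a BERT-style text encoder with a small GAT over KG concept nodes and aligns the two representation spaces with a symmetric contrastive loss. The module names, dimensions, and the use of torch_geometric are assumptions made for illustration only.

# Illustrative sketch: GAT encoder over KG nodes + contrastive text/KG alignment loss.
# All names, dimensions, and library choices here are assumptions, not the authors' code.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class KGEncoder(torch.nn.Module):
    """Two-layer GAT that maps initial concept embeddings to the LM hidden size."""
    def __init__(self, dim: int = 768, hidden: int = 256, heads: int = 4):
        super().__init__()
        self.conv1 = GATConv(dim, hidden, heads=heads)
        self.conv2 = GATConv(hidden * heads, dim, heads=1)

    def forward(self, x, edge_index):
        # x: [num_nodes, dim] initial node features; edge_index: [2, num_edges]
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

def alignment_loss(text_emb, kg_emb, temperature: float = 0.07):
    """InfoNCE-style loss over a batch of aligned (mention, concept) embedding pairs."""
    text_emb = F.normalize(text_emb, dim=-1)
    kg_emb = F.normalize(kg_emb, dim=-1)
    logits = text_emb @ kg_emb.t() / temperature   # pairwise similarities within the batch
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    # symmetric cross-entropy: text -> concept and concept -> text
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2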

Training Details

  • Text Corpus: Scientific abstracts from PubMed
  • Knowledge Graph: Unified Medical Language System (UMLS)
  • Training Size: 1.5M sentences from PubMed; 600K nodes from UMLS
  • Training Steps: ~65,000 steps
  • Batch Size: 256

πŸ“¦ How to Load the Model

# Install the dependency first: pip install transformers
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("andorei/BALI-BERT-BioLinkBERT-base-GNN")
model = BertModel.from_pretrained("andorei/BALI-BERT-BioLinkBERT-base-GNN")
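
Once loaded, the model can be used like any other BERT encoder. The snippet below is an illustrative usage example (not part of the original card): it encodes a biomedical sentence and takes the [CLS] token embedding as a sentence representation.

import torch

# Encode a biomedical sentence and take the [CLS] vector as its representation.
inputs = tokenizer("Metformin is used to treat type 2 diabetes.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
sentence_embedding = outputs.last_hidden_state[:, 0, :]   # shape: [1, hidden_size]
print(sentence_embedding.shape)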

πŸ“œ Citation

If you use this model in your work, please cite the original paper:

@inproceedings{Sakhovskiy2025BALI,
  author = {Sakhovskiy, Andrey and Tutubalina, Elena},
  title = {BALI: Enhancing Biomedical Language Representations through Knowledge Graph and Language Model Alignment},
  booktitle = {Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '25)},
  year = {2025}
}