BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition

Model Description

BioBBC is a Named Entity Recognition (NER) model trained to identify biomarkers in biomedical text. It combines the following components:

  • BERT: BioBERT for contextual embeddings
  • Character CNN: Character-level features
  • POS Embeddings: Part-of-speech information
  • Domain Embeddings: Biomedical word embeddings
  • BiLSTM: Bidirectional sequence modeling
  • CRF: Conditional Random Fields for sequence labeling

Model Architecture

Input Text
    ↓
BioBERT Embeddings (768d)
    ↓
+ Character CNN (150d)
+ POS Embeddings (25d)
+ Domain Embeddings (200d)
    ↓
BiLSTM (512d × 2)
    ↓
CRF Layer
    ↓
Predicted Biomarkers
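
The widths in the diagram imply a 1143-dimensional fused token vector (768 + 150 + 25 + 200) going into the BiLSTM and a 1024-dimensional vector (512 × 2) coming out. The following is a minimal PyTorch sketch of that fusion step, not the repository's implementation: the class name `FeatureFusionEncoder` is hypothetical, the four feature tensors are random placeholders, and the CRF is left out because PyTorch core has no built-in CRF layer. The final linear layer only produces the per-token tag scores a CRF would consume.

```python
import torch
import torch.nn as nn

# Feature widths taken from the architecture diagram.
BERT_DIM, CHAR_DIM, POS_DIM, DOMAIN_DIM = 768, 150, 25, 200
LSTM_HIDDEN = 512  # per direction
NUM_TAGS = 3       # O, B-BIOMARKER, I-BIOMARKER


class FeatureFusionEncoder(nn.Module):
    """Hypothetical stand-in for the fusion + BiLSTM stage (assumption, not the repo class)."""

    def __init__(self):
        super().__init__()
        input_dim = BERT_DIM + CHAR_DIM + POS_DIM + DOMAIN_DIM  # 1143
        self.bilstm = nn.LSTM(input_dim, LSTM_HIDDEN, batch_first=True, bidirectional=True)
        # Per-token tag scores ("emissions") that a CRF layer would take as input.
        self.emissions = nn.Linear(2 * LSTM_HIDDEN, NUM_TAGS)

    def forward(self, bert, char, pos, domain):
        # Concatenate the four feature streams along the last (feature) axis.
        fused = torch.cat([bert, char, pos, domain], dim=-1)
        hidden, _ = self.bilstm(fused)  # (batch, seq_len, 1024)
        return self.emissions(hidden)   # (batch, seq_len, NUM_TAGS)


batch, seq_len = 2, 7
# Random placeholders standing in for real BioBERT, char-CNN, POS, and domain features.
features = [torch.randn(batch, seq_len, d) for d in (BERT_DIM, CHAR_DIM, POS_DIM, DOMAIN_DIM)]
encoder = FeatureFusionEncoder()
scores = encoder(*features)
print(scores.shape)  # torch.Size([2, 7, 3])
```

Concatenating along the last axis keeps one vector per token, so the sequence length is unchanged from input to emissions.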

Performance

  • F1 Score: 0.9734
  • Training Epochs: 5
  • Best Epoch: 5

Labels

  • O: Outside (not a biomarker)
  • B-BIOMARKER: Beginning of biomarker entity
  • I-BIOMARKER: Inside biomarker entity
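
To turn a tag sequence in this scheme into entity strings, consecutive B-/I- tokens are grouped together. The helper below is an illustrative sketch: `bio_to_spans` is a hypothetical name, the tokens are assumed to be whole words, and joining with spaces is a simplification. The repository's own post-processing may differ.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into entity strings joined by spaces."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-BIOMARKER":
            # A new entity starts; close the previous one if it is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-BIOMARKER" and current:
            current.append(token)
        else:
            # "O", or a stray I- tag with no open entity: close any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["Elevated", "IL-6", "and", "TNF", "alpha", "levels"]
tags = ["O", "B-BIOMARKER", "O", "B-BIOMARKER", "I-BIOMARKER", "O"]
print(bio_to_spans(tokens, tags))  # ['IL-6', 'TNF alpha']
```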

Usage

Installation

pip install transformers torch huggingface_hub

Load Model

from huggingface_hub import hf_hub_download
import torch

# Download model
model_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="config.json")
vocab_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="vocabularies.bin")

# Load the model. load_biobbc_model is defined in the repository's loading script,
# so the call is left commented out here.
# model = load_biobbc_model(model_path, config_path, vocab_path)

Predict Biomarkers

# Assumes `model` was created with the repository's loading script (see "Load Model").
# See predict_from_huggingface.py in the repository for a complete example.
text = "Elevated IL-6 and TNF-alpha levels were observed."
biomarkers = model.predict(text)
print(biomarkers)  # ['IL-6', 'TNF-alpha']

Training Details

  • Base Model: dmis-lab/biobert-base-cased-v1.2
  • Training Data: Custom biomarker dataset
  • Batch Size: 256 (optimized for A100 GPU)
  • Mixed Precision: FP16 enabled
  • Optimization: AdamW with linear warmup
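
As a rough illustration of a linear-warmup schedule, the sketch below computes a learning-rate multiplier that ramps linearly from 0 to 1 and then decays linearly to 0, the same shape that `get_linear_schedule_with_warmup` in `transformers` produces. The decay phase, the peak rate, and the step counts are assumptions made for the example; the model card does not state them.

```python
def linear_warmup_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp 0 -> 1, then linear decay 1 -> 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical values: not taken from the actual training run.
peak_lr, warmup_steps, total_steps = 2e-5, 100, 1000
for step in (0, 50, 100, 550, 1000):
    lr = peak_lr * linear_warmup_multiplier(step, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

With AdamW, a multiplier like this would typically be passed to `torch.optim.lr_scheduler.LambdaLR` so that the optimizer's base rate is scaled at every step.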

Citation

If you use this model, please cite:

@misc{biobbc_model,
  author = {postlyt},
  title = {BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/postlyt/biobbc-biomarker-ner}
}

Model Card Authors

postlyt

Model Card Contact

For questions or issues, please open an issue in the repository.
