BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition
Model Description
BioBBC is a Named Entity Recognition (NER) model trained to identify biomarkers in biomedical text. It combines the following components:
- BERT: BioBERT for contextual embeddings
- Character CNN: Character-level features
- POS Embeddings: Part-of-speech information
- Domain Embeddings: Biomedical word embeddings
- BiLSTM: Bidirectional sequence modeling
- CRF: Conditional Random Fields for sequence labeling
Model Architecture
Input Text
↓
BioBERT Embeddings (768d)
↓
+ Character CNN (150d)
+ POS Embeddings (25d)
+ Domain Embeddings (200d)
↓
BiLSTM (512d × 2)
↓
CRF Layer
↓
Predicted Biomarkers
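To illustrate what the final CRF stage contributes, here is a minimal pure-Python sketch of Viterbi decoding over the three labels. It is not the repository's implementation: the emission scores are made up, and the transition rule (an I-BIOMARKER tag may not follow O or start a sequence) is an assumed hard constraint, whereas a trained CRF learns its transition scores.

```python
from typing import Dict, List

TAGS = ["O", "B-BIOMARKER", "I-BIOMARKER"]
NEG_INF = float("-inf")


def start_score(tag: str) -> float:
    # Assumption: a sequence cannot begin inside an entity.
    return NEG_INF if tag == "I-BIOMARKER" else 0.0


def transition_score(prev: str, curr: str) -> float:
    # Assumption: O -> I-BIOMARKER is forbidden; every other move is neutral.
    if prev == "O" and curr == "I-BIOMARKER":
        return NEG_INF
    return 0.0


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence for per-token emission scores."""
    if not emissions:
        return []
    # best[t][tag] is the best score of any path ending in `tag` at position t.
    best = [{tag: start_score(tag) + emissions[0][tag] for tag in TAGS}]
    back = []  # back[t - 1][tag] is the best predecessor of `tag` at position t
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for curr in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + transition_score(p, curr))
            row[curr] = best[-1][prev] + transition_score(prev, curr) + scores[curr]
            ptr[curr] = prev
        best.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up scores: picking the top tag per token would give O, I, I (invalid).
emissions = [
    {"O": 2.0, "B-BIOMARKER": 0.5, "I-BIOMARKER": 0.0},
    {"O": 0.1, "B-BIOMARKER": 1.0, "I-BIOMARKER": 1.5},
    {"O": 0.2, "B-BIOMARKER": 0.0, "I-BIOMARKER": 1.0},
]
print(viterbi(emissions))  # ['O', 'B-BIOMARKER', 'I-BIOMARKER']
```

Decoding the sequence jointly, rather than token by token, is what lets the CRF layer rule out tag sequences that are individually plausible but structurally invalid.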
Performance
- F1 Score: 0.9734
- Training Epochs: 5
- Best Epoch: 5
Labels
- O: Outside (not a biomarker)
- B-BIOMARKER: Beginning of biomarker entity
- I-BIOMARKER: Inside biomarker entity
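As a rough illustration of how these labels map back to entities, the sketch below groups a tag sequence into token spans. The function name `bio_to_spans`, the example tokenization, and the lenient handling of a stray I-BIOMARKER tag are assumptions made for this example, not the repository's post-processing code.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs, end exclusive, one per entity."""
    spans = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B-BIOMARKER":
            if start is not None:  # close the previous entity first
                spans.append((start, i))
            start = i
        elif tag == "I-BIOMARKER":
            if start is None:  # stray I tag: leniently treat it as a start
                start = i
        else:  # "O" closes any open entity
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:  # entity running to the end of the sequence
        spans.append((start, len(tags)))
    return spans


# Hypothetical tokenization of part of the example sentence.
tokens = ["Elevated", "IL-6", "and", "TNF", "-", "alpha", "levels"]
tags = ["O", "B-BIOMARKER", "O", "B-BIOMARKER", "I-BIOMARKER", "I-BIOMARKER", "O"]
entities = ["".join(tokens[s:e]) for s, e in bio_to_spans(tags)]
print(entities)  # ['IL-6', 'TNF-alpha']
```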
Usage
Installation
pip install transformers torch huggingface_hub
Load Model
from huggingface_hub import hf_hub_download
import torch
# Download model
model_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="config.json")
vocab_path = hf_hub_download(repo_id="postlyt/biobbc-biomarker-ner", filename="vocabularies.bin")
# Load model (see full loading script in repository)
# model = load_biobbc_model(model_path, config_path, vocab_path)
Predict Biomarkers
# See predict_from_huggingface.py in the repository for a complete example.
# Assumes `model` has already been loaded as described under Load Model.
text = "Elevated IL-6 and TNF-alpha levels were observed."
biomarkers = model.predict(text)
print(biomarkers)  # expected: ['IL-6', 'TNF-alpha']
Training Details
- Base Model: dmis-lab/biobert-base-cased-v1.2
- Training Data: Custom biomarker dataset
- Batch Size: 256 (optimized for A100 GPU)
- Mixed Precision: FP16 enabled
- Optimization: AdamW with linear warmup
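The shape of a linear warmup schedule can be sketched as a learning-rate multiplier. The card states only "linear warmup", so the warmup length, the total step count, and the linear decay after warmup are assumptions here. The formula follows the shape of `get_linear_schedule_with_warmup` in `transformers`, which may or may not be what was used in training.

```python
def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Assumed: decay linearly from 1 down to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical run: 100 warmup steps out of 1000 total.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_multiplier(step, warmup_steps=100, total_steps=1000))
```

Starting from a small learning rate keeps the early AdamW updates, which are based on poorly estimated moment statistics, from disturbing the pretrained BioBERT weights.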
Citation
If you use this model, please cite:
@misc{biobbc_model,
author = {postlyt},
title = {BioBBC: BERT-BiLSTM-CRF for Biomarker Recognition},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/postlyt/biobbc-biomarker-ner}
}
Model Card Authors
postlyt
Model Card Contact
For questions or issues, please open an issue in the repository.