|
---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification
tags:
- medical
---
|
|
|
# BioBERT Research Insights |
|
|
|
This model is a fine-tuned version of [BioBERT](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1), trained on the PubMed 20k RCT dataset. It classifies sentences from biomedical abstracts into one of five categories:
|
|
|
- BACKGROUND
- OBJECTIVE
- METHODS
- RESULTS
- CONCLUSIONS
|
|
|
## Usage |
|
|
|
```python
from transformers import pipeline

# Load the fine-tuned sentence classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="SubhaL/biobert-research-insights")

example = "The trial demonstrated significant improvement in patient survival rates."
result = classifier(example)

print(result)
```
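Because the model operates on single sentences, a whole abstract can be classified by splitting it into sentences first. Below is a minimal sketch: the period-based splitting is a naive stand-in (a proper sentence tokenizer would be preferable in practice), and the abstract text is illustrative only.

```python
# Naive illustration: split an abstract on ". " and classify each
# sentence in one batch call to the pipeline defined above.
abstract = (
    "We aimed to assess the effect of the intervention on survival. "
    "Patients were randomly assigned to treatment or placebo. "
    "The trial demonstrated significant improvement in patient survival rates."
)
sentences = [s.strip() for s in abstract.split(". ") if s.strip()]

for sentence, prediction in zip(sentences, classifier(sentences)):
    print(f'{prediction["label"]:>12}  {sentence}')
```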
|
|
|
## Evaluation Metrics |
|
|
|
The model was evaluated on the test split of the PubMed 20k RCT dataset, which uses the same five sentence classes:
|
|
|
- 0: BACKGROUND
- 1: OBJECTIVE
- 2: METHODS
- 3: RESULTS
- 4: CONCLUSIONS
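Depending on how the label names were stored in the model config, the pipeline may return generic IDs such as `LABEL_3` rather than section names. If so, they can be mapped back using the index order above; a minimal sketch (the `LABEL_n` naming is an assumption, not confirmed by this card):

```python
# Assumption: the pipeline returns generic labels "LABEL_0"..."LABEL_4"
# in the index order listed above; adjust if the config names differ.
ID2LABEL = {
    "LABEL_0": "BACKGROUND",
    "LABEL_1": "OBJECTIVE",
    "LABEL_2": "METHODS",
    "LABEL_3": "RESULTS",
    "LABEL_4": "CONCLUSIONS",
}

result = classifier("The trial demonstrated significant improvement in patient survival rates.")
print(ID2LABEL.get(result[0]["label"], result[0]["label"]), result[0]["score"])
```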
|
|
|
| Metric               | Score |
|----------------------|-------|
| Accuracy             | 86.6% |
| Precision (weighted) | 86.7% |
| Recall (weighted)    | 86.6% |
| F1-score (weighted)  | 86.6% |
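These weighted scores follow the standard definitions and can be recomputed with scikit-learn. A minimal sketch; `y_true` and `y_pred` are placeholders for the actual test-set labels and model predictions (the values shown are illustrative only):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholders: integer class IDs (0-4) for gold labels and predictions.
y_true = [0, 1, 2, 2, 3, 4]
y_pred = [0, 1, 2, 3, 3, 4]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  "
      f"recall={recall:.3f}  f1={f1:.3f}")
```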
|
|
|
### Class-wise performance highlights
|
|
|
- **METHODS** and **RESULTS** achieve high precision and recall (~93-94%), indicating the model reliably identifies these sections.
- **BACKGROUND** and **OBJECTIVE** score lower, suggesting these categories are harder to distinguish, likely because their language overlaps.