---
license: apache-2.0
language:
- en
metrics:
- accuracy
base_model:
- dmis-lab/biobert-base-cased-v1.1
pipeline_tag: text-classification
tags:
- medical
---

# BioBERT Research Insights

This model is [BioBERT](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) fine-tuned on the PubMed 20k RCT dataset. It classifies sentences from biomedical abstracts into one of five categories:

- BACKGROUND
- OBJECTIVE
- METHODS
- RESULTS
- CONCLUSIONS

## Usage

```python
from transformers import pipeline

# Load the fine-tuned sentence classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="SubhaL/biobert-research-insights")

example = "The trial demonstrated significant improvement in patient survival rates."
result = classifier(example)

print(result)  # list of {'label': ..., 'score': ...} dicts
```

## Evaluation Metrics

The model was evaluated on the PubMed 20k RCT test set, whose sentences are labeled with five classes:

- 0: BACKGROUND  
- 1: OBJECTIVE  
- 2: METHODS  
- 3: RESULTS  
- 4: CONCLUSIONS  
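
If you need to convert the pipeline's numeric output back to section names, a minimal mapping sketch is shown below (the `ID2LABEL` dict and `label_name` helper are illustrative; check the model's `config.json` `id2label` entry for the authoritative mapping used at inference time):

```python
# Illustrative mapping of the dataset's integer labels to section names
# (hypothetical helper, not part of the model's config)
ID2LABEL = {
    0: "BACKGROUND",
    1: "OBJECTIVE",
    2: "METHODS",
    3: "RESULTS",
    4: "CONCLUSIONS",
}

def label_name(label_id: int) -> str:
    """Return the section name for a dataset label id."""
    return ID2LABEL[label_id]
```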

| Metric               | Score  |
|----------------------|--------|
| Accuracy             | 86.6%  |
| Precision (weighted) | 86.7%  |
| Recall (weighted)    | 86.6%  |
| F1-score (weighted)  | 86.6%  |

### Class-wise performance highlights

- **METHODS** and **RESULTS** classes achieve high precision and recall (~93-94%), indicating strong performance in identifying these sections.
- Lower scores on **BACKGROUND** and **OBJECTIVE** suggest these categories are more challenging to distinguish, likely due to overlapping language.