---
id: CardioNER.nl_128xtokenWindow
name: CardioNER.nl_128xtokenWindow
description: >-
  CardioBERTa.nl_clinical fine-tuned for a multilabel NER task with a token
  window of 128
license: gpl-3.0
language: nl
tags:
- lexical semantic
- span classification
- science
- biology
- clinical ner
- biomedical
- ner
- medical
- bionlp
base_model: UMCU/CardioBERTa.nl_clinical
pipeline_tag: token-classification
datasets:
- DT4H/CardioCCC
- UMCU/cardioccc_dutch
---

# Model Card for CardioNER.nl_128xtokenWindow

This is a UMCU/CardioBERTa.nl_clinical base model fine-tuned for span classification. For this model
we used IOB tagging. Using the IOB-tagging schema facilitates the aggregation of predictions
over sequences. This specific model was trained on a set of about 500 span-labeled documents.
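
To illustrate why IOB tags make aggregation straightforward, here is a minimal sketch that decodes a sequence of token-level IOB tags into labelled spans. The lowercase label names and the lenient handling of an `I-` tag that does not continue an open span are assumptions for illustration; in practice the Transformers pipeline performs this merging for you via `aggregation_strategy`.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a sequence of IOB tags into (label, start, end) token spans.

    `end` is exclusive. An I- tag that does not continue an open span of the
    same label is treated as the start of a new span (a lenient convention).
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags):
        prefix, _, cls = tag.partition("-")  # "B-disease" -> ("B", "-", "disease")
        continues = prefix == "I" and cls == label
        if label is not None and not continues:
            # close the currently open span before handling this tag
            spans.append((label, start, i))
            label, start = None, None
        if prefix in ("B", "I") and not continues:
            # open a new span at this token
            label, start = cls, i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

For example, `iob_to_spans(["B-disease", "I-disease", "O", "B-medication"])` returns `[("disease", 0, 2), ("medication", 3, 4)]`.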

This version was trained with context windows of 128 tokens. For the chunking, we used a paragraph-based splitter.
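
A minimal sketch of paragraph-based chunking into windows of at most 128 tokens follows. It is not the splitter used for training (see the CardioNER repository for that); paragraphs are assumed to be separated by blank lines, and tokens are counted by whitespace splitting unless a different `count_tokens` function is passed.

```python
import re
from typing import Callable, List


def _split_long(para: str, max_tokens: int, count_tokens: Callable[[str], int]) -> List[str]:
    """Split one oversized paragraph word by word into pieces within the budget."""
    pieces, current = [], []
    for word in para.split():
        if current and count_tokens(" ".join(current + [word])) > max_tokens:
            pieces.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        pieces.append(" ".join(current))
    return pieces


def paragraph_chunks(text: str,
                     max_tokens: int = 128,
                     count_tokens: Callable[[str], int] = lambda s: len(s.split())
                     ) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of <= max_tokens.

    NOTE: the default whitespace count is a stand-in; the model's subword
    tokenizer will usually produce more tokens for the same text.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if count_tokens(para) > max_tokens:
            # flush the open chunk, then split the oversized paragraph itself
            if current:
                chunks.append("\n\n".join(current))
                current = []
            chunks.extend(_split_long(para, max_tokens, count_tokens))
            continue
        if current and count_tokens("\n\n".join(current + [para])) > max_tokens:
            # adding this paragraph would exceed the window: start a new chunk
            chunks.append("\n\n".join(current))
            current = []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With a Hugging Face tokenizer, subword tokens can be counted instead via `count_tokens=lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"])`; leaving room for special tokens (e.g. `max_tokens=126`) is a further assumption.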

Training was performed with 10-fold cross-validation (CV), with weight averaging of the best epochs per fold.
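
The weight-averaging step can be sketched as a uniform average over checkpoints. This assumes one best-epoch checkpoint per fold, all with identical parameter names, averaged without weighting; the exact procedure is in the CardioNER repository.

```python
from typing import Dict, List, Mapping


def average_state_dicts(state_dicts: List[Mapping[str, object]]) -> Dict[str, object]:
    """Uniformly average parameters across several checkpoints.

    Uses only `+` and `/`, so values may be floats, NumPy arrays or PyTorch
    tensors. Values that report `is_floating_point() == False` (e.g. integer
    buffers in a torch state dict) are copied from the first checkpoint.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    n = len(state_dicts)
    averaged = {}
    for key, first in state_dicts[0].items():
        if hasattr(first, "is_floating_point") and not first.is_floating_point():
            averaged[key] = first  # do not average integer buffers
        else:
            averaged[key] = sum(sd[key] for sd in state_dicts) / n
    return averaged
```

With PyTorch this could be used as `model.load_state_dict(average_state_dicts([torch.load(p) for p in fold_checkpoints]))`, where `fold_checkpoints` is a hypothetical list of paths to the per-fold best checkpoints.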


### Expected input and output
The input should be a string with **Dutch** clinical text related to **cardiology**. 

CardioNER.nl_128xtokenWindow is a multiclass span classification model.
The classes that can be predicted are:
* **procedure**
* **medication**
* **disease**
* **symptom**
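
As a sketch of how the output maps onto these four classes, the helper below groups aggregated pipeline predictions per class and uses the character offsets to cut each span from the input text. It assumes the dictionaries produced by the Transformers token-classification pipeline with an aggregation strategy (`entity_group`, `score`, `start`, `end`); lower-casing the labels and the `min_score` threshold are illustrative assumptions, not part of the model.

```python
from typing import Dict, Iterable, List

# The four entity classes listed above.
CLASSES = ("procedure", "medication", "disease", "symptom")


def group_by_class(text: str, entities: Iterable[dict],
                   min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group aggregated pipeline predictions by class, keeping the span text.

    Labels are lower-cased because the exact casing of the model's labels
    is assumed here; predictions below `min_score` are dropped.
    """
    grouped = {c: [] for c in CLASSES}
    for ent in entities:
        label = ent["entity_group"].lower()
        if ent["score"] >= min_score and label in grouped:
            grouped[label].append(text[ent["start"]:ent["end"]])
    return grouped
```

For example, `group_by_class(SOME_TEXT, named_ents, min_score=0.5)` would return one list of span texts per class.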

#### Extracting span classifications from CardioNER.nl_128xtokenWindow

The following script converts a string of fewer than 128 tokens into a list of span predictions.
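```python
from transformers import pipeline

# Hugging Face Hub ID of this model (assumed; a local path to the model also works)
model = "UMCU/CardioNER.nl_128xtokenWindow"

le_pipe = pipeline("ner",  # alias of the "token-classification" task
                   model=model,
                   tokenizer=model,
                   aggregation_strategy="simple",  # merge IOB sub-token predictions into spans
                   device=-1)  # -1 = run on CPU

# Any Dutch cardiology text of fewer than 128 tokens
SOME_TEXT = "Patiënt bekend met atriumfibrilleren, gestart met metoprolol."
named_ents = le_pipe(SOME_TEXT)
```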

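To process a string of *arbitrary length*, you can split it into sentences or paragraphs
using e.g. pysbd or spaCy (the `sentencizer` component) and iteratively run the span-classification pipeline over the resulting list.
You can also use the `stride` argument built into the Transformers token-classification pipeline. This requires a fast tokenizer and does not work with `aggregation_strategy=None`. The text is split into chunks of `tokenizer.model_max_length` tokens, and `stride` sets the number of overlapping tokens between consecutive chunks, so it must be smaller than `model_max_length`:
```python
# stride = number of overlapping tokens between consecutive chunks
named_ents = le_pipe(SOME_TEXT, stride=256)
```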
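
For the first option (splitting and iterating), a minimal sketch is shown below. It runs any pipeline-like callable (e.g. `le_pipe`) on each piece and shifts the returned `start`/`end` offsets so they refer to positions in the full text. Treating each non-empty line as one piece is an assumption (a pysbd or spaCy sentence splitter could supply the pieces instead), each piece should stay within the 128-token window, and offsets are only available with a fast tokenizer.

```python
import re
from typing import Callable, List


def ner_long_text(text: str, ner: Callable[[str], List[dict]],
                  pattern: str = r"[^\n]+") -> List[dict]:
    """Run `ner` on each piece of `text` matched by `pattern` and merge results.

    `ner` must return dicts with 'start'/'end' character offsets relative to
    the piece it receives; these are shifted back to positions in `text`.
    """
    results = []
    for match in re.finditer(pattern, text):
        piece = match.group()
        if not piece.strip():
            continue  # skip whitespace-only pieces
        for ent in ner(piece):
            ent = dict(ent)  # copy so the caller's dicts are not mutated
            ent["start"] += match.start()
            ent["end"] += match.start()
            results.append(ent)
    return results
```

Usage: `named_ents = ner_long_text(SOME_TEXT, le_pipe)`.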


# Data description

CardioCCC: manually labeled cardiology discharge letters, annotated with the classes procedure, medication, disease, and symptom.


# Acknowledgement

This is part of the [DT4H project](https://www.datatools4heart.eu/).

# DOI and reference


For more details about training/evaluation and other scripts, see the CardioNER [GitHub repo](https://github.com/DataTools4Heart/CardioNER).
For more information on the background, see DataTools4Heart on [Hugging Face](https://huggingface.co/DT4H) or the [website](https://www.datatools4heart.eu/).