Model Card

This is a DeBERTaV2 language model pre-trained from scratch on the DutchMedicalText corpus. A clinical model, continually pre-trained on UMCU clinical texts, will be uploaded later and linked here.

Model Details

About 410M parameters, with a 1024-token context length.

Model Description

  • Developed by: Bram van Es, UPOD UMCU/MEDxAI
  • Funded by: UMCU / Google TPU
  • Model type: DeBERTaV2
  • Language(s) (NLP): Dutch
  • License: GPL-3

Intended use

This model is directly suitable for masked language modeling and can be fine-tuned for token/sequence classification, contrastive embeddings, relation extraction and other downstream tasks.
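As an illustration of the masked language modeling use, the sketch below queries the model through the Hugging Face `transformers` fill-mask pipeline. The repository id `UMCU/CardioDeBERTa.nl` is assumed to be this model's Hub id, and the helper `top_predictions`, the `demo` function and the Dutch example sentence are hypothetical names and inputs chosen for this example, not part of the model's own code.

```python
from typing import Callable, List, Tuple

# Assumption: this is the Hub repository id of the model described here.
MODEL_ID = "UMCU/CardioDeBERTa.nl"


def top_predictions(fill_mask: Callable, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely fillers for the single mask token in `text`.

    `fill_mask` is expected to behave like a transformers fill-mask pipeline:
    called with a string and `top_k`, it returns a list of dicts that contain
    at least the keys "token_str" and "score".
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"].strip(), float(r["score"])) for r in results]


def demo() -> None:
    """Download the model and print predictions for one example sentence."""
    # Imported here so the helper above can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    # Read the mask token from the tokenizer instead of hard-coding it.
    mask = fill_mask.tokenizer.mask_token
    sentence = f"De patiënt werd opgenomen met pijn op de {mask}."
    for token, score in top_predictions(fill_mask, sentence):
        print(f"{token}\t{score:.3f}")
```

Calling `demo()` downloads the weights on first use. The predictions are token probabilities from a masked language model and should be read with the limitations described in this card in mind.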

Bias, Risks, and Limitations

This model was not filtered for bias. As with any language model, do not blindly accept the generated output. This is not a causal model, and it is not fine-tuned in any way for clinical decision support tasks.

Training Details

Training Data

Trained on about 80 GB of Dutch medical texts, ranging from guidelines to patient case reports.

Training Procedure

Preprocessing

  • Deidentification with DEDUCE
  • Removal of repetitive phrases and repetitive non-word characters
  • Cleaning with FTFY
  • Chunking of each document based on token counts. We do not
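Two of the preprocessing steps listed above, collapsing repetitive non-word characters and chunking by token count, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, whitespace-separated words stand in for real tokenizer tokens, and the limits (`max_run=3`, `max_tokens=1024`) are example values. Deidentification with DEDUCE and cleaning with FTFY are not shown.

```python
import re
from typing import List


def collapse_repeats(text: str, max_run: int = 3) -> str:
    """Shorten runs of one repeated non-word character to at most `max_run`.

    Example: a separator line such as '----------' becomes '---'.
    """
    # ([^\w\s]) captures one punctuation/symbol character; \1{max_run,}
    # requires at least max_run further copies, i.e. a run longer than max_run.
    pattern = r"([^\w\s])\1{%d,}" % max_run
    return re.sub(pattern, lambda m: m.group(1) * max_run, text)


def chunk_by_tokens(text: str, max_tokens: int = 1024) -> List[str]:
    """Split `text` into consecutive chunks of at most `max_tokens` tokens.

    Assumption: whitespace-separated words approximate tokens here; a real
    pipeline would count subword tokens produced by the model's tokenizer.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```

For example, `chunk_by_tokens(collapse_repeats(document), max_tokens=1024)` would yield chunks sized to the model's context length, provided the token count is measured with the actual tokenizer.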

Training Hyperparameters

  • Training regime: bf16 mixed precision
  • Learning rate: 1e-4 to 2e-4
  • Number of warmup steps: 5000
  • Steps per epoch: 50000
  • Weight decay: 0.001
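To make the warmup setting concrete, the sketch below computes a learning rate per step. Only the 5000 warmup steps and the 1e-4 to 2e-4 learning rate range come from the list above. The linear shape of the warmup, the constant rate after it and the function name `learning_rate` are assumptions for illustration, since the card does not state which schedule was used.

```python
WARMUP_STEPS = 5000     # from the hyperparameter list
STEPS_PER_EPOCH = 50000  # from the hyperparameter list
WEIGHT_DECAY = 0.001    # from the hyperparameter list


def learning_rate(step: int, peak_lr: float = 1e-4,
                  warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to `peak_lr`, then a constant rate.
    The schedule used after warmup is not documented in the model card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr
```

Under these assumptions the rate reaches half its peak at step 2500 and stays at the peak from step 5000 onward, so warmup covers the first tenth of an epoch of 50000 steps.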

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: TPUv4
  • Hours used: 400+
  • Cloud Provider: Google
  • Compute Region: US-WEST2 and EUROPE-WEST4
  • Carbon Emitted: 100kg+, compensated