TrialChecker-0825

TrialChecker-0825 is a binary text classifier that estimates whether a clinical trial “space” is a reasonable consideration for a patient, based on the patient’s summary.
It is fine-tuned from answerdotai/ModernBERT-large for sequence classification on pairs of (trial space, patient summary).

Important: This is a research prototype for model development. It is not a medical device and is not intended for clinical decision-making.


What counts as a “trial space”?

A trial space is a concise description of the target population a trial aims to enroll, focusing on:

  • Cancer type & histology
  • Burden of disease (curative vs metastatic)
  • Prior or excluded treatments
  • Required / excluded biomarkers

(Boilerplate exclusion rules, e.g., heart failure or uncontrolled brain metastases, are not part of the trial space itself. They can be screened separately with OncoReasoning-3B, BoilerplateChecker-0825, or other logic.)


Training summary

The classifier was trained with a script that:

  1. Loads three sources of annotated patient–trial pairs:
    • Pairs originating from space-specific eligibility checks
    • “Patient→top-cohorts” checks (rounds 1–3)
    • “Trial-space→top patients” checks (rounds 1–3)
  2. Deduplicates by ['patient_summary', 'this_space']
  3. Builds the final text input as:

text = this_space + "\nNow here is the patient summary:" + patient_summary

  4. Uses eligibility_result as the binary label (0/1)
  5. Fine-tunes ModernBERT-large (sequence classification, 2 labels) at max_length 2048

Key hyperparameters from training

  • Base model: answerdotai/ModernBERT-large
  • Max length: 2048
  • Optimizer settings: learning_rate=2e-5, weight_decay=0.01
  • Batch size: per_device_train_batch_size=4
  • Epochs: 2
  • Save strategy: epoch
  • Tokenizer: AutoTokenizer.from_pretrained("answerdotai/ModernBERT-large")
  • Data collator: DataCollatorWithPadding
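
The hyperparameters above could be passed to the Hugging Face Trainer roughly as follows. This is a minimal sketch, not the original training script: the helper name build_training_args and the output_dir value are hypothetical, and every setting not listed in this card is left at the library default.

```python
# Hyperparameters listed in this card; nothing else is confirmed by the source.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 2,
    "save_strategy": "epoch",
}


def build_training_args(output_dir: str = "trialchecker-out"):
    """Build TrainingArguments from the card's settings.

    output_dir is a placeholder path, not taken from the original script.
    """
    # Imported lazily so the dict above can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

The resulting object would then be handed to Trainer together with the tokenized dataset and DataCollatorWithPadding.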

Intended use

  • Input: a string describing the trial space and a patient summary string
  • Output: probability that the trial is a reasonable consideration for that patient (not full eligibility)

Use cases:

  • Ranking candidate trial spaces for a patient
  • Early triage before detailed eligibility review (including boilerplate exclusions)

Out of scope:

  • Confirming formal eligibility or safety
  • Clinical decision support

Inference (Transformers)

Quick start (single example)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_REPO = "ksg-dfci/TrialChecker-0825" 

tok = AutoTokenizer.from_pretrained(MODEL_REPO)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_REPO).to(device)
model.eval()

this_space = (
    "Cancer type allowed: non-small cell lung cancer. "
    "Histology allowed: adenocarcinoma. "
    "Cancer burden allowed: metastatic disease. "
    "Prior treatment required: prior platinum-based chemo-immunotherapy allowed. "
    "Biomarkers required: ALK fusion."
)

patient_summary = (
    "Dx 2022 lung adenocarcinoma; metastatic to bone. Prior carbo/pem/pembro "
    "with best PR; ALK fusion detected by NGS. ECOG 1."
)

text = this_space + "\nNow here is the patient summary:" + patient_summary

enc = tok(text, return_tensors="pt", truncation=True, max_length=2048).to(device)
with torch.no_grad():
    logits = model(**enc).logits
probs = logits.softmax(-1).squeeze(0)

# Label mapping was set in training: {0: "NEGATIVE", 1: "POSITIVE"}
p_positive = float(probs[1])
print(f"Reasonable consideration probability: {p_positive:.3f}")
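
Because the input is truncated at 2048 tokens and the patient summary comes last, a very long summary can lose its tail (assuming the tokenizer's default right-side truncation). The following is a small sketch for detecting that case before scoring. The helper names count_tokens and exceeds_limit are hypothetical and not part of the model repository.

```python
def count_tokens(text, tokenizer):
    """Return the number of tokens produced for text when truncation is disabled."""
    return len(tokenizer(text, truncation=False)["input_ids"])


def exceeds_limit(text, tokenizer, max_length=2048):
    """Return True if the composed input is longer than max_length tokens."""
    return count_tokens(text, tokenizer) > max_length
```

With the objects from the quick start, exceeds_limit(text, tok) reports whether that example would be cut off at 2048 tokens.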

Batched scoring

from typing import List
import torch

def score_pairs(spaces: List[str], summaries: List[str], tokenizer, model, max_length=2048, batch_size=8):
    assert len(spaces) == len(summaries)
    device = next(model.parameters()).device
    scores = []

    for i in range(0, len(spaces), batch_size):
        batch_spaces = spaces[i:i+batch_size]
        batch_summaries = summaries[i:i+batch_size]
        texts = [s + "\nNow here is the patient summary:" + p for s, p in zip(batch_spaces, batch_summaries)]
        enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length).to(device)
        with torch.no_grad():
            logits = model(**enc).logits
        probs = logits.softmax(-1)[:, 1]  # POSITIVE
        scores.extend(probs.detach().cpu().tolist())
    return scores

# Example
spaces = [this_space] * 3
summaries = [patient_summary, "Different summary 1...", "Different summary 2..."]
scores = score_pairs(spaces, summaries, tok, model)
print(scores)
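
For the ranking use case, scores for one patient against many candidate trial spaces can be sorted into a shortlist. This is a minimal sketch: rank_spaces and its top_k parameter are hypothetical names, and the scores are assumed to be POSITIVE probabilities such as those returned by score_pairs.

```python
from typing import List, Tuple


def rank_spaces(spaces: List[str], scores: List[float], top_k: int = 5) -> List[Tuple[str, float]]:
    """Return the top_k (space, score) pairs sorted by descending score."""
    if len(spaces) != len(scores):
        raise ValueError("spaces and scores must have the same length")
    ranked = sorted(zip(spaces, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

To rank several spaces for a single patient, repeat the patient summary once per space when calling score_pairs, then pass the spaces and the returned scores to rank_spaces.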

Thresholding & calibration

  • Default decision threshold: 0.5 on the POSITIVE probability.
  • To choose a better operating point, tune the threshold on a validation set (e.g., maximize F1, maximize Youden’s J, or pick the threshold that reaches a desired precision).
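
Threshold tuning on a validation set can be sketched in plain Python as below. This is an illustration under stated assumptions, not code from the training script: tune_threshold and confusion_counts are hypothetical helpers, y_true holds 0/1 labels, p_pos holds POSITIVE probabilities, and the candidate thresholds are simply the observed scores.

```python
def confusion_counts(y_true, p_pos, threshold):
    """Count (tp, fp, fn, tn) when predicting POSITIVE for p >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, p_pos):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def tune_threshold(y_true, p_pos, metric="f1"):
    """Return (best_threshold, best_value) for metric 'f1' or 'youden'."""
    if not p_pos or len(y_true) != len(p_pos):
        raise ValueError("y_true and p_pos must be non-empty and equally long")
    best_t, best_v = None, float("-inf")
    for t in sorted(set(p_pos)):
        tp, fp, fn, tn = confusion_counts(y_true, p_pos, t)
        if metric == "f1":
            denom = 2 * tp + fp + fn
            value = 2 * tp / denom if denom else 0.0
        elif metric == "youden":
            tpr = tp / (tp + fn) if (tp + fn) else 0.0
            fpr = fp / (fp + tn) if (fp + tn) else 0.0
            value = tpr - fpr  # Youden's J = sensitivity + specificity - 1
        else:
            raise ValueError("metric must be 'f1' or 'youden'")
        if value > best_v:  # ties keep the lowest threshold
            best_t, best_v = t, value
    return best_t, best_v
```

The chosen threshold should be checked on held-out data that was not used for tuning.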

How to prepare inputs

  • Trial space: a compact description of the trial’s target population, including cancer type/histology, disease burden (metastatic vs curative), prior or forbidden treatments, and required or excluded biomarkers.
  • Patient summary: a concise longitudinal summary of diagnosis, histology, current disease burden, biomarkers, and treatment history.

You can generate these inputs with your upstream LLM pipeline (e.g., OncoReasoning-3B for summarization and space extraction), but the classifier accepts any plain strings in the format shown above.
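
The helpers below sketch one way to assemble the input string. The separator is the one given in this card. The field labels in format_trial_space mirror the quick-start example only; whether the training data used exactly these labels is an assumption, and both function names are hypothetical.

```python
# Separator between trial space and patient summary, as stated in this card.
SEPARATOR = "\nNow here is the patient summary:"


def format_trial_space(cancer_type, histology, burden, prior_treatment, biomarkers):
    """Compose a trial-space string using the labels from the quick-start example."""
    parts = [
        f"Cancer type allowed: {cancer_type}.",
        f"Histology allowed: {histology}.",
        f"Cancer burden allowed: {burden}.",
        f"Prior treatment required: {prior_treatment}.",
        f"Biomarkers required: {biomarkers}.",
    ]
    return " ".join(parts)


def build_input(this_space, patient_summary):
    """Concatenate trial space and patient summary in the documented format."""
    if not this_space or not patient_summary:
        raise ValueError("both this_space and patient_summary must be non-empty")
    return this_space + SEPARATOR + patient_summary
```

Keeping the separator in a single constant helps the inference-time input match the format used during training.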


Reproducibility (high-level)

Below is the minimal structure used by the training script to build the dataset before tokenization:

# 1) Load and merge three labeled sources
#    - space_specific_eligibility_checks.parquet
#    - top_ten_cohorts_checked_round{1,2,3}.csv
#    - top_twenty_patients_checked_round{1,2,3}.csv

# 2) Deduplicate by ['patient_summary','this_space'] and keep:
#    - split, patient_summary, this_space, eligibility_result

# 3) Compose input text and label:
text  = this_space + "\nNow here is the patient summary:" + patient_summary
label = int(eligibility_result)  # 0 or 1

# 4) Tokenize with ModernBERT tokenizer (max_length=2048, truncation=True)
# 5) Train AutoModelForSequenceClassification (2 labels)

To reproduce exactly, consult and run the original training script.
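
Steps 2 and 3 can be illustrated with a dependency-free sketch over a list of row dictionaries. The original script may implement this differently (for example with a dataframe library). The function name build_examples is hypothetical, and keeping the first occurrence of each duplicate is an assumption.

```python
SEPARATOR = "\nNow here is the patient summary:"


def build_examples(rows):
    """Deduplicate rows by (patient_summary, this_space) and compose text/label pairs."""
    seen = set()
    examples = []
    for row in rows:
        key = (row["patient_summary"], row["this_space"])
        if key in seen:
            continue  # assumption: the first occurrence of a duplicate pair is kept
        seen.add(key)
        examples.append(
            {
                "split": row.get("split"),
                "text": row["this_space"] + SEPARATOR + row["patient_summary"],
                "label": int(row["eligibility_result"]),  # 0 or 1
            }
        )
    return examples
```

Each returned dictionary carries the composed text and integer label that would then be tokenized with max_length=2048 and truncation enabled.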


Limitations & ethical considerations

  • Outputs reflect training data and may contain biases or errors.
  • The model estimates reasonableness for consideration, not strict eligibility.
  • Not validated for safety-critical use; do not use for diagnosis or treatment decisions.

Citation

If you use this model or parts of the pipeline, please cite this model card and the training script (ModernBERT TrialChecker fine-tuning).


Model size: 0.4B params (Safetensors, tensor type F32)