RadLIT-ColBERT: Radiology Late Interaction Transformer

A ColBERT-style late interaction model trained for radiology document retrieval. RadLIT uses token-level MaxSim scoring to provide more nuanced relevance matching than pooled embeddings.

Model Description

RadLIT (Radiology Late Interaction Transformer) is a ColBERT-v2 style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).

Why Late Interaction?

Late interaction models offer advantages for medical terminology:

  • Precise term matching: Each query token finds its best-matching document token
  • Better handling of multi-word concepts: the tokens of "hepatocellular carcinoma" can each match document terms independently
  • Implicit term weighting: Important query terms contribute more to the final score
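
A toy example makes the per-token matching concrete. The 2-dimensional vectors and the small maxsim helper below are illustrative stand-ins (the real model uses 128-dim token embeddings):

```python
import torch

def maxsim(q, d):
    # For each query token, take the best similarity to any doc token, then sum
    return torch.matmul(q, d.T).max(dim=1).values.sum().item()

# Toy token embeddings (2-dim, unit-norm rows, one row per token).
# The query has two "concept" tokens; doc_a covers both, doc_b only one.
q = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
doc_a = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # matches both query tokens
doc_b = torch.tensor([[1.0, 0.0], [1.0, 0.0]])   # matches only the first

print(maxsim(q, doc_a))  # 2.0 -- both query tokens find a perfect match
print(maxsim(q, doc_b))  # 1.0 -- the second query token finds no match
```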

Architecture

  • Base Model: RoBERTa-base with ColBERT adapter
  • Hidden Size: 768
  • Output Dimension: 128 (compressed for efficiency)
  • Layers: 12
  • Attention Heads: 12
  • Parameters: ~125M
  • Max Sequence Length: 512 tokens
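
The 768-to-128 compression above is a linear projection over the encoder's token states, as in ColBERT. A minimal sketch (the projection layer here is an illustrative stand-in, not the released weights):

```python
import torch

# Dimensions from the architecture table above
hidden_size, output_dim = 768, 128
projection = torch.nn.Linear(hidden_size, output_dim, bias=False)

# Token hidden states from the RoBERTa encoder: [batch, seq_len, 768]
hidden_states = torch.randn(1, 512, hidden_size)

# Project each token to 128 dims and L2-normalize,
# so dot products between tokens equal cosine similarities
token_embs = projection(hidden_states)
token_embs = torch.nn.functional.normalize(token_embs, dim=-1)
print(token_embs.shape)  # torch.Size([1, 512, 128])
```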

Training

The model was trained using the ColBERT framework with radiology-specific data:

  • Training Objective: InfoNCE with in-batch negatives + hard negatives
  • Hard Negative Mining: Top-100 BM25 negatives per query
  • Training Epochs: 4
  • Batch Size: 32

Note: Training data sources are not disclosed due to variable licensing.
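
The training objective can be sketched as follows. This is a toy illustration using pooled vectors; actual ColBERT training scores query-document pairs with MaxSim and mixes the BM25-mined hard negatives into the batch:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_vecs, pos_doc_vecs, temperature=0.05):
    """InfoNCE with in-batch negatives: each query's positive document is
    the matching row; every other row in the batch acts as a negative."""
    sims = query_vecs @ pos_doc_vecs.T / temperature  # [B, B] score matrix
    labels = torch.arange(query_vecs.size(0))         # diagonal = positives
    return F.cross_entropy(sims, labels)

# Toy batch of 4 queries and their positive documents (128-dim, normalized)
q = F.normalize(torch.randn(4, 128), dim=-1)
d = F.normalize(torch.randn(4, 128), dim=-1)
loss = info_nce_loss(q, d)
```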

Performance

RadLIT-9 Benchmark

Metric       Score
MRR          0.750
nDCG@10      0.794
Recall@10    94.3%
Recall@5     89.0%
Recall@1     64.5%
Latency      ~5ms

Subspecialty Performance

Subspecialty     MRR     Recall@10
Thoracic         0.958   98%
Pediatric        0.882   100%
Cardiac          0.754   98%
Breast           0.740   100%
Neuroradiology   0.729   90%
MSK              0.706   87%
Physics          0.699   93%
GI               0.686   94%
GU               0.578   90%

Comparison with Other Approaches

Model              MRR     Latency
RadLIT-ColBERT     0.750   5ms
RadLIT-BiEncoder   0.703   5ms
BM25               ~0.55   <1ms

Usage

Installation

pip install sentence-transformers colbert-ai

Basic Usage with Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# Get token-level embeddings (ColBERT keeps one vector per token)
query_emb = model.encode(query, convert_to_tensor=True,
                         output_value='token_embeddings')
doc_embs = [model.encode(d, convert_to_tensor=True,
                         output_value='token_embeddings')
            for d in documents]

# Relevance is then computed with token-level MaxSim rather than a
# single dot product; see the MaxSim implementation in the next section

Late Interaction Scoring (MaxSim)

import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute MaxSim score between query and document embeddings.

    For each query token, find the maximum similarity with any document token,
    then sum these maximum similarities.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb: [num_doc_tokens, dim]

    # Compute all pairwise similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take max similarity across all doc tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum all max similarities
    return max_sims.sum().item()

# Usage
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(document, convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)

Integration with RadLITE Pipeline

RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:

Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
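
The two-stage flow above can be sketched as follows. Function names and scorers are illustrative stand-ins, not the actual RadLITE API:

```python
def retrieve_and_rerank(query, corpus, retriever_score, reranker_score, k=50):
    # Stage 1: fast late-interaction retrieval over the whole corpus, keep top-k
    candidates = sorted(corpus, key=lambda d: retriever_score(query, d),
                        reverse=True)[:k]
    # Stage 2: rerank only the k candidates with the heavier cross-encoder
    return sorted(candidates, key=lambda d: reranker_score(query, d),
                  reverse=True)

# Toy usage with stand-in scorers (word overlap instead of real models)
corpus = [
    "doc about HCC on MRI",
    "doc about mammography",
    "doc about chest CT",
]
fast = lambda q, d: sum(w in d for w in q.split())
slow = lambda q, d: fast(q, d) + 0.5 * ("MRI" in d)
results = retrieve_and_rerank("HCC MRI", corpus, fast, slow, k=2)
```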

For best results, use the full RadLITE pipeline rather than this model alone.

Evolution: RadLIT to RadLITE

Version   Model                         MRR     Innovation
v1.0      RadLIT-ColBERT (this model)   0.750   Late interaction
v1.5      RadLITx                       0.782   + Cross-encoder fusion
v2.0      RadLITE                       0.829   + Calibrated fusion

Intended Use

Primary Use Cases

  • Fast first-stage radiology retrieval
  • Educational content search
  • Medical imaging literature retrieval

Out-of-Scope Uses

  • Non-radiology content retrieval
  • Clinical diagnosis
  • Final relevance scoring (use CrossEncoder for that)

Limitations

  1. Subspecialty variance: Performance varies from 0.58 (GU) to 0.96 (Thoracic)
  2. Domain specificity: Optimized for radiology; limited generalization
  3. Late interaction overhead: Token-level storage increases index size
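
The storage overhead in limitation 3 is easy to quantify. A back-of-envelope sketch for a worst-case fully padded document in float32 (real ColBERT indexes shrink this with shorter documents, quantization, and residual compression):

```python
# Back-of-envelope index size: per-token vs pooled storage, float32
dim, bytes_per_float = 128, 4
doc_tokens = 512  # worst case: a max-length document

per_token_bytes = doc_tokens * dim * bytes_per_float   # one vector per token
pooled_bytes = dim * bytes_per_float                   # one vector per doc

print(per_token_bytes // 1024)          # 256 (KiB per document)
print(per_token_bytes // pooled_bytes)  # 512x larger than a pooled vector
```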

Ethical Considerations

  • Not a diagnostic tool
  • Should be used to surface relevant educational content
  • May reflect biases in radiology literature

Citation

@software{radlit_colbert_2026,
  title = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-colbert},
  note = {MRR 0.750 on RadLIT-9 benchmark}
}

License

Apache 2.0 - Free for research and commercial use.
