RadLIT-ColBERT: Radiology Late Interaction Transformer
A ColBERT-style late interaction model trained for radiology document retrieval. RadLIT scores relevance with token-level MaxSim, giving more nuanced matching than single pooled-vector embeddings.
Model Description
RadLIT (Radiology Late Interaction Transformer) is a ColBERT-v2 style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).
Why Late Interaction?
Late interaction models offer advantages for medical terminology:
- Precise term matching: Each query token finds its best-matching document token
- Better handling of multi-word concepts: the tokens of "hepatocellular carcinoma" can each match independently (see the toy example after this list)
- Implicit term weighting: Important query terms contribute more to the final score
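To make MaxSim concrete, here is a toy computation with hypothetical similarity values for two query tokens against three document tokens; every number below is made up for illustration:

```python
import torch

# Hypothetical cosine similarities.
# Rows: query tokens ("hepatocellular", "carcinoma"); columns: three document tokens.
similarities = torch.tensor([
    [0.91, 0.12, 0.30],  # "hepatocellular" matches document token 0 best
    [0.25, 0.88, 0.41],  # "carcinoma" matches document token 1 best
])

# MaxSim: each query token keeps only its best match, then the maxima are summed.
score = similarities.max(dim=1).values.sum()
print(score)  # tensor(1.7900) = 0.91 + 0.88
```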
Architecture
- Base Model: RoBERTa-base with ColBERT adapter
- Hidden Size: 768
- Output Dimension: 128 (compressed for efficiency; see the sketch after this list)
- Layers: 12
- Attention Heads: 12
- Parameters: ~125M
- Max Sequence Length: 512 tokens
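As a rough illustration of how these pieces fit together, here is a minimal sketch of a ColBERT-style head on top of the 768-dim encoder, assuming a standard design (a linear 768-to-128 projection followed by L2 normalization). The class and variable names are illustrative, not the model's actual implementation:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

class LateInteractionEncoder(torch.nn.Module):
    """Illustrative ColBERT-style encoder: per-token 768 -> 128 projection."""

    def __init__(self, base_model: str = "roberta-base", out_dim: int = 128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model)  # 12 layers, 12 heads, ~125M params
        self.proj = torch.nn.Linear(self.encoder.config.hidden_size, out_dim)  # 768 -> 128

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        emb = self.proj(hidden)               # [batch, seq_len, 128]
        return F.normalize(emb, p=2, dim=-1)  # unit norm, so dot product == cosine similarity
```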
Training
The model was trained using the ColBERT framework with radiology-specific data:
- Training Objective: InfoNCE with in-batch negatives + hard negatives (sketched below)
- Hard Negative Mining: Top-100 BM25 negatives per query
- Training Epochs: 4
- Batch Size: 32
Note: Training data sources are not disclosed due to variable licensing.
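A hedged sketch of the loss shape described above: InfoNCE treats the positive document's MaxSim score as class 0 and the in-batch plus BM25-mined hard negatives as the remaining classes. This illustrates the objective, not the exact training code:

```python
import torch
import torch.nn.functional as F

def infonce_loss(scores: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """scores: [batch, 1 + num_negatives] MaxSim scores, positive in column 0."""
    labels = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores / temperature, labels)
```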
Performance
RadLIT-9 Benchmark
| Metric | Score |
|---|---|
| MRR | 0.750 |
| nDCG@10 | 0.794 |
| Recall@10 | 94.3% |
| Recall@5 | 89.0% |
| Recall@1 | 64.5% |
| Latency | ~5ms |
Subspecialty Performance
| Subspecialty | MRR | Recall@10 |
|---|---|---|
| Thoracic | 0.958 | 98% |
| Pediatric | 0.882 | 100% |
| Cardiac | 0.754 | 98% |
| Breast | 0.740 | 100% |
| Neuroradiology | 0.729 | 90% |
| MSK | 0.706 | 87% |
| Physics | 0.699 | 93% |
| GI | 0.686 | 94% |
| GU | 0.578 | 90% |
Comparison with Other Approaches
| Model | MRR | Latency |
|---|---|---|
| RadLIT-ColBERT | 0.750 | 5ms |
| RadLIT-BiEncoder | 0.703 | 5ms |
| BM25 | ~0.55 | <1ms |
Usage
Installation
```bash
pip install sentence-transformers colbert-ai
```
Basic Usage with Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# Get token-level embeddings (required for ColBERT-style scoring;
# the default pooled output is not what MaxSim operates on)
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_embs = [model.encode(d, convert_to_tensor=True, output_value='token_embeddings') for d in documents]

# Relevance is computed with token-level MaxSim; see the implementation below
```
Late Interaction Scoring (MaxSim)
```python
import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute the MaxSim score between query and document token embeddings.

    For each query token, find the maximum similarity with any document
    token, then sum these maxima. Assumes L2-normalized embeddings, so the
    dot product equals cosine similarity.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb:   [num_doc_tokens, dim]

    # Compute all pairwise similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take the max similarity across all doc tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum the per-token maxima
    return max_sims.sum().item()

# Usage (token-level embeddings, as in the basic example above)
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(documents[0], convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)
```
Integration with RadLITE Pipeline
RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:
```
Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
```
For best results, use the full RadLITE pipeline (a minimal sketch follows this list):
- RadLIT-BiEncoder - Dense retrieval alternative
- RadLIT-CrossEncoder - Reranking stage
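Here is a minimal sketch of the retrieve-then-rerank flow, assuming a `colbert_topk` helper that wraps RadLIT-ColBERT retrieval and returns `(doc_id, text)` pairs; the cross-encoder repository name below is an assumption, not a confirmed model id:

```python
from sentence_transformers import CrossEncoder

def retrieve_and_rerank(query, colbert_topk, k=50, final_k=10):
    # Stage 1: fast late-interaction retrieval (hypothetical helper).
    candidates = colbert_topk(query, k=k)  # list of (doc_id, text)

    # Stage 2: cross-encoder reranking (model id assumed for illustration).
    reranker = CrossEncoder("matulichpt/radlit-crossencoder")
    scores = reranker.predict([(query, text) for _, text in candidates])

    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:final_k]
```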
Evolution: RadLIT to RadLITE
| Version | Model | MRR | Innovation |
|---|---|---|---|
| v1.0 | RadLIT-ColBERT (this model) | 0.750 | Late interaction |
| v1.5 | RadLITx | 0.782 | + Cross-encoder fusion |
| v2.0 | RadLITE | 0.829 | + Calibrated fusion |
Intended Use
Primary Use Cases
- Fast first-stage radiology retrieval
- Educational content search
- Medical imaging literature retrieval
Out-of-Scope Uses
- Non-radiology content retrieval
- Clinical diagnosis
- Final relevance scoring (use CrossEncoder for that)
Limitations
- Subspecialty variance: MRR ranges from 0.578 (GU) to 0.958 (Thoracic)
- Domain specificity: Optimized for radiology; limited generalization
- Late interaction overhead: Token-level storage increases index size (see the back-of-envelope estimate below)
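For a sense of scale, a back-of-envelope estimate of the storage overhead, assuming fp16 vectors and an average document length of 300 tokens (both assumptions, not measured figures):

```python
# Assumed: fp16 (2 bytes/value), 128-dim vectors, ~300 tokens per document.
dim, bytes_per_value, tokens_per_doc = 128, 2, 300

colbert_bytes = tokens_per_doc * dim * bytes_per_value  # 76,800 bytes ~= 75 KB/doc
pooled_bytes = dim * bytes_per_value                    # 256 bytes/doc

print(colbert_bytes // pooled_bytes)  # 300x larger before any compression
```

ColBERT-v2-style residual compression can reduce this footprint substantially in practice.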
Ethical Considerations
- Not a diagnostic tool
- Should be used to surface relevant educational content
- May reflect biases in radiology literature
Citation
```bibtex
@software{radlit_colbert_2026,
  title  = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year   = {2026},
  url    = {https://huggingface.co/matulichpt/radlit-colbert},
  note   = {MRR 0.750 on RadLIT-9 benchmark}
}
```
Related Models
- RadLIT-BiEncoder - Dense retrieval (RadLITE v2.0)
- RadLIT-CrossEncoder - Reranking
License
Apache 2.0 - Free for research and commercial use.