RadLIT-ColBERT: Radiology Late Interaction Transformer

A ColBERT-style late interaction model trained for radiology document retrieval. RadLIT uses token-level MaxSim scoring to provide more nuanced relevance matching than pooled embeddings.

Model Description

RadLIT (Radiology Late Interaction Transformer) is a ColBERT-v2 style model adapted for radiology retrieval. Unlike traditional bi-encoders that produce single-vector representations, RadLIT maintains per-token embeddings and computes relevance through late interaction (MaxSim scoring).

Why Late Interaction?

Late interaction models offer advantages for medical terminology:

  • Precise term matching: Each query token finds its best-matching document token
  • Better handling of multi-word concepts: the tokens of "hepatocellular carcinoma" can each match document terms independently
  • Implicit term weighting: Important query terms contribute more to the final score
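
A toy example makes the per-token matching concrete. The 2-dimensional vectors and the small maxsim helper below are illustrative stand-ins (the real model uses 128-dim token embeddings):

```python
import torch

def maxsim(q, d):
    # For each query token, take the best similarity to any doc token, then sum
    return torch.matmul(q, d.T).max(dim=1).values.sum().item()

# Toy token embeddings (2-dim, unit-norm rows, one row per token).
# The query has two "concept" tokens; doc_a covers both, doc_b only one.
q = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
doc_a = torch.tensor([[1.0, 0.0], [0.0, 1.0]])   # matches both query tokens
doc_b = torch.tensor([[1.0, 0.0], [1.0, 0.0]])   # matches only the first

print(maxsim(q, doc_a))  # 2.0 -- both query tokens find a perfect match
print(maxsim(q, doc_b))  # 1.0 -- the second query token finds no match
```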

Architecture

  • Base Model: RoBERTa-base with ColBERT adapter
  • Hidden Size: 768
  • Output Dimension: 128 (compressed for efficiency)
  • Layers: 12
  • Attention Heads: 12
  • Parameters: ~125M
  • Max Sequence Length: 512 tokens
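
The 768-to-128 compression above is a linear projection over the encoder's token states, as in ColBERT. A minimal sketch (the projection layer here is an illustrative stand-in, not the released weights):

```python
import torch

# Dimensions from the architecture table above
hidden_size, output_dim = 768, 128
projection = torch.nn.Linear(hidden_size, output_dim, bias=False)

# Token hidden states from the RoBERTa encoder: [batch, seq_len, 768]
hidden_states = torch.randn(1, 512, hidden_size)

# Project each token to 128 dims and L2-normalize,
# so dot products between tokens equal cosine similarities
token_embs = projection(hidden_states)
token_embs = torch.nn.functional.normalize(token_embs, dim=-1)
print(token_embs.shape)  # torch.Size([1, 512, 128])
```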

Training

The model was trained using the ColBERT framework with radiology-specific data:

  • Training Objective: InfoNCE with in-batch negatives + hard negatives
  • Hard Negative Mining: Top-100 BM25 negatives per query
  • Training Epochs: 4
  • Batch Size: 32

Note: Training data sources are not disclosed due to variable licensing.
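
The training objective can be sketched as follows. This is a toy illustration using pooled vectors; actual ColBERT training scores query-document pairs with MaxSim and mixes the BM25-mined hard negatives into the batch:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_vecs, pos_doc_vecs, temperature=0.05):
    """InfoNCE with in-batch negatives: each query's positive document is
    the matching row; every other row in the batch acts as a negative."""
    sims = query_vecs @ pos_doc_vecs.T / temperature  # [B, B] score matrix
    labels = torch.arange(query_vecs.size(0))         # diagonal = positives
    return F.cross_entropy(sims, labels)

# Toy batch of 4 queries and their positive documents (128-dim, normalized)
q = F.normalize(torch.randn(4, 128), dim=-1)
d = F.normalize(torch.randn(4, 128), dim=-1)
loss = info_nce_loss(q, d)
```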

Performance

RadLIT-9 Benchmark

Metric       Score
MRR          0.750
nDCG@10      0.794
Recall@10    94.3%
Recall@5     89.0%
Recall@1     64.5%
Latency      ~5ms

Subspecialty Performance

Subspecialty     MRR     Recall@10
Thoracic         0.958   98%
Pediatric        0.882   100%
Cardiac          0.754   98%
Breast           0.740   100%
Neuroradiology   0.729   90%
MSK              0.706   87%
Physics          0.699   93%
GI               0.686   94%
GU               0.578   90%

Comparison with Other Approaches

Model              MRR     Latency
RadLIT-ColBERT     0.750   5ms
RadLIT-BiEncoder   0.703   5ms
BM25               ~0.55   <1ms

Usage

Installation

pip install sentence-transformers colbert-ai

Basic Usage with Sentence Transformers

from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('matulichpt/radlit-colbert')

# Encode queries and documents
query = "What are the imaging features of hepatocellular carcinoma on MRI?"
documents = [
    "HCC typically shows arterial enhancement with washout...",
    "Breast cancer staging involves mammography and MRI..."
]

# Get token-level embeddings (ColBERT keeps one vector per token)
query_emb = model.encode(query, convert_to_tensor=True,
                         output_value='token_embeddings')
doc_embs = [model.encode(d, convert_to_tensor=True,
                         output_value='token_embeddings')
            for d in documents]

# Relevance is then computed with token-level MaxSim rather than a
# single dot product; see the MaxSim implementation in the next section

Late Interaction Scoring (MaxSim)

import torch

def maxsim_score(query_emb, doc_emb):
    """
    Compute MaxSim score between query and document embeddings.

    For each query token, find the maximum similarity with any document token,
    then sum these maximum similarities.
    """
    # query_emb: [num_query_tokens, dim]
    # doc_emb: [num_doc_tokens, dim]

    # Compute all pairwise similarities
    similarities = torch.matmul(query_emb, doc_emb.T)  # [q_tokens, d_tokens]

    # For each query token, take max similarity across all doc tokens
    max_sims = similarities.max(dim=1).values  # [q_tokens]

    # Sum all max similarities
    return max_sims.sum().item()

# Usage
query_emb = model.encode(query, convert_to_tensor=True, output_value='token_embeddings')
doc_emb = model.encode(document, convert_to_tensor=True, output_value='token_embeddings')
score = maxsim_score(query_emb, doc_emb)

Integration with RadLITE Pipeline

RadLIT-ColBERT is the first-stage retriever in the full RadLITE pipeline:

Query -> RadLIT-ColBERT (fast retrieval, top-50) -> CrossEncoder (reranking) -> Results
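
The two-stage flow above can be sketched as follows. Function names and scorers are illustrative stand-ins, not the actual RadLITE API:

```python
def retrieve_and_rerank(query, corpus, retriever_score, reranker_score, k=50):
    # Stage 1: fast late-interaction retrieval over the whole corpus, keep top-k
    candidates = sorted(corpus, key=lambda d: retriever_score(query, d),
                        reverse=True)[:k]
    # Stage 2: rerank only the k candidates with the heavier cross-encoder
    return sorted(candidates, key=lambda d: reranker_score(query, d),
                  reverse=True)

# Toy usage with stand-in scorers (word overlap instead of real models)
corpus = [
    "doc about HCC on MRI",
    "doc about mammography",
    "doc about chest CT",
]
fast = lambda q, d: sum(w in d for w in q.split())
slow = lambda q, d: fast(q, d) + 0.5 * ("MRI" in d)
results = retrieve_and_rerank("HCC MRI", corpus, fast, slow, k=2)
```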

For best results, use the full RadLITE pipeline rather than this model alone.

Evolution: RadLIT to RadLITE

Version   Model                         MRR     Innovation
v1.0      RadLIT-ColBERT (this model)   0.750   Late interaction
v1.5      RadLITx                       0.782   + Cross-encoder fusion
v2.0      RadLITE                       0.829   + Calibrated fusion

Intended Use

Primary Use Cases

  • Fast first-stage radiology retrieval
  • Educational content search
  • Medical imaging literature retrieval

Out-of-Scope Uses

  • Non-radiology content retrieval
  • Clinical diagnosis
  • Final relevance scoring (use CrossEncoder for that)

Limitations

  1. Subspecialty variance: Performance varies from 0.58 (GU) to 0.96 (Thoracic)
  2. Domain specificity: Optimized for radiology; limited generalization
  3. Late interaction overhead: Token-level storage increases index size
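
The storage overhead in limitation 3 is easy to quantify. A back-of-envelope sketch for a worst-case fully padded document in float32 (real ColBERT indexes shrink this with shorter documents, quantization, and residual compression):

```python
# Back-of-envelope index size: per-token vs pooled storage, float32
dim, bytes_per_float = 128, 4
doc_tokens = 512  # worst case: a max-length document

per_token_bytes = doc_tokens * dim * bytes_per_float   # one vector per token
pooled_bytes = dim * bytes_per_float                   # one vector per doc

print(per_token_bytes // 1024)          # 256 (KiB per document)
print(per_token_bytes // pooled_bytes)  # 512x larger than a pooled vector
```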

Ethical Considerations

  • Not a diagnostic tool
  • Should be used to surface relevant educational content
  • May reflect biases in radiology literature

Citation

@software{radlit_colbert_2026,
  title = {RadLIT-ColBERT: Late Interaction for Radiology Retrieval},
  author = {Grai Team},
  year = {2026},
  url = {https://huggingface.co/matulichpt/radlit-colbert},
  note = {MRR 0.750 on RadLIT-9 benchmark}
}

License

Apache 2.0 - Free for research and commercial use.
