DeAR-8B-Reranker-Listwise-v1

Model Description

DeAR-8B-Reranker-Listwise-v1 is an 8B-parameter listwise neural reranker that produces document rankings via autoregressive text generation. Unlike pointwise models, which score each document independently, this model considers multiple documents simultaneously and produces rankings with Chain-of-Thought reasoning.

Model Details

  • Model Type: Listwise Reranker (Causal Language Model)
  • Base Model: LLaMA-3.1-8B
  • Parameters: 8 billion
  • Training Method: Supervised Fine-tuning with Chain-of-Thought
  • Training Data: DeAR-COT Dataset
  • Training Framework: LLaMA-Factory
  • Precision: BFloat16

Key Features

Listwise Ranking: Considers inter-document dependencies
Chain-of-Thought: Generates reasoning for ranking decisions
State-of-the-Art: Best performance on NovelEval (90.97 average NDCG)
Flexible: Handles variable numbers of documents
Interpretable: Provides explanations for rankings

Performance

Benchmark          NDCG@10   vs. GPT-4
TREC DL19            77.91       +2.32
TREC DL20            75.63       +5.07
NovelEval (avg)      90.97       +3.09
BEIR (avg)           46.8        +2.3

Key Achievement: Outperforms GPT-4 on NovelEval by +3.09 points!

Usage

Quick Start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_path = "abdoelsayed/dear-8b-reranker-listwise-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare input
query = "When did Thomas Edison invent the light bulb?"
documents = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for car but failed",
    "Coffee is good for diet",
    "KEPCO fixes light problems",
    "Thomas Edison invented the light bulb in 1879",
]

# Create listwise prompt
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

# Generate ranking
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,  # greedy decoding gives a deterministic ranking
        pad_token_id=tokenizer.pad_token_id
    )

ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking_text}")
# Output: [4] > [1] > [0] > [3] > [2]

Complete Reranking Pipeline

import torch
from typing import List
from transformers import AutoTokenizer, AutoModelForCausalLM
import re

class ListwiseReranker:
    def __init__(self, model_path: str, device: str = "auto"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
            device_map=device,
            low_cpu_mem_usage=True
        )
        
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
    
    def create_prompt(self, query: str, documents: List[str], max_doc_len: int = 300) -> str:
        """Create listwise ranking prompt."""
        doc_list = "\n".join([f"[{i}] {doc[:max_doc_len]}" for i, doc in enumerate(documents)])
        
        prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
        
        return prompt
    
    def parse_ranking(self, output_text: str, num_docs: int) -> List[int]:
        """Parse model output to extract a ranking, tolerating malformed output."""
        # Extract bracketed identifiers, dropping out-of-range and duplicate entries
        numbers = re.findall(r'\[(\d+)\]', output_text)
        ranked = []
        for n in numbers:
            idx = int(n)
            if idx < num_docs and idx not in ranked:
                ranked.append(idx)
        
        # Append any documents the model omitted, preserving their original order
        for i in range(num_docs):
            if i not in ranked:
                ranked.append(i)
        
        return ranked[:num_docs]
    
    def rerank(
        self,
        query: str,
        documents: List[str],
        max_new_tokens: int = 50
    ) -> List[int]:
        """
        Rerank documents for a query.
        
        Args:
            query: Search query
            documents: List of document texts
            max_new_tokens: Max tokens to generate
        
        Returns:
            List of document indices ranked by relevance
        """
        prompt = self.create_prompt(query, documents)
        
        inputs = self.tokenizer(
            prompt,
            return_tensors="pt",
            truncation=True,
            max_length=2048
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,  # greedy decoding gives a deterministic ranking
                pad_token_id=self.tokenizer.pad_token_id
            )
        
        output_text = self.tokenizer.decode(
            outputs[0][inputs['input_ids'].shape[1]:],
            skip_special_tokens=True
        )
        
        ranking = self.parse_ranking(output_text, len(documents))
        return ranking


# Example usage
reranker = ListwiseReranker("abdoelsayed/dear-8b-reranker-listwise-v1")

query = "What are the health benefits of green tea?"
documents = [
    "Green tea is a popular beverage in Asian countries.",
    "Studies show green tea contains antioxidants that may reduce inflammation.",
    "Coffee is another caffeinated drink consumed worldwide.",
    "Green tea has been linked to improved brain function and fat loss.",
    "The weather today is sunny and warm.",
]

ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
# Output: [1, 3, 0, 2, 4]

# Display ranked documents
for rank, idx in enumerate(ranking, 1):
    print(f"{rank}. {documents[idx]}")

Training Details

Training Data

  • Dataset: DeAR-COT
  • Format: Instruction-following with ranking outputs

Training Configuration

model_name: meta-llama/Llama-3.1-8B
task_type: sft
training_method: listwise_ranking
framework: LLaMA-Factory

hyperparameters:
  learning_rate: 1e-5
  batch_size: 4
  gradient_accumulation: 4
  epochs: 2
  max_length: 2048
  warmup_ratio: 0.1
  weight_decay: 0.01
  optimizer: adamw_torch
  lr_scheduler: cosine

distributed:
  method: torch.distributed.run
  num_gpus: 4
  deepspeed: zero2

Hardware

  • GPUs: 4x NVIDIA A100 (80GB)
  • Training Time: ~30 hours
  • Framework: LLaMA-Factory with DeepSpeed
  • Memory Usage: ~70GB per GPU

Prompt Format

Training Format:

I will provide you with {N} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

[0] {doc_0}
[1] {doc_1}
...
[N-1] {doc_N-1}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.

Answer: [most_relevant] > [second] > ... > [least_relevant]
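
For illustration, here is one way a training record could be assembled from a (query, passages, gold ordering) triple in the format above. This is a hedged reconstruction of the template, not the official DeAR-COT preprocessing code, and it omits the Chain-of-Thought reasoning that precedes the final ranking in the actual dataset.

from typing import Dict, List

def build_sft_record(query: str, passages: List[str], gold_order: List[int]) -> Dict[str, str]:
    """Assemble one instruction/answer pair in the prompt format above (sketch)."""
    doc_list = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages))
    instruction = (
        f"I will provide you with {len(passages)} passages, each indicated by a number identifier [].\n"
        f"Rank the passages based on their relevance to the search query: {query}.\n\n"
        f"{doc_list}\n\n"
        f"Search Query: {query}.\n"
        "Rank the passages above based on their relevance to the search query. "
        "Output the ranking as a list of numbers."
    )
    answer = " > ".join(f"[{i}]" for i in gold_order)
    return {"instruction": instruction, "output": answer}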

Evaluation Results
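
All results below report NDCG@k. As a quick reference, one common formulation of the metric (a generic sketch with exponential gain, not code from the DeAR evaluation harness):

import math
from typing import List

def ndcg_at_k(relevances: List[int], k: int = 10) -> float:
    """NDCG@k for a ranked list of graded relevance labels (exponential gain)."""
    def dcg(rels: List[int]) -> float:
        return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0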

TREC Deep Learning

Method      DL19 (NDCG@10)   DL20 (NDCG@10)   Average
BM25             50.58            47.96         49.27
RankGPT-4        75.59            70.56         73.08
DeAR-L-8B        77.91            75.63         76.77

NovelEval-2306 (Novel Query Generalization)

Method      NDCG@1   NDCG@5   NDCG@10   Average
BM25         33.33    45.96     55.77     45.02
RankGPT-4    85.71    87.49     90.45     87.88
DeAR-L-8B    92.86    88.04     92.01     90.97

🏆 +3.09 points better than GPT-4 on NovelEval!

BEIR Benchmark

Dataset      NDCG@10
MS MARCO        70.2
NQ              54.1
HotpotQA        64.5
FiQA            49.3
ArguAna         62.1
SciFact         76.2
TREC-COVID      88.4
NFCorpus        40.6
Average         46.8

(The 46.8 average appears to be computed over the full BEIR suite; only selected datasets are listed above.)

Efficiency Analysis

Metric                     Value
Inference Time (20 docs)   11.16 s
Throughput                 ~1.8 docs/sec
GPU Memory (inference)     22 GB
Model Size (BF16)          16 GB
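
To get a comparable latency number on your own hardware, here is a minimal timing sketch around the ListwiseReranker defined above (reusing the query and documents from the usage example):

import time

start = time.perf_counter()
ranking = reranker.rerank(query, documents)
elapsed = time.perf_counter() - start
print(f"Reranked {len(documents)} docs in {elapsed:.2f}s "
      f"({len(documents) / elapsed:.1f} docs/sec)")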

Comparison with Other Methods:

  • 2.2x faster than RankGPT-4 (24.5s)
  • 1.9x faster than RankZephyr (21.6s)
  • Comparable or better ranking quality at substantially lower latency

Advantages over Pointwise Models

Aspect                 Pointwise             Listwise (This Model)
Document Interaction   ❌ Independent         ✅ Considers relationships
Reasoning              ❌ None                ✅ Chain-of-Thought
Novel Queries          Good                  Excellent (+3-5 NDCG@10)
Interpretability       ❌ Score only          ✅ Reasoning provided
Speed                  ✅ Very Fast (2.2s)    Moderate (11.2s)

Model Architecture

Input: Listwise Prompt with Query + Multiple Documents
    ↓
LLaMA-3.1-8B Decoder
    ↓
Auto-regressive Generation
    ↓
Output: "[4] > [1] > [0] > [3] > [2]"
    ↓
Parse to Ranking: [4, 1, 0, 3, 2]
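
The final parsing step is a simple regex pass over the generated string, as in parse_ranking above:

import re

text = "[4] > [1] > [0] > [3] > [2]"  # generated ranking string
ranking = [int(m) for m in re.findall(r"\[(\d+)\]", text)]
print(ranking)  # [4, 1, 0, 3, 2]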

When to Use This Model

Best for:

  • ✅ Novel/complex queries requiring reasoning
  • ✅ Tasks where interpretability matters
  • ✅ Small candidate sets (<100 documents)
  • ✅ Research and analysis applications

Consider pointwise models for:

  • ❌ Large-scale reranking (1000s of docs)
  • ❌ Real-time, low-latency applications
  • ❌ When reasoning is not needed

Limitations

  1. Inference Speed: Slower than pointwise models (~5x)
  2. Document Count: Limited by context length (~20-50 docs optimal)
  3. Parsing Errors: May occasionally generate malformed rankings
  4. Cost: Higher computational cost for generation
  5. Language: English only

Bias and Ethical Considerations

  • Position Bias: May favor documents in certain positions
  • Training Data Bias: Inherits biases from CoT annotations
  • Reasoning Artifacts: Generated explanations may contain hallucinations
  • Fairness: Should be evaluated for fairness in your domain

Citation

@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}

License

MIT License
