DeAR-8B-Reranker-Listwise-v1
Model Description
DeAR-8B-Reranker-Listwise-v1 is an 8B parameter listwise neural reranker that generates document rankings through text generation. Unlike pointwise models that score documents independently, this model considers multiple documents simultaneously and produces rankings with Chain-of-Thought reasoning.
Model Details
- Model Type: Listwise Reranker (Causal Language Model)
- Base Model: LLaMA-3.1-8B
- Parameters: 8 billion
- Training Method: Supervised Fine-tuning with Chain-of-Thought
- Training Data: DeAR-COT Dataset
- Training Framework: LLaMA-Factory
- Precision: BFloat16
Key Features
✅ Listwise Ranking: Considers inter-document dependencies
✅ Chain-of-Thought: Generates reasoning for ranking decisions
✅ State-of-the-Art: Best performance on NovelEval (90.97 avg. NDCG)
✅ Flexible: Handles variable numbers of documents
✅ Interpretable: Provides explanations for rankings
Performance
| Benchmark | NDCG@10 | vs. GPT-4 |
|---|---|---|
| TREC DL19 | 77.91 | +2.32 |
| TREC DL20 | 75.63 | +5.07 |
| NovelEval (avg.) | 90.97 | +3.09 |
| BEIR (Avg) | 46.8 | +2.3 |
Key Achievement: Outperforms GPT-4 on NovelEval by +3.09 points!
Usage
Quick Start
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model
model_path = "abdoelsayed/dear-8b-reranker-listwise-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
# Prepare input
query = "When did Thomas Edison invent the light bulb?"
documents = [
"Lightning strike at Seoul National University",
"Thomas Edison tried to invent a device for car but failed",
"Coffee is good for diet",
"KEPCO fixes light problems",
"Thomas Edison invented the light bulb in 1879",
]
# Create listwise prompt
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.
{doc_list}
Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
# Generate ranking
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,  # greedy decoding; a sampling temperature has no effect here
        pad_token_id=tokenizer.pad_token_id,
    )
ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking_text}")
# Output: [4] > [1] > [0] > [3] > [2]
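To turn the generated string back into an ordering over the input passages, the bracketed identifiers can be parsed with a small regex. A minimal sketch, assuming the `[i] > [j] > ...` output format shown above (the `ListwiseReranker` class below wraps the same logic in `parse_ranking`):

import re

# Extract identifiers in the order the model produced them, dropping out-of-range ids
ranked_ids = [int(i) for i in re.findall(r"\[(\d+)\]", ranking_text) if int(i) < len(documents)]
# Append any documents the model omitted so every passage receives a rank
ranked_ids += [i for i in range(len(documents)) if i not in ranked_ids]

for rank, idx in enumerate(ranked_ids, 1):
    print(f"{rank}. {documents[idx]}")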
Complete Reranking Pipeline
import torch
from typing import List
from transformers import AutoTokenizer, AutoModelForCausalLM
import re
class ListwiseReranker:
    def __init__(self, model_path: str, device: str = "auto"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
            device_map=device,
            low_cpu_mem_usage=True,
        )
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def create_prompt(self, query: str, documents: List[str], max_doc_len: int = 300) -> str:
        """Create the listwise ranking prompt."""
        doc_list = "\n".join(f"[{i}] {doc[:max_doc_len]}" for i, doc in enumerate(documents))
        return (
            f"I will provide you with {len(documents)} passages, each indicated by a number identifier [].\n"
            f"Rank the passages based on their relevance to the search query: {query}.\n"
            f"{doc_list}\n"
            f"Search Query: {query}.\n"
            "Rank the passages above based on their relevance to the search query. "
            "Output the ranking as a list of numbers."
        )

    def parse_ranking(self, output_text: str, num_docs: int) -> List[int]:
        """Parse model output into a ranked list of document indices."""
        numbers = [int(n) for n in re.findall(r'\[(\d+)\]', output_text) if int(n) < num_docs]
        ranked = []
        for n in numbers:           # keep the first occurrence of each identifier
            if n not in ranked:
                ranked.append(n)
        for i in range(num_docs):   # append any documents the model omitted
            if i not in ranked:
                ranked.append(i)
        return ranked[:num_docs]

    def rerank(self, query: str, documents: List[str], max_new_tokens: int = 50) -> List[int]:
        """
        Rerank documents for a query.

        Args:
            query: Search query
            documents: List of document texts
            max_new_tokens: Max tokens to generate

        Returns:
            List of document indices ranked by relevance
        """
        prompt = self.create_prompt(query, documents)
        inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=False,  # greedy decoding for deterministic rankings
                pad_token_id=self.tokenizer.pad_token_id,
            )
        output_text = self.tokenizer.decode(
            outputs[0][inputs['input_ids'].shape[1]:],
            skip_special_tokens=True,
        )
        return self.parse_ranking(output_text, len(documents))
# Example usage
reranker = ListwiseReranker("abdoelsayed/dear-8b-reranker-listwise-v1")
query = "What are the health benefits of green tea?"
documents = [
"Green tea is a popular beverage in Asian countries.",
"Studies show green tea contains antioxidants that may reduce inflammation.",
"Coffee is another caffeinated drink consumed worldwide.",
"Green tea has been linked to improved brain function and fat loss.",
"The weather today is sunny and warm.",
]
ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
# Output: [1, 3, 0, 2, 4]
# Display ranked documents
for rank, idx in enumerate(ranking, 1):
    print(f"{rank}. {documents[idx]}")
Training Details
Training Data
- Dataset: DeAR-COT
- Format: Instruction-following with ranking outputs
Training Configuration
model_name: meta-llama/Llama-3.1-8B
task_type: sft
training_method: listwise_ranking
framework: LLaMA-Factory
hyperparameters:
  learning_rate: 1e-5
  batch_size: 4
  gradient_accumulation: 4
  epochs: 2
  max_length: 2048
  warmup_ratio: 0.1
  weight_decay: 0.01
  optimizer: adamw_torch
  lr_scheduler: cosine
distributed:
  method: torch.distributed.run
  num_gpus: 4
  deepspeed: zero2
Hardware
- GPUs: 4x NVIDIA A100 (80GB)
- Training Time: ~30 hours
- Framework: LLaMA-Factory with DeepSpeed
- Memory Usage: ~70GB per GPU
Prompt Format
Training Format:
I will provide you with {N} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.
[0] {doc_0}
[1] {doc_1}
...
[N-1] {doc_N-1}
Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.
Answer: [most_relevant] > [second] > ... > [least_relevant]
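For illustration, a single training example in this format can be assembled as below. This is a minimal sketch: the helper name and the exact (prompt, target) split are illustrative and may not match the DeAR-COT schema.

def format_training_example(query, documents, gold_order):
    """Assemble one (prompt, target) pair in the listwise format above.
    gold_order lists document indices from most to least relevant (illustrative)."""
    doc_list = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents))
    prompt = (
        f"I will provide you with {len(documents)} passages, each indicated by a number identifier [].\n"
        f"Rank the passages based on their relevance to the search query: {query}.\n"
        f"{doc_list}\n"
        f"Search Query: {query}.\n"
        "Rank the passages above based on their relevance to the search query. "
        "Output the ranking as a list of numbers."
    )
    target = "Answer: " + " > ".join(f"[{i}]" for i in gold_order)
    return prompt, target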
Evaluation Results
TREC Deep Learning
| Method | DL19 (NDCG@10) | DL20 (NDCG@10) | Average |
|---|---|---|---|
| BM25 | 50.58 | 47.96 | 49.27 |
| RankGPT-4 | 75.59 | 70.56 | 73.08 |
| DeAR-L-8B | 77.91 | 75.63 | 76.77 |
NovelEval-2306 (Novel Query Generalization)
| Method | NDCG@1 | NDCG@5 | NDCG@10 | Average |
|---|---|---|---|---|
| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
| DeAR-L-8B | 92.86 | 88.04 | 92.01 | 90.97 |
🏆 +3.09 points better than GPT-4 on NovelEval!
BEIR Benchmark
| Dataset | NDCG@10 |
|---|---|
| MS MARCO | 70.2 |
| NQ | 54.1 |
| HotpotQA | 64.5 |
| FiQA | 49.3 |
| ArguAna | 62.1 |
| SciFact | 76.2 |
| TREC-COVID | 88.4 |
| NFCorpus | 40.6 |
| Average | 46.8 |
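All results above are reported as NDCG. For reference, a minimal NDCG@k implementation following the standard definition (not necessarily the exact evaluation script behind these numbers; some setups use 2^rel - 1 gains instead of raw relevance grades):

import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """ranked_ids: doc ids in predicted order; relevance: dict mapping doc id -> graded relevance."""
    dcg = sum(relevance.get(doc_id, 0) / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0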
Efficiency Analysis
| Metric | Value |
|---|---|
| Inference Time (20 docs) | 11.16s |
| Throughput | ~1.8 docs/sec |
| GPU Memory (inference) | 22GB |
| Model Size (BF16) | 16GB |
Comparison with Other Methods:
- 2.2x faster than RankGPT-4 (24.5s)
- 1.9x faster than RankZephyr (21.6s)
- Similar performance with much better efficiency
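The latency and throughput figures can be approximated with a simple timing loop around the ListwiseReranker defined earlier. A rough sketch (measure_latency is an illustrative helper; results depend heavily on hardware and document length):

import time

def measure_latency(reranker, query, documents, n_runs=5):
    """Average wall-clock seconds per reranking call for one query."""
    reranker.rerank(query, documents)  # warm-up (CUDA kernels, caches)
    start = time.perf_counter()
    for _ in range(n_runs):
        reranker.rerank(query, documents)
    elapsed = (time.perf_counter() - start) / n_runs
    print(f"{elapsed:.2f}s per query (~{len(documents) / elapsed:.1f} docs/sec)")
    return elapsed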
Advantages over Pointwise Models
| Aspect | Pointwise | Listwise (This Model) |
|---|---|---|
| Document Interaction | ❌ Independent | ✅ Considers relationships |
| Reasoning | ❌ None | ✅ Chain-of-Thought |
| Novel Queries | Good | ✅ Excellent (+3-5 NDCG@10) |
| Interpretability | ❌ Score only | ✅ Reasoning provided |
| Speed | ✅ Very Fast (2.2s) | Moderate (11.2s) |
Model Architecture
Input: Listwise Prompt with Query + Multiple Documents
↓
LLaMA-3.1-8B Decoder
↓
Auto-regressive Generation
↓
Output: "[4] > [1] > [0] > [3] > [2]"
↓
Parse to Ranking: [4, 1, 0, 3, 2]
When to Use This Model
Best for:
- ✅ Novel/complex queries requiring reasoning
- ✅ Tasks where interpretability matters
- ✅ Small candidate sets (<100 documents)
- ✅ Research and analysis applications
Consider pointwise models for:
- ❌ Large-scale reranking (1000s of docs)
- ❌ Real-time, low-latency applications
- ❌ When reasoning is not needed
Limitations
- Inference Speed: Slower than pointwise models (~5x)
- Document Count: Limited by context length (~20-50 docs per prompt is optimal); a sliding-window sketch for larger candidate sets is shown below
- Parsing Errors: May occasionally generate malformed rankings
- Cost: Higher computational cost for generation
- Language: English only
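For candidate sets that exceed the context window, a sliding-window strategy can be layered on top of the ListwiseReranker above: rerank overlapping chunks from the bottom of the list upward so strong documents bubble toward the top. A minimal sketch (this strategy is common for listwise rerankers but is not part of this model card's released code):

def sliding_window_rerank(reranker, query, documents, window=20, step=10):
    """Rerank a long candidate list with overlapping windows, back to front."""
    order = list(range(len(documents)))
    end = len(order)
    while end > 0:
        start = max(0, end - window)
        window_ids = order[start:end]
        local_rank = reranker.rerank(query, [documents[i] for i in window_ids])
        order[start:end] = [window_ids[i] for i in local_rank]  # reorder this slice
        end -= step
    return order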
Bias and Ethical Considerations
- Position Bias: May favor documents in certain positions (a simple shuffle-based check is sketched below)
- Training Data Bias: Inherits biases from CoT annotations
- Reasoning Artifacts: Generated explanations may contain hallucinations
- Fairness: Should be evaluated for fairness in your domain
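One simple way to probe the position bias noted above is to rerank shuffled permutations of the same candidates and check how stable the result is. An illustrative sketch, not an official evaluation protocol:

import random

def position_bias_check(reranker, query, documents, n_trials=5, seed=0):
    """Count how often each document is ranked first across shuffled inputs."""
    rng = random.Random(seed)
    top1_counts = {}
    for _ in range(n_trials):
        perm = list(range(len(documents)))
        rng.shuffle(perm)
        ranking = reranker.rerank(query, [documents[i] for i in perm])
        top_doc = perm[ranking[0]]  # map back to the original document id
        top1_counts[top_doc] = top1_counts.get(top_doc, 0) + 1
    return top1_counts  # heavy disagreement across trials suggests position sensitivity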
Related Models
DeAR Listwise:
- DeAR-8B-Listwise-LoRA - LoRA adapter version
Citation
@article{abdallah2025dear,
title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
journal={arXiv preprint arXiv:2508.16998},
year={2025}
}
License
MIT License
More Information
- GitHub: DataScienceUIBK/DeAR-Reranking
- Paper: arXiv:2508.16998
- Collection: DeAR Models