BlockRank-Mistral-7B: Scalable In-context Ranking with Generative Models


BlockRank-Mistral-7B is a fine-tuned version of Mistral-7B-Instruct-v0.3 optimized for efficient in-context document ranking. It implements BlockRank, a method that makes LLMs efficient and scalable for ranking by aligning their internal attention mechanisms with the structure of the ranking task.
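Below is a minimal usage sketch, assuming the checkpoint loads with the standard Hugging Face `transformers` API. The prompt layout (numbered passage blocks followed by the query) and the generation-based readout are illustrative placeholders, not necessarily the exact format or the attention-based scoring path used by BlockRank; consult the project repository for the canonical inference code.

```python
# Illustrative sketch only: prompt format and decoding-based readout are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "quicktensor/blockrank-msmarco-mistral-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

query = "what is the capital of france"
documents = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
]

# In-context ranking prompt: each candidate sits in its own block, followed by the query.
blocks = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents))
prompt = (
    "Rank the following passages by relevance to the query.\n"
    f"{blocks}\nQuery: {query}\nMost relevant passage: ["
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```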

BlockRank Architecture

Key Features

  • Linear Complexity Attention: Structured sparse attention reduces complexity from O(n²) to O(n)
  • 2-4× Faster Inference: Attention-based scoring eliminates autoregressive decoding (see the sketch after this list)
  • Auxiliary Contrastive Loss: Mid-layer contrastive objective improves relevance signals
  • Strong Zero-shot Generalization: SOTA performance on BEIR benchmarks
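The first two features can be pictured with a small sketch: candidate documents are laid out in separate blocks, document tokens attend only within their own block plus a shared instruction block (so attention cost grows linearly with the number of documents), query tokens attend to everything, and relevance is read off as the attention mass that query tokens place on each document block at a mid layer. The mask construction and scoring below are illustrative assumptions about this scheme, not the repository's exact implementation.

```python
import torch

def blockrank_mask(block_ids: torch.Tensor, query_block: int, instr_block: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend) for one sequence.

    block_ids[i] is the block index of token i. Document tokens attend to the
    shared instruction block and to their own block; query tokens attend to all
    tokens. A causal constraint is applied on top. (Illustrative sketch.)
    """
    n = block_ids.numel()
    same_block = block_ids.unsqueeze(0) == block_ids.unsqueeze(1)          # (n, n)
    to_instruction = (block_ids == instr_block).unsqueeze(0).expand(n, n)  # anyone -> instruction
    from_query = (block_ids == query_block).unsqueeze(1).expand(n, n)      # query -> everything
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    return (same_block | to_instruction | from_query) & causal

def score_documents(attn: torch.Tensor, block_ids: torch.Tensor,
                    query_block: int, doc_blocks: list[int]) -> torch.Tensor:
    """Score each document by the attention mass query tokens put on its block.

    attn is an (n, n) attention matrix (e.g. averaged over heads) taken from a
    mid layer; rows are query positions, columns are key positions.
    """
    q_rows = attn[block_ids == query_block]                                 # (n_q, n)
    return torch.stack(
        [q_rows[:, block_ids == b].sum(dim=1).mean() for b in doc_blocks]
    )
```

This attention-based readout is what removes the need for autoregressive decoding: scores for all candidates come from a single forward pass over the packed context.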

Citation

If you use this model, please cite:

@article{gupta2025blockrank,
  title={Scalable In-context Ranking with Generative Models},
  author={Gupta, Nilesh and You, Chong and Bhojanapalli, Srinadh and Kumar, Sanjiv and Dhillon, Inderjit and Yu, Felix},
  journal={arXiv preprint arXiv:2510.05396},
  year={2025}
}

Model Card Contact

For questions or issues, please open an issue on GitHub.

License

This model is released under the MIT License. See LICENSE for details.
