Update README.md
README.md
CHANGED
|
@@ -1,3 +1,70 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
datasets:
|
| 4 |
+
- Frywind/GREAM_data
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
# 🧠 GREAM: Generative Reasoning Recommendation Model
|
| 8 |
+
|
| 9 |
+
**Paper:** [*Generative Reasoning Recommendation via LLMs*](https://arxiv.org/pdf/2510.20815), 2025.
|
| 10 |
+
**Authors:** Minjie Hong\*, Zetong Zhou\*, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, Zhou Zhao†
|
| 11 |
+
**Repository:** [https://github.com/Indolent-Kawhi/GRRM](https://github.com/Indolent-Kawhi/GRRM)
|
| 12 |
+
**HF Papers Link:** [https://huggingface.co/papers/2510.20815](https://huggingface.co/papers/2510.20815)
|
| 13 |
+
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
## 🧩 Model Summary
|
| 17 |
+
|
| 18 |
+
**GREAM** (Generative Reasoning Recommendation Model) is a **large language model (LLM)-based generative reasoning recommender** designed to unify *understanding, reasoning,* and *prediction* for recommendation tasks.
|
| 19 |
+
It introduces a **reasoning-enhanced, verifiable reinforcement learning** framework that supports both high-throughput direct recommendation and interpretable, reasoning-based output.
|
| 20 |
+
|
| 21 |
+
### Key Features
|
| 22 |
+
- **Collaborative–Semantic Alignment:** Fuses textual (titles, descriptions, reviews) and behavioral signals to align linguistic and collaborative semantics.
|
| 23 |
+
- **Reasoning Curriculum Activation:** Builds synthetic *Chain-of-Thought (CoT)* data and trains on it with a curriculum to develop causal reasoning for recommendation.
|
| 24 |
+
- **Sparse-Regularized Group Policy Optimization (SRPO):** Enables stable RL fine-tuning using *Residual-Sensitive Verifiable Rewards* and *Bonus-Calibrated Group Advantage Estimation* for sparse feedback.
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## 🧠 Model Architecture
|
| 29 |
+
|
| 30 |
+
| Component | Description |
|
| 31 |
+
|------------|--------------|
|
| 32 |
+
| **Backbone** | Qwen3-4B-Instruct |
|
| 33 |
+
| **Indexing** | Residual Quantization (RQ-KMeans, 5 levels, 256 values per level) |
|
| 34 |
+
| **Training Phases** | ① Collaborative–Semantic Alignment → ② Reasoning Curriculum Activation → ③ SRPO Reinforcement Learning |
|
| 35 |
+
| **Inference Modes** | **Direct Sequence Recommendation:** low-latency item generation<br>**Sequential Reasoning Recommendation:** interpretable CoT reasoning chains |
|
| 36 |
+
| **RL Framework** | Verl + SGLang backend |
|
| 37 |
+
|
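The indexing row in the table above describes residual quantization with k-means codebooks (5 levels, 256 values per level). The following is a minimal sketch of that general technique, not the authors' implementation: the helper names `kmeans`, `rq_kmeans`, and `reconstruct`, the NumPy-only Lloyd's loop, the random toy embeddings, and the reduced toy sizes (3 levels, 8 codes) are all assumptions made for illustration.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # empty clusters keep their old centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)


def rq_kmeans(embeddings, levels=5, codebook_size=256, seed=0):
    """Residual quantization: each level clusters the residual left by the previous level."""
    residual = embeddings.astype(np.float64).copy()
    codebooks, codes = [], []
    for level in range(levels):
        centroids, assign = kmeans(residual, codebook_size, seed=seed + level)
        codebooks.append(centroids)
        codes.append(assign)
        # Subtract the chosen centroid so the next level encodes what is left.
        residual = residual - centroids[assign]
    return codebooks, np.stack(codes, axis=1)


def reconstruct(codebooks, codes):
    """Approximate each embedding as the sum of its selected centroids."""
    return sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))


# Toy run (hypothetical data): 200 random "item embeddings" of dimension 16.
rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(200, 16))
codebooks, codes = rq_kmeans(item_embeddings, levels=3, codebook_size=8)
print(codes[0])  # one item's index: one code per level
```

Each row of `codes` is a short tuple of per-level indices that identifies an item. With the settings stated in the table, that tuple would have 5 entries, each in the range 0 to 255. The dense distance matrix used here is only practical for small inputs; a real pipeline would likely use a batched or library k-means.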
| 38 |
+
---
|
| 39 |
+
|
| 40 |
+
## 📚 Training Data
|
| 41 |
+
|
| 42 |
+
| Data Type | Source | Description |
|
| 43 |
+
|------------|---------|-------------|
|
| 44 |
+
| **D<sub>align</sub>** | Amazon Review Datasets (Beauty, Sports, Instruments) | Sequential, semantic reconstruction, and preference understanding tasks |
|
| 45 |
+
| **D<sub>reason</sub>** | Synthetic CoT data generated via GPT-5 / Qwen3-30B / Llama-3.1 | Multi-step reasoning sequences with `<think>...</think>` and `<answer>...</answer>` supervision |
|
| 46 |
+
| **Text Sources** | Item titles, descriptions, and high-quality reviews | Combined and rewritten to form dense item semantics |
|
| 47 |
+
|
| 48 |
+
|
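The reasoning data above is supervised with `<think>...</think>` and `<answer>...</answer>` spans. As a minimal sketch of how such an output could be split after generation, the snippet below uses regular expressions. The function name `split_reasoning`, the returned dictionary keys, and the example string are hypothetical; the exact content format inside the tags is not specified in this card.

```python
import re

# Non-greedy matches so only the first tagged span of each kind is captured;
# DOTALL lets the reasoning span cover multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(text):
    """Return the reasoning and answer spans, or None for any span that is missing."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }


# Hypothetical model output, for illustration only.
sample = "<think>\nThe user recently bought guitar strings.\n</think>\n<answer>item placeholder</answer>"
parsed = split_reasoning(sample)
print(parsed["answer"])
```

Returning `None` for a missing span, instead of raising, makes it easy to count malformed generations separately from wrong answers.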
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## 📊 Evaluation
|
| 52 |
+
|
| 53 |
+
### Datasets
|
| 54 |
+
- **Amazon-Beauty**
|
| 55 |
+
- **Amazon-Sports & Outdoors**
|
| 56 |
+
- **Amazon-Musical Instruments**
|
| 57 |
+
|
| 58 |
+
## Citation
|
| 59 |
+
|
| 60 |
+
```bibtex
|
| 61 |
+
@misc{hong2025generativereasoningrecommendationllms,
|
| 62 |
+
title={Generative Reasoning Recommendation via LLMs},
|
| 63 |
+
author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
|
| 64 |
+
year={2025},
|
| 65 |
+
eprint={2510.20815},
|
| 66 |
+
archivePrefix={arXiv},
|
| 67 |
+
primaryClass={cs.IR},
|
| 68 |
+
url={https://arxiv.org/abs/2510.20815},
|
| 69 |
+
}
|
| 70 |
+
```
|