Frywind committed (verified) · Commit 04b30f9 · Parent(s): c273ab8

Update README.md

Files changed (1): README.md (+70 -3)

---
license: apache-2.0
datasets:
- Frywind/GREAM_data
---

# 🧠 GREAM: Generative Reasoning Recommendation Model

**Paper:** *[Generative Reasoning Recommendation via LLMs](https://arxiv.org/pdf/2510.20815), 2025.*
**Authors:** Minjie Hong\*, Zetong Zhou\*, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, Zhou Zhao†
**Repository:** [https://github.com/Indolent-Kawhi/GRRM](https://github.com/Indolent-Kawhi/GRRM)
**HF Papers Link:** [https://huggingface.co/papers/2510.20815](https://huggingface.co/papers/2510.20815)

---

## 🧩 Model Summary

**GREAM** (Generative Reasoning Recommendation Model) is a **large language model (LLM)-based generative reasoning recommender** designed to unify *understanding, reasoning,* and *prediction* for recommendation tasks.
It introduces a **reasoning-enhanced, verifiable reinforcement learning** framework that supports both high-throughput direct recommendation and interpretable, reasoning-based outputs, as sketched below.
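
A minimal usage sketch of those two output modes, assuming the checkpoint loads through the standard `transformers` causal-LM API (the backbone is Qwen3-4B-Instruct). The repo id and prompt phrasing here are hypothetical placeholders; the officially supported inference pipeline lives in the GitHub repository linked above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual checkpoint path.
model_id = "Frywind/GREAM"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

history = "User history: Hydrating Face Cleanser -> Vitamin C Serum."

def generate(prompt: str, max_new_tokens: int) -> str:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

# Direct Sequence Recommendation: low-latency, no reasoning trace.
print(generate(history + " Recommend the next item.", max_new_tokens=16))

# Sequential Reasoning Recommendation: a <think> chain precedes the answer.
print(generate(history + " Reason inside <think> tags, then answer inside "
               "<answer> tags.", max_new_tokens=256))
```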

### Key Features
- **Collaborative–Semantic Alignment:** Fuses textual signals (titles, descriptions, reviews) with behavioral signals to align linguistic and collaborative semantics.
- **Reasoning Curriculum Activation:** Builds synthetic *Chain-of-Thought (CoT)* data and trains via curriculum to develop causal reasoning for recommendation.
- **Sparse-Regularized Group Policy Optimization (SRPO):** Enables stable RL fine-tuning under sparse feedback via *Residual-Sensitive Verifiable Rewards* and *Bonus-Calibrated Group Advantage Estimation* (see the sketch after this list).
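
As a point of reference for the SRPO bullet, the sketch below shows the standard group-relative advantage computation that *Bonus-Calibrated Group Advantage Estimation* builds on. The paper's bonus calibration and residual-sensitive reward shaping are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage baseline (GRPO-style).

    `rewards` holds the verifiable reward of each sampled completion for one
    prompt; each completion is scored against the group mean, scaled by the
    group standard deviation. SRPO's bonus calibration (not shown) further
    adjusts these estimates for the sparse-feedback regime.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled recommendations for one user prompt, rewarded 1.0 when
# the generated item identifier verifies against the held-out target item.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # approx [ 1., -1., -1.,  1.]
```

Under sparse verifiable rewards, most sampled groups contain few (often zero) positive completions, which is exactly the regime the paper's calibration targets.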

---

## 🧠 Model Architecture

| Component | Description |
|------------|--------------|
| **Backbone** | Qwen3-4B-Instruct |
| **Indexing** | Residual Quantization (RQ-KMeans, 5 levels, 256 values per level; see the sketch below) |
| **Training Phases** | ① Collaborative–Semantic Alignment → ② Reasoning Curriculum Activation → ③ SRPO Reinforcement Learning |
| **Inference Modes** | - **Direct Sequence Recommendation:** low-latency item generation<br> - **Sequential Reasoning Recommendation:** interpretable CoT reasoning chains |
| **RL Framework** | Verl + SGLang backend |
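
A minimal sketch of the RQ-KMeans indexing scheme from the table, assuming item embeddings are already computed. Scikit-learn's `KMeans` stands in for whatever codebook learner the released pipeline uses, and all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

LEVELS, CODES = 5, 256  # 5 quantization levels, 256 codewords per level

def train_rq_kmeans(embs: np.ndarray, seed: int = 0) -> list[np.ndarray]:
    """Fit one k-means codebook per level on the residuals of the previous level."""
    codebooks, residual = [], embs.copy()
    for _ in range(LEVELS):
        km = KMeans(n_clusters=CODES, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]  # quantization error
    return codebooks

def encode(emb: np.ndarray, codebooks: list[np.ndarray]) -> list[int]:
    """Map one embedding to its 5-token semantic ID (one code index per level)."""
    tokens, residual = [], emb.copy()
    for cb in codebooks:
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens

# Example with random stand-in embeddings (e.g., from an item text encoder).
items = np.random.randn(10_000, 64).astype(np.float32)
books = train_rq_kmeans(items)
print(encode(items[0], books))  # e.g. [203, 17, 88, 240, 5]
```

Each item is thus addressed by 5 discrete tokens drawn from 256-way codebooks, which is what lets an LLM generate item identifiers autoregressively.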

---

## 📚 Training Data

| Data Type | Source | Description |
|------------|---------|-------------|
| **D<sub>align</sub>** | Amazon Review Datasets (Beauty, Sports, Instruments) | Sequential, semantic-reconstruction, and preference-understanding tasks |
| **D<sub>reason</sub>** | Synthetic CoT data generated via GPT-5 / Qwen3-30B / Llama-3.1 | Multi-step reasoning sequences with `<think>...</think>` and `<answer>...</answer>` supervision (format sketched below) |
| **Text Sources** | Item titles, descriptions, and high-quality reviews | Combined and rewritten to form dense item semantics |
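
For concreteness, the sketch below assembles one D<sub>reason</sub>-style example with the `<think>`/`<answer>` supervision described in the table. The prompt template and the semantic-ID token format are assumptions for illustration, not the released data schema.

```python
def build_reasoning_example(history: list[str], rationale: str, target_id: str) -> dict:
    """Assemble one CoT-supervised example (illustrative template).

    The reasoning trace is wrapped in <think> tags and the recommended item's
    identifier in <answer> tags, matching the supervision format described
    for D_reason."""
    prompt = (
        "The user has interacted with the following items, in order:\n"
        + "\n".join(f"- {title}" for title in history)
        + "\nReason step by step, then recommend the next item."
    )
    target = f"<think>{rationale}</think><answer>{target_id}</answer>"
    return {"prompt": prompt, "target": target}

example = build_reasoning_example(
    history=["Hydrating Face Cleanser", "Vitamin C Serum"],
    rationale="The user is building a skincare routine; a moisturizer is the "
              "natural next step after cleanser and serum.",
    target_id="<a_203><b_17><c_88><d_240><e_5>",  # hypothetical 5-level semantic ID
)
print(example["target"])
```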

---

## 📊 Evaluation

### Datasets
- **Amazon-Beauty**
- **Amazon-Sports & Outdoors**
- **Amazon-Musical Instruments**

## 📖 Citation

```bibtex
@misc{hong2025generativereasoningrecommendationllms,
  title={Generative Reasoning Recommendation via LLMs},
  author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
  year={2025},
  eprint={2510.20815},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.20815},
}
```