Frywind committed (verified) · Commit 04b30f9 · Parent(s): c273ab8

Update README.md

Files changed (1): README.md (+70 -3)

---
license: apache-2.0
datasets:
- Frywind/GREAM_data
---

# 🧠 GREAM: Generative Reasoning Recommendation Model

**Paper:** *[Generative Reasoning Recommendation via LLMs](https://arxiv.org/pdf/2510.20815), 2025.*
**Authors:** Minjie Hong\*, Zetong Zhou\*, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, Zhou Zhao†
**Repository:** [https://github.com/Indolent-Kawhi/GRRM](https://github.com/Indolent-Kawhi/GRRM)
**HF Papers Link:** [https://huggingface.co/papers/2510.20815](https://huggingface.co/papers/2510.20815)

---

## 🧩 Model Summary

**GREAM** (Generative Reasoning Recommendation Model) is a **large language model (LLM)-based generative reasoning recommender** designed to unify *understanding, reasoning,* and *prediction* for recommendation tasks.
It introduces a **reasoning-enhanced, verifiable reinforcement learning** framework that supports both high-throughput direct recommendation and interpretable, reasoning-based outputs, as sketched below.
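
A minimal usage sketch of those two output modes, assuming the checkpoint loads through the standard `transformers` causal-LM API (the backbone is Qwen3-4B-Instruct). The repo id and prompt phrasing here are hypothetical placeholders; the officially supported inference pipeline lives in the GitHub repository linked above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual checkpoint path.
model_id = "Frywind/GREAM"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

history = "User history: Hydrating Face Cleanser -> Vitamin C Serum."

def generate(prompt: str, max_new_tokens: int) -> str:
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

# Direct Sequence Recommendation: low-latency, no reasoning trace.
print(generate(history + " Recommend the next item.", max_new_tokens=16))

# Sequential Reasoning Recommendation: a <think> chain precedes the answer.
print(generate(history + " Reason inside <think> tags, then answer inside "
               "<answer> tags.", max_new_tokens=256))
```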

### Key Features
- **Collaborative–Semantic Alignment:** Fuses textual signals (titles, descriptions, reviews) with behavioral signals to align linguistic and collaborative semantics.
- **Reasoning Curriculum Activation:** Builds synthetic *Chain-of-Thought (CoT)* data and trains via curriculum to develop causal reasoning for recommendation.
- **Sparse-Regularized Group Policy Optimization (SRPO):** Enables stable RL fine-tuning under sparse feedback via *Residual-Sensitive Verifiable Rewards* and *Bonus-Calibrated Group Advantage Estimation* (see the sketch after this list).
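
As a point of reference for the SRPO bullet, the sketch below shows the standard group-relative advantage computation that *Bonus-Calibrated Group Advantage Estimation* builds on. The paper's bonus calibration and residual-sensitive reward shaping are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage baseline (GRPO-style).

    `rewards` holds the verifiable reward of each sampled completion for one
    prompt; each completion is scored against the group mean, scaled by the
    group standard deviation. SRPO's bonus calibration (not shown) further
    adjusts these estimates for the sparse-feedback regime.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled recommendations for one user prompt, rewarded 1.0 when
# the generated item identifier verifies against the held-out target item.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # approx [ 1., -1., -1.,  1.]
```

Under sparse verifiable rewards, most sampled groups contain few (often zero) positive completions, which is exactly the regime the paper's calibration targets.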

---

## 🧠 Model Architecture

| Component | Description |
|------------|--------------|
| **Backbone** | Qwen3-4B-Instruct |
| **Indexing** | Residual Quantization (RQ-KMeans, 5 levels, 256 values per level; see the sketch below) |
| **Training Phases** | ① Collaborative–Semantic Alignment → ② Reasoning Curriculum Activation → ③ SRPO Reinforcement Learning |
| **Inference Modes** | - **Direct Sequence Recommendation:** low-latency item generation<br> - **Sequential Reasoning Recommendation:** interpretable CoT reasoning chains |
| **RL Framework** | Verl + SGLang backend |
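
A minimal sketch of the RQ-KMeans indexing scheme from the table, assuming item embeddings are already computed. Scikit-learn's `KMeans` stands in for whatever codebook learner the released pipeline uses, and all names here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

LEVELS, CODES = 5, 256  # 5 quantization levels, 256 codewords per level

def train_rq_kmeans(embs: np.ndarray, seed: int = 0) -> list[np.ndarray]:
    """Fit one k-means codebook per level on the residuals of the previous level."""
    codebooks, residual = [], embs.copy()
    for _ in range(LEVELS):
        km = KMeans(n_clusters=CODES, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]  # quantization error
    return codebooks

def encode(emb: np.ndarray, codebooks: list[np.ndarray]) -> list[int]:
    """Map one embedding to its 5-token semantic ID (one code index per level)."""
    tokens, residual = [], emb.copy()
    for cb in codebooks:
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens

# Example with random stand-in embeddings (e.g., from an item text encoder).
items = np.random.randn(10_000, 64).astype(np.float32)
books = train_rq_kmeans(items)
print(encode(items[0], books))  # e.g. [203, 17, 88, 240, 5]
```

Each item is thus addressed by 5 discrete tokens drawn from 256-way codebooks, which is what lets an LLM generate item identifiers autoregressively.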

---

## 📚 Training Data

| Data Type | Source | Description |
|------------|---------|-------------|
| **D<sub>align</sub>** | Amazon Review Datasets (Beauty, Sports, Instruments) | Sequential, semantic-reconstruction, and preference-understanding tasks |
| **D<sub>reason</sub>** | Synthetic CoT data generated via GPT-5 / Qwen3-30B / Llama-3.1 | Multi-step reasoning sequences with `<think>...</think>` and `<answer>...</answer>` supervision (format sketched below) |
| **Text Sources** | Item titles, descriptions, and high-quality reviews | Combined and rewritten to form dense item semantics |
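
For concreteness, the sketch below assembles one D<sub>reason</sub>-style example with the `<think>`/`<answer>` supervision described in the table. The prompt template and the semantic-ID token format are assumptions for illustration, not the released data schema.

```python
def build_reasoning_example(history: list[str], rationale: str, target_id: str) -> dict:
    """Assemble one CoT-supervised example (illustrative template).

    The reasoning trace is wrapped in <think> tags and the recommended item's
    identifier in <answer> tags, matching the supervision format described
    for D_reason."""
    prompt = (
        "The user has interacted with the following items, in order:\n"
        + "\n".join(f"- {title}" for title in history)
        + "\nReason step by step, then recommend the next item."
    )
    target = f"<think>{rationale}</think><answer>{target_id}</answer>"
    return {"prompt": prompt, "target": target}

example = build_reasoning_example(
    history=["Hydrating Face Cleanser", "Vitamin C Serum"],
    rationale="The user is building a skincare routine; a moisturizer is the "
              "natural next step after cleanser and serum.",
    target_id="<a_203><b_17><c_88><d_240><e_5>",  # hypothetical 5-level semantic ID
)
print(example["target"])
```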

---

## 📊 Evaluation

### Datasets
- **Amazon-Beauty**
- **Amazon-Sports & Outdoors**
- **Amazon-Musical Instruments**

## 📖 Citation

```bibtex
@misc{hong2025generativereasoningrecommendationllms,
  title={Generative Reasoning Recommendation via LLMs},
  author={Minjie Hong and Zetong Zhou and Zirun Guo and Ziang Zhang and Ruofan Hu and Weinan Gan and Jieming Zhu and Zhou Zhao},
  year={2025},
  eprint={2510.20815},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.20815},
}
```