arxiv:2508.11408
garyzhang
xiaoniqiu
AI & ML interests
LLM, Agents
Recent Activity
updated a dataset about 1 month ago: datajuicer/geometry_sft
published a dataset about 1 month ago: datajuicer/geometry_sft
upvoted a paper about 2 months ago: Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends