Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 54
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 24 days ago • 599
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control Paper • 2504.17130 • Published Apr 23 • 1