Kseniase
AI & ML interests
None yet
Recent Activity
replied to their post "9 new policy optimization techniques" about 17 hours ago
posted an update about 17 hours ago
9 new policy optimization techniques
Reinforcement Learning (RL) isn't stuck in the same old PPO loop: in the last two months alone, researchers have introduced a new wave of techniques reshaping how we train and fine-tune LLMs, VLMs, and agents.
Here are 9 fresh policy optimization techniques worth knowing:
1. GSPO: Group Sequence Policy Optimization → https://huggingface.co/papers/2507.18071
Moves optimization, clipping, and rewarding from the token level to the sequence level to capture the full picture and improve stability compared to GRPO. A GSPO-token variant still allows token-level fine-tuning (a minimal sketch of the sequence-level ratio appears right after this list).
2. LAPO: Length-Adaptive Policy Optimization → https://huggingface.co/papers/2507.15758
A two-stage RL framework that trains models to adaptively control reasoning length: they first learn typical solution lengths, then use them to produce shorter, more efficient reasoning.
3. HBPO: Hierarchical Budget Policy Optimization → https://huggingface.co/papers/2507.15844
Trains models to adapt reasoning depth to problem complexity. It divides training samples into subgroups with different token budgets and uses budget-aware rewards to align reasoning effort with task difficulty.
4. SOPHIA: Semi-off-policy reinforcement learning → https://huggingface.co/papers/2507.16814
Combines on-policy visual understanding from a Vision-Language Model (VLM) with off-policy reasoning from a language model, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.
5. RePO: Replay-Enhanced Policy Optimization → https://huggingface.co/papers/2506.09340
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt (a small replay-buffer sketch follows at the end of this post).
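As a companion to item 1, here is a minimal sketch of the sequence-level idea: one length-normalized importance ratio per response with PPO-style clipping, contrasted with GRPO's per-token ratios. Function names, shapes, and the toy numbers are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of a GSPO-style sequence-level ratio vs. GRPO-style
# per-token ratios (assumed names and shapes, not the authors' code).
import torch

def grpo_token_ratios(logp_new, logp_old):
    """GRPO-style: one importance ratio per token, shape (seq_len,)."""
    return torch.exp(logp_new - logp_old)

def gspo_sequence_ratio(logp_new, logp_old):
    """GSPO-style: a single length-normalized ratio for the whole response."""
    seq_len = logp_new.numel()
    return torch.exp((logp_new.sum() - logp_old.sum()) / seq_len)

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate; works for scalar or per-token ratios."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return torch.minimum(unclipped, clipped)

# Toy example: per-token log-probs of one sampled response under the
# current and the behavior policy, plus its group-relative advantage.
logp_old = torch.tensor([-1.2, -0.7, -2.1, -0.9])
logp_new = torch.tensor([-1.0, -0.8, -1.9, -0.8])
advantage = torch.tensor(0.5)

print(clipped_objective(gspo_sequence_ratio(logp_new, logp_old), advantage))
print(clipped_objective(grpo_token_ratios(logp_new, logp_old), advantage))
```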
Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
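And for item 5, a minimal replay-buffer sketch: keep past rollouts per prompt and mix a few of them into each new on-policy group. Class and field names are assumptions for illustration; the paper specifies how off-policy samples are retrieved and weighted.

```python
# Illustrative sketch of a RePO-style per-prompt replay buffer
# (assumed names and sampling policy, not the paper's implementation).
import random
from collections import defaultdict, deque

class PromptReplayBuffer:
    """Stores past rollouts per prompt and mixes them into new groups."""

    def __init__(self, max_per_prompt=64):
        self.store = defaultdict(lambda: deque(maxlen=max_per_prompt))

    def add(self, prompt, rollout):
        """Keep an (answer, reward) rollout generated for this prompt."""
        self.store[prompt].append(rollout)

    def build_group(self, prompt, on_policy_rollouts, n_replay=4):
        """Return fresh on-policy rollouts plus up to n_replay stored ones."""
        past = list(self.store[prompt])
        replayed = random.sample(past, min(n_replay, len(past)))
        return on_policy_rollouts + replayed

# Toy usage: the training group for a prompt now also contains
# diverse off-policy samples generated earlier.
buf = PromptReplayBuffer()
buf.add("2+2=?", {"answer": "4", "reward": 1.0})
buf.add("2+2=?", {"answer": "5", "reward": 0.0})
fresh = [{"answer": "4", "reward": 1.0}]
print(len(buf.build_group("2+2=?", fresh, n_replay=2)))  # -> 3
```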
posted an update 8 days ago
6 Essential Reads on core AI/ML topics:
Time to look at some free, useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer 6 must-read surveys that can serve as your guides to the major fields and techniques:
1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource for the fundamentals: pre-training, generative models, prompting, alignment, and inference.
2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning → https://huggingface.co/papers/2503.06072
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques.
3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks, and applications in medicine, finance, and science.
4. A Survey of Context Engineering for Large Language Models → https://huggingface.co/papers/2507.13334
Defines context engineering as systematic information design for LLMs beyond prompting, covering retrieval, processing, management, and architectures such as RAG and multi-agent systems.
5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges.
6. Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling, and shows how to build such systems.
Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe
published an article about 1 month ago
🦸🏻#17: What is A2A and why is it – still! – underappreciated?
What is MoE 2.0? Update Your Knowledge about Mixture-of-experts
Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What's Really Changing in Transformers?
TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI
What is Qwen-Agent framework? Inside the Qwen family
#92: Fight for Developers and the Year of Orchestration
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?
#90: Why AI's Reasoning Tests Keep Failing Us
🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools
🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI
Everything You Need to Know about Knowledge Distillation
#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025
#88: Can DeepSeek Inspire Global Collaboration?