Hey, amazing, awesome people of the beautiful internet 😍🥰
Distillation has been (from my point of view) a main driving factor for the success of #LLMs - like distilling the knowledge of an amazing big model (say #DeepSeekV3, or #Gemini) into yours.
You have probably done it by minimising a KL divergence, and it somehow worked.
Well, not that well, right?
1️⃣ Your model tends to memorise!
2️⃣ Your model might get the right answer, but its reasoning might be flawed.
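For context, here is what that standard KL objective looks like. This is a minimal sketch of plain forward-KL distillation (not the paper's constrained-RL method), and the toy vocabulary distributions are made-up numbers for illustration:

```python
import math

def kl_divergence(teacher_probs, student_probs):
    """Forward KL(teacher || student): the usual distillation loss.
    Both arguments are next-token probability distributions over the same vocabulary."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

# Hypothetical next-token distributions over a 3-word vocabulary.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]

loss = kl_divergence(teacher, student)  # the student is trained to push this to 0
print(round(loss, 4))
```

Driving this loss to zero only matches the teacher's output distribution token by token - nothing in it rewards the student for getting the *reasoning* right, which is exactly where the two failure modes above come from.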
To fix those problems, we rethink distillation and propose a new approach! A method based on constrained RL that comes with nice theoretical guarantees and excellent performance!
Check it out: Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective (2509.22921)
Let us do distillation right! Please upvote if you find it useful!