Mike Ravkine PRO

mike-ravkine

the-crypt-keeper

AI & ML interests

LLM Research / Development / Evaluation

Recent Activity

liked a model 4 days ago

0xSero/DeepSeek-V3.2-REAP-345B-W3A16

posted an update 7 days ago

My hat is off to the https://huggingface.co/upstage team 🎩

https://huggingface.co/upstage/Solar-Open-100B is a very interesting, permissively licensed (Apache-with-attribution), trained-from-scratch (19T tokens), 12B-active MoE - but that's not even the cool part.

The cool part is that their fork of vLLM adds a `reasoning_effort` parameter and a corresponding reasoning/tool-calling controller FSM to consume it! https://github.com/UpstageAI/vllm/blob/c9a05e077cd82df8cab4f729396c178c29c81aa8/vllm/model_executor/models/solar_open_logits_processor.py

It looks like only "medium" and "high" are actually implemented, but I still absolutely love to see this sort of thing.

To make this model a little more accessible, I have created an FP8-Dynamic quant at https://huggingface.co/mike-ravkine/Solar-Open-100B-FP8-Dynamic which fits nicely on 2xPro-6000 or 4xA6000 GPUs.

My ReasonScape evaluations are currently running. This one will take me a couple of days, but early results are quite strong: it's showing the competency expected from a 100B reasoning model (it can count the r's in strawberry, it can do basic arithmetic, etc.) and I haven't seen a truncation yet.
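For a rough sense of why the FP8 quant fits on those two GPU setups, here is a back-of-envelope sketch. It is not a measurement: it assumes a nominal 100B parameters (taken from the model name), 1 byte per parameter at FP8, 96 GB per card for "Pro-6000" and 48 GB per card for A6000, and it ignores KV cache, activations, and quantization scale overhead. The helper names are made up for illustration.

```python
# Back-of-envelope weight-memory estimate (all figures are assumptions, see above).
PARAMS_BILLION = 100          # nominal parameter count, taken from the model name
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}

# (number of GPUs, assumed memory per GPU in GB)
GPU_SETUPS = {
    "2xPro-6000": (2, 96),
    "4xA6000": (4, 48),
}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight size in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def headroom_gb(setup: str, dtype: str) -> float:
    """Memory left over for KV cache and activations after loading the weights."""
    num_gpus, gb_per_gpu = GPU_SETUPS[setup]
    return num_gpus * gb_per_gpu - weight_gb(PARAMS_BILLION, dtype)


if __name__ == "__main__":
    for setup in GPU_SETUPS:
        for dtype in BYTES_PER_PARAM:
            print(f"{setup} {dtype}: headroom {headroom_gb(setup, dtype):+.0f} GB")
```

Under these assumptions both setups total 192 GB, so FP8 weights leave plenty of room for context while 16-bit weights would not fit at all.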

updated a model 7 days ago

mike-ravkine/Solar-Open-100B-FP8-Dynamic

Organizations

None yet

Posts 15
