@JonnaMat on Hugging Face: "🚀 FlashHead: Efficient Drop-In Replacement for the Classification Head in…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

JonnaMat

posted an update 13 days ago

Post

5924

🚀 FlashHead: Efficient Drop-In Replacement for the Classification Head in Language Model Inference

🔎 Check out our latest FlashHead-enabled model: embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead

🧩 Seamless integration with vllm:

docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  embedl/vllm:latest-jetson-orin-flashhead \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2 \
    --trust-remote-code

JonnaMat

12 days ago

🤓 Want to learn more about FlashHead? Check out this blog post: https://huggingface.co/blog/JonnaMat/flashhead

In this post

JonnaMat Jonna Matthiesen