Erik Kaunismäki

erikkaum

https://www.erikkaum.com/

Recent Activity

upvoted an article 4 days ago

Article

Continuous batching from first principles

5 days ago • 198

New activity in openai/whisper-large-v3-turbo 22 days ago

WTF is going on?

#71 opened 7 months ago by vbarrier

New activity in Qwen/Qwen3-Embedding-8B-GGUF about 1 month ago

Add feature-extraction as pipline tag

#3 opened about 1 month ago by erikkaum

New activity in Qwen/Qwen3-Embedding-4B-GGUF about 1 month ago

Add feature-extraction as pipline tag

#6 opened about 1 month ago by erikkaum

New activity in Qwen/Qwen3-Embedding-0.6B-GGUF about 1 month ago

Add feature-extraction as pipline tag

#16 opened about 1 month ago by erikkaum

commented on Test-Driving the LLMD Inference Engine by ZML 🚀 4 months ago

Thank you 🫡

posted an update 4 months ago

ZML just released a technical preview of their new Inference Engine: LLMD.

- Small footprint: the container is just 2.4 GB, which means fast startup times and efficient autoscaling.
- Cross-platform GPU support: works on both NVIDIA and AMD GPUs.
- Written in Zig.

I just tried it out, deployed it on Hugging Face Inference Endpoints, and wrote a quick guide 👇 You can try it in about 5 minutes!

https://huggingface.co/blog/erikkaum/test-driving-llmd-inference-engine

published an article 4 months ago

Article

Test-Driving the LLMD Inference Engine by ZML 🚀

Jul 18

posted an update 5 months ago

We just released native support for @SGLang and @vllm-project in Inference Endpoints 🔥

Inference Endpoints is becoming the central place to deploy high-performance inference engines, and it provides the managed infrastructure for them. Instead of spending weeks configuring infrastructure, managing servers, and debugging deployment issues, you can focus on what matters most: your AI model and your users 🙌