Can you provide Machine Specs

#2 · opened by kingabzpro

How many H100s are required to run this model locally, and what other hardware parameters should be considered for optimization?

From the deployment guide:

The smallest deployment unit for Kimi-K2 FP8 weights with 128k seqlen on mainstream H200 or H20 platform is a cluster with 16 GPUs with either Tensor Parallel (TP) or "data parallel + expert parallel" (DP+EP).

https://github.com/MoonshotAI/Kimi-K2/blob/main/docs/deploy_guidance.md
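
For reference, a minimal two-node tensor-parallel launch with SGLang might look like the sketch below. The model path, master IP, and port are placeholders, and the flag names should be double-checked against the deployment guide linked above.

```bash
# Hypothetical 2-node x 8-GPU launch with TP=16; verify flags against the official deploy guide.
MASTER_IP=10.0.0.1                      # placeholder: IP of node 0
MODEL=moonshotai/Kimi-K2-Instruct       # placeholder model path

# On node 0:
python -m sglang.launch_server \
  --model-path "$MODEL" \
  --trust-remote-code \
  --tp 16 \
  --dist-init-addr "$MASTER_IP:50000" \
  --nnodes 2 \
  --node-rank 0

# On node 1 (same command, only the rank changes):
python -m sglang.launch_server \
  --model-path "$MODEL" \
  --trust-remote-code \
  --tp 16 \
  --dist-init-addr "$MASTER_IP:50000" \
  --nnodes 2 \
  --node-rank 1
```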

Moonshot AI org

At least 16 H100s are needed, and only with a very short sequence length (suitable just for simple testing). For a normal experience, 32 H100s are required.

If someone can actually test this model, tell me if it's good.

Moonshot AI org

@vpakarinen it's really good, you should try it!

At least 16 H100s are needed, and only with a very short sequence length (suitable just for simple testing). For a normal experience, 32 H100s are required.

Can you provide an sglang example with 32 H100s? :)

Moonshot AI org

Can you provide an sglang example with 32 H100s? :)

In SGLang, the recommended way to deploy K2 is prefill-decode (P-D) disaggregation with DP+EP. It needs at least 2 prefill nodes and 4 decode nodes. In our simple testing, a 32-H100 DP+EP deployment without P-D disaggregation had some problems (though I may be wrong). You could also ask for suggestions in the SGLang community.
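
As a rough illustration of that layout (2 prefill nodes + 4 decode nodes, 8 GPUs each), the sketch below uses SGLang's PD-disaggregation mode. The --disaggregation-mode flag is taken from SGLang's PD-disaggregation support, but the exact flag names, the DP+EP options, and the router command are assumptions here, so please confirm them with the SGLang docs or community before use.

```bash
# Sketch only: 2 prefill + 4 decode nodes, 8 GPUs per node.
# Flag names are assumptions based on SGLang's PD-disaggregation feature; verify before use.
MODEL=moonshotai/Kimi-K2-Instruct       # placeholder model path

# On each of the 2 prefill nodes:
python -m sglang.launch_server \
  --model-path "$MODEL" \
  --trust-remote-code \
  --disaggregation-mode prefill \
  --tp 8 \
  --port 30000

# On each of the 4 decode nodes:
python -m sglang.launch_server \
  --model-path "$MODEL" \
  --trust-remote-code \
  --disaggregation-mode decode \
  --tp 8 \
  --port 30000

# A PD router / load balancer from SGLang is then pointed at the prefill and decode
# endpoints to tie the cluster together; see the SGLang PD-disaggregation docs for
# the exact launch command, and add the DP+EP flags recommended there.
```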

Can I deploy this setup on 4 nodes that each have an RTX 4000 Ada + 64 GB RAM + a 10 Gbps ultra-low-latency network?

Can I deploy this setup on 4 nodes that each have an RTX 4000 Ada + 64 GB RAM + a 10 Gbps ultra-low-latency network?

I don't think so. Wait for the quantized version of the model.

Can you provide an sglang example with 32 H100s? :)

In SGLang, the recommended way to deploy K2 is prefill-decode (P-D) disaggregation with DP+EP. It needs at least 2 prefill nodes and 4 decode nodes. In our simple testing, a 32-H100 DP+EP deployment without P-D disaggregation had some problems (though I may be wrong). You could also ask for suggestions in the SGLang community.

Would you recommend other packages for inference with H100 nodes?
