Run a 1T-param model on A100/H100 (80G x 8) using FP4
Docker Instructions (from https://hub.docker.com/r/tutelgroup/deepseek-671b):
# For A100/A800/H100/H800/H20/H200 (80G x 8):
# Step-1: Download 1TB Model
huggingface-cli download moonshotai/Kimi-K2-Instruct --local-dir ./moonshotai/Kimi-K2-Instruct
# Step-2: Run with A100/H100 (80G x 8):
docker run -it --rm --ipc=host --net=host --shm-size=8g --ulimit memlock=-1 \
    --ulimit stack=67108864 --gpus=all -v /:/host -w /host$(pwd) \
    tutelgroup/deepseek-671b:a100x8-chat-20250712 \
        --try_path ./moonshotai/Kimi-K2-Instruct \
        --serve --listen_port 8000 \
        --prompt "Calculate the indefinite integral of 1/sin(x) + x"
Great work! Thanks a lot.
Could you please explain what framework is used for reasoning?
Do you mean inference framework?
We integrate several well-tuned MoE operators (e.g., Kimi's fused gating, low-precision MoE FFN forwarding, etc., all of which are compatible with inexpensive GPUs) into Tutel, a library providing a collection of efficient MoE computation and communication operators. These replace the publicly available but unoptimized implementations that otherwise dominate the slow execution phases, which is what makes the overall inference throughput effective even on A100s.
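For intuition, the unfused version of top-k expert gating — one of the steps such fused kernels accelerate — looks roughly like the PyTorch sketch below. This is illustrative only, not Tutel's or Kimi's actual implementation; the tensor names and shapes are mine:

```python
# Unfused reference for top-k MoE gating (illustrative sketch only;
# a fused kernel computes the same outputs without materializing the
# full softmax and intermediate top-k tensors in global memory).
import torch
import torch.nn.functional as F

def topk_gate(hidden: torch.Tensor, gate_weight: torch.Tensor, k: int = 2):
    """hidden: [tokens, dim]; gate_weight: [dim, num_experts]."""
    logits = hidden @ gate_weight                 # score every expert per token
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)  # keep the k best experts
    topk_probs = topk_probs / topk_probs.sum(-1, keepdim=True)  # renormalize
    return topk_probs, topk_idx                   # combine weights + expert ids

tokens = torch.randn(4, 16)
w = torch.randn(16, 8)
weights, experts = topk_gate(tokens, w, k=2)
```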
This is FP4? I think you mean int4?
It quantizes the weights to FP4 inline, so that 8 A100s (80 GB each) can run this 1T-parameter model.
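The memory arithmetic checks out: 1T parameters at 4 bits each is roughly 500 GB of weights, which fits in 8 x 80 GB = 640 GB with headroom for activations and the KV cache. Below is a minimal sketch of per-tensor FP4 (e2m1) quantization, assuming a simple symmetric max-scaling scheme; the image's actual inline-quantization kernels are not shown here and likely use finer-grained (e.g., group-wise) scales:

```python
# Illustrative per-tensor FP4 (e2m1) quantization sketch. This is NOT the
# kernel the docker image uses; it only shows why 4-bit codes plus a scale
# are enough to reconstruct approximate weights at inference time.
import torch

# The 16 values representable by the e2m1 FP4 format (8 magnitudes, +/- sign).
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = torch.cat([-FP4_GRID.flip(0), FP4_GRID])

def fp4_quantize(w: torch.Tensor):
    scale = w.abs().max().clamp(min=1e-12) / 6.0  # map max magnitude to FP4's max (6.0)
    scaled = w / scale
    # Snap every weight to the nearest representable FP4 value.
    idx = (scaled.unsqueeze(-1) - FP4_GRID).abs().argmin(-1)
    return idx.to(torch.uint8), scale             # 4-bit codes (held in uint8) + scale

def fp4_dequantize(idx: torch.Tensor, scale: torch.Tensor):
    return FP4_GRID[idx.long()] * scale

w = torch.randn(4, 8)
codes, s = fp4_quantize(w)
w_hat = fp4_dequantize(codes, s)                  # lossy reconstruction of w
```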