[success] launched on 2xR9700

by djdeniro - opened Sep 16

Sep 16

Inference speed:

INFO 09-16 08:11:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 44.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.2%, Prefix cache hit rate: 15.7%
INFO 09-16 08:00:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 20.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.1%, Prefix cache hit rate: 0.0%
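
To compare the two log lines above on a per-request basis, here is a minimal sketch that divides the reported generation throughput by the number of running requests. The helper name `per_request_throughput` and the regular expression are illustrative assumptions based only on the log format shown here, not part of vLLM. The result is a rough estimate, since the throughput is an interval average while the request count is a snapshot.

```python
import re

# The two log lines quoted in this post, copied verbatim.
LOG_LINES = [
    "INFO 09-16 08:11:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 44.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, "
    "GPU KV cache usage: 2.2%, Prefix cache hit rate: 15.7%",
    "INFO 09-16 08:00:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 20.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, "
    "GPU KV cache usage: 9.1%, Prefix cache hit rate: 0.0%",
]

# Assumed pattern: matches only the fields needed from the format shown above.
PATTERN = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)


def per_request_throughput(line):
    """Return generation tokens/s divided by running requests, or None if unavailable."""
    match = PATTERN.search(line)
    if match is None:
        return None
    running = int(match.group("running"))
    if running == 0:
        return None  # avoid dividing by zero when the engine is idle
    return float(match.group("gen")) / running


for line in LOG_LINES:
    print(per_request_throughput(line))
```

By this estimate, aggregate throughput roughly doubles with two concurrent requests while the per-request rate stays close to the single-request figure.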

This model launched with no issues using the Docker image: rocm/vllm-dev:nightly_main_20250914

JunHowie

QuantTrio org Sep 16

Congratulations! 🎉
