[success] launched on 2xR9700

#1
by djdeniro - opened

Inference speed:

INFO 09-16 08:11:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 44.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.2%, Prefix cache hit rate: 15.7%
INFO 09-16 08:00:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 20.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.1%, Prefix cache hit rate: 0.0%
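
For anyone comparing the two lines, here is a minimal sketch that pulls the generation throughput and the running request count out of each log line and divides one by the other. The regular expression and the helper name `per_request_throughput` are illustrative assumptions based only on the two lines pasted above, not on any documented vLLM log format.

```python
import re

# The two engine stat lines from this post, copied verbatim.
LOG_LINES = [
    "INFO 09-16 08:11:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 44.0 tokens/s, Running: 2 reqs, Waiting: 0 reqs, "
    "GPU KV cache usage: 2.2%, Prefix cache hit rate: 15.7%",
    "INFO 09-16 08:00:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 20.9 tokens/s, Running: 1 reqs, Waiting: 0 reqs, "
    "GPU KV cache usage: 9.1%, Prefix cache hit rate: 0.0%",
]

# Assumed pattern: matches only the layout seen in the lines above.
PATTERN = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)


def per_request_throughput(line):
    """Return (total tokens/s, running requests, tokens/s per request), or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    total = float(match.group("gen"))
    running = int(match.group("running"))
    # Guard against a line that reports zero running requests.
    per_request = total / running if running else 0.0
    return total, running, per_request


for line in LOG_LINES:
    print(per_request_throughput(line))
```

On these two lines that works out to roughly 22 tokens/s per request with two concurrent requests versus 20.9 tokens/s with a single request, so the aggregate throughput about doubles with the second request.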

This model was launched using the Docker image: rocm/vllm-dev:nightly_main_20250914

QuantTrio org

Congratulations! 🎉
