Can you please submit this to the leaderboard?
We (@mlabonne, @chiphuyen & me) are trying to do a correlation analysis between human judgement and different benchmarks,
and the Chat version of this model is missing from the Hugging Face leaderboard.
(The base model is there, but it's a different model.)
Can you guys please submit the 14B chat version to the Hugging Face leaderboard as well?
context: https://twitter.com/gblazex/status/1737574824753467647
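For readers unfamiliar with this kind of analysis, here is a minimal sketch of one common way to compare human judgement with a benchmark: a Spearman rank correlation over per-model scores. This is an illustration only, not necessarily the method used in the linked thread. The model names and numbers are made-up placeholders, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder data, NOT real results: one human-preference score and one
# benchmark score per (hypothetical) model.
models = ["model_a", "model_b", "model_c", "model_d"]
human_scores = [1200, 1100, 1050, 1000]
benchmark_scores = [70.1, 68.0, 69.5, 60.2]

rho = spearman(human_scores, benchmark_scores)
print(f"Spearman correlation over {len(models)} models: {rho:.2f}")
```

A rank-based measure is a reasonable default here because human-preference ratings and benchmark accuracies live on different scales. A model missing from the leaderboard simply drops out of the comparison, which is why the chat model's scores are needed.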
In fact, both the modeling and the tokenization code need to be merged into transformers for the leaderboard to work.
Currently, the base models (as foundation models) are run manually by HF staff (that's why they're on the leaderboard). I don't think the chat models can enjoy that privilege, though.
We plan to merge the code into transformers, but we can't confirm a schedule yet.
@clefourrier can Qwen-14B-Chat get a manual run by HF staff to get on the leaderboard?
It would help us a lot in our quest to research the relationship between benchmarks,
and come up with a new representative suite based on them.
context: https://twitter.com/gblazex/status/1737574824753467647
Thank you
Hi,
I'm sorry, we have adopted a policy of only running foundation models manually, as 1) they are the most important for the community, and 2) any manual eval is a lot of added work and we don't have the bandwidth.
However, you can follow our instructions and run the eval yourself if you need results before the code is merged.
no worries, thank you!