Instructions for using 3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm with libraries and local apps.

Libraries

How to use 3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm", dtype="auto")

Local Apps

vLLM

How to use 3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm

SGLang

How to use 3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

3ib0n's RKLLM Guide

These models and binaries require an RK3588 board running rknpu driver version 0.9.7 or above.
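One way to check this requirement on the board is to read the version string the driver reports and compare it with 0.9.7. The sketch below assumes the string looks like `RKNPU driver: v0.9.8` and can be read from `/sys/kernel/debug/rknpu/version` (usually as root). Both the path and the format are assumptions that may differ between kernels, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

MIN_DRIVER = (0, 9, 7)  # minimum version stated in this guide
VERSION_FILE = "/sys/kernel/debug/rknpu/version"  # assumed location, may vary


def parse_rknpu_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from a string such as 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def driver_is_supported(text: str, minimum: Tuple[int, ...] = MIN_DRIVER) -> bool:
    """Return True when the reported version is at least `minimum`."""
    version = parse_rknpu_version(text)
    # Tuples compare element by element, so (0, 10, 0) >= (0, 9, 7) holds.
    return version is not None and version >= minimum


if __name__ == "__main__":
    try:
        with open(VERSION_FILE) as handle:
            reported = handle.read()
    except OSError as err:
        print(f"Could not read {VERSION_FILE}: {err}")
    else:
        status = "ok" if driver_is_supported(reported) else "too old or unrecognized"
        print(reported.strip(), "->", status)
```

Comparing integer tuples rather than strings avoids the case where "0.10.0" sorts before "0.9.7".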

Steps to reproduce conversion

# Download and setup miniforge3
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

# activate the base environment
source ~/miniforge3/bin/activate

# create and activate a python 3.8 environment
conda create -n rknn-llm-1.1.4 python=3.8
conda activate rknn-llm-1.1.4

# clone the latest rknn-llm toolkit
git clone https://github.com/airockchip/rknn-llm.git

# install dependencies for the toolkit
pip install transformers accelerate torchvision rknn-toolkit2==2.2.1
pip install --upgrade torch pillow

# install rkllm (path is relative to the directory where the repo was cloned)
pip install rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl

# edit or create a script to export rkllm models
cd rknn-llm/examples/rkllm_multimodel_demo
nano export/export_rkllm.py # update input and output paths
python export/export_rkllm.py
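The Python 3.8 environment matters because the toolkit wheel name carries a `cp38` tag and a `linux_x86_64` platform tag, so pip will refuse it under other interpreters or on an ARM host. As a rough sketch (the helper names are hypothetical and the platform check is deliberately simplified compared with what pip does), the tags can be read from the filename and compared with the running interpreter:

```python
import platform
import sys

WHEEL = "rkllm_toolkit-1.1.4-cp38-cp38-linux_x86_64.whl"


def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The last three dash-separated fields are the compatibility tags.
    return tuple(parts[-3:])


def interpreter_matches(filename, version_info=None, machine=None):
    """Simplified check: CPython major/minor and CPU architecture only."""
    python_tag, _abi_tag, platform_tag = wheel_tags(filename)
    version_info = sys.version_info if version_info is None else version_info
    machine = platform.machine() if machine is None else machine
    expected_python = f"cp{version_info[0]}{version_info[1]}"
    return python_tag == expected_python and platform_tag.endswith(machine)


if __name__ == "__main__":
    print("wheel tags:", wheel_tags(WHEEL))
    print("matches this interpreter:", interpreter_matches(WHEEL))
```

If the second line prints `False`, activating the `rknn-llm-1.1.4` conda environment on an x86_64 Linux host is the first thing to check.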

Example export_rkllm.py modified from https://github.com/airockchip/rknn-llm/blob/main/examples/rkllm_multimodel_demo/export/export_rkllm.py

import os
from rkllm.api import RKLLM

# Python does not expand "~" on its own, so expand it explicitly
modelpath = os.path.expanduser("~/models/Qwen/Qwen2.5-Coder-14B-Instruct/")  ## UPDATE HERE
savepath = './Qwen2.5-Coder-14B-Instruct.rkllm'  ## UPDATE HERE
llm = RKLLM()

# Load model
# Use 'export CUDA_VISIBLE_DEVICES=2' to specify GPU device
ret = llm.load_huggingface(model=modelpath, device='cpu')
if ret != 0:
    print('Load model failed!')
    exit(ret)

# Build model
qparams = None

## Do not pass the dataset parameter: this is a pure text model, not a multimodal one
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8',
                quantized_algorithm='normal', target_platform='rk3588', num_npu_core=3, extra_qparams=qparams)

if ret != 0:
    print('Build model failed!')
    exit(ret)

# Export rkllm model
ret = llm.export_rkllm(savepath)
if ret != 0:
    print('Export model failed!')
    exit(ret)
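Before pushing the exported file to a board, it can help to estimate how much memory the quantized weights need. The sketch below is back-of-the-envelope arithmetic only: it assumes roughly 14.7 billion parameters (taken from the upstream Qwen2.5 14B model description, not measured here) and 8 bits per weight for `w8a8`. It ignores the KV cache, activations, and any file format overhead, so the real footprint will be larger.

```python
GIB = 2 ** 30  # bytes per gibibyte


def estimate_weight_gib(num_params, weight_bits):
    """Approximate size of the weights alone, in GiB."""
    return num_params * weight_bits / 8 / GIB


if __name__ == "__main__":
    assumed_params = 14.7e9  # assumption, see the paragraph above
    for label, bits in (("fp16", 16), ("w8a8", 8)):
        size = estimate_weight_gib(assumed_params, bits)
        print(f"{label}: about {size:.1f} GiB of weights")
```

Under these assumptions the 8-bit weights are half the size of an fp16 copy, which is the main reason for quantizing before running on an RK3588 board with limited RAM.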

Steps to build the demo

# Download the correct toolchain for working with rkllm
# Documentation here: https://github.com/airockchip/rknn-llm/blob/main/doc/Rockchip_RKLLM_SDK_EN_1.1.0.pdf
wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz
tar -xJf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz

# ensure that the gcc compiler path is set to the location where the toolchain downloaded earlier is unpacked
nano deploy/build-linux.sh # update the gcc compiler path

# compile the demo app
cd deploy/
./build-linux.sh

Steps to run the app

More information and original guide: https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_multimodel_demo

# push install dir to device
adb push ./install/demo_Linux_aarch64 /data
# push model file to device
adb push Qwen2.5-Coder-14B-Instruct.rkllm /data/models

adb shell
cd /data/demo_Linux_aarch64
# export lib path
export LD_LIBRARY_PATH=./lib
# soft link models dir
ln -s /data/models .
# run llm (pure text example)
./llm models/Qwen2.5-Coder-14B-Instruct.rkllm 128 512
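The run steps above can also be assembled programmatically, which makes it easier to review them before touching the device. This is only a sketch: `build_deploy_commands` is a hypothetical helper, the defaults mirror the paths used in this guide, and the two numeric arguments are passed to `./llm` unchanged. It prints the commands instead of executing them.

```python
import posixpath
import shlex


def build_deploy_commands(install_dir="./install/demo_Linux_aarch64",
                          model_file="Qwen2.5-Coder-14B-Instruct.rkllm",
                          device_root="/data",
                          numeric_args=(128, 512)):
    """Return the adb invocations from this guide as argument lists."""
    demo_dir = posixpath.join(device_root, posixpath.basename(install_dir.rstrip("/")))
    models_dir = posixpath.join(device_root, "models")
    model_name = posixpath.basename(model_file)
    extra = " ".join(str(value) for value in numeric_args)
    # Commands that run on the device, chained so a failure stops the sequence.
    device_script = " && ".join([
        f"cd {shlex.quote(demo_dir)}",
        "export LD_LIBRARY_PATH=./lib",
        f"ln -s {shlex.quote(models_dir)} .",
        f"./llm models/{shlex.quote(model_name)} {extra}",
    ])
    return [
        ["adb", "push", install_dir, device_root],
        ["adb", "push", model_file, models_dir],
        ["adb", "shell", device_script],
    ]


if __name__ == "__main__":
    # Dry run: show what would be executed.
    for command in build_deploy_commands():
        print(shlex.join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the output looks right. Note that `ln -s` fails if the link already exists, so on a second run that step would need to be dropped from the script.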

Model tree for 3ib0n/Qwen2.5-14B-Coder-Instruct-rkllm

Base model: Qwen/Qwen2.5-14B
Finetuned: Qwen/Qwen2.5-Coder-14B
Finetuned (16): this model