Hello, the contents of add_token.json do not match the model. How should I fix this?

#22

by redauzhang - opened Jun 6, 2023

Discussion

redauzhang

Jun 6, 2023

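I am hitting a mismatch here between the vocab and add_token. How should I fix it?
I need to convert this .bin into another file format, for example so that llama.cpp can run it.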

qiyang

Fengshenbang-LM org, Jun 7, 2023 (edited Jun 7, 2023)

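The actual vocabulary size is 39410. The config has vocab = 39424 because the training framework we use automatically adds dummy tokens to the embedding so that it can be split evenly for parallelism (the size must be divisible by 128). These extra parameters carry no meaning.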
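The arithmetic behind the two numbers can be checked with a short sketch. The function name `pad_to_multiple` is hypothetical and is not taken from the training framework; it only illustrates rounding a vocabulary size up to the next multiple of 128.

```python
def pad_to_multiple(vocab_size: int, multiple: int = 128) -> int:
    """Round vocab_size up to the nearest multiple of `multiple`."""
    return ((vocab_size + multiple - 1) // multiple) * multiple


real_vocab = 39410                          # actual vocabulary size
padded_vocab = pad_to_multiple(real_vocab)  # size recorded in the config
num_dummy = padded_vocab - real_vocab       # meaningless padding rows

print(padded_vocab, num_dummy)  # 39424 14
```

So the last 14 rows of the embedding are padding only.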

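How exactly to change it depends on the llama.cpp script. You could try extending added_token until the vocabulary reaches 39424. Alternatively, if llama.cpp has no similar padding step for sharding, take only the first 39410 valid rows of the embedding tensor in the convert script and remove the vocab size check.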
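Both options can be sketched roughly as follows. This is only an illustration under stated assumptions, not the llama.cpp convert script: the parameter names `model.embed_tokens.weight` and `lm_head.weight` are assumed to be the vocab-sized matrices in the checkpoint, the `<dummy_i>` token names are hypothetical, and ids 39410 to 39423 are assumed to be the unfilled ones. Check all of these against your own files before use.

```python
REAL_VOCAB = 39410     # actual vocabulary size
PADDED_VOCAB = 39424   # vocab size recorded in the config

# Assumed names of the vocab-sized matrices; verify against your state dict.
VOCAB_SIZED_KEYS = ("model.embed_tokens.weight", "lm_head.weight")


def dummy_added_tokens(real_vocab=REAL_VOCAB, padded_vocab=PADDED_VOCAB):
    """Option 1: placeholder token -> id entries that fill the padded range."""
    return {f"<dummy_{i}>": i for i in range(real_vocab, padded_vocab)}


def trim_vocab_rows(state_dict, real_vocab=REAL_VOCAB, keys=VOCAB_SIZED_KEYS):
    """Option 2: return a copy with vocab-sized matrices cut to real_vocab rows.

    Only len() and row slicing are used, so the values may be torch tensors,
    NumPy arrays, or nested lists.
    """
    trimmed = dict(state_dict)
    for key in keys:
        if key in trimmed and len(trimmed[key]) > real_vocab:
            trimmed[key] = trimmed[key][:real_vocab]
    return trimmed


# Toy demonstration: nested lists stand in for tensors (hidden size 2).
toy = {
    "model.embed_tokens.weight": [[0.0, 0.0] for _ in range(PADDED_VOCAB)],
    "lm_head.weight": [[0.0, 0.0] for _ in range(PADDED_VOCAB)],
    "model.norm.weight": [1.0, 1.0],
}
trimmed = trim_vocab_rows(toy)
extra = dummy_added_tokens()

print(len(trimmed["model.embed_tokens.weight"]), len(extra))
```

With option 2, the vocab size in the config would also need to be changed to 39410 so that it matches the trimmed tensors.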

redauzhang

Jun 7, 2023

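The actual vocabulary size is 39410. The config has vocab = 39424 because the training framework we use automatically adds dummy tokens to the embedding so that it can be split evenly for parallelism (the size must be divisible by 128). These extra parameters carry no meaning.

How exactly to change it depends on the llama.cpp script. You could try extending added_token until the vocabulary reaches 39424. Alternatively, if llama.cpp has no similar padding step for sharding, take only the first 39410 valid rows of the embedding tensor in the convert script and remove the vocab size check.

A related issue that may be useful: https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1/discussions/5

OK, that was a huge help. I will look into it further.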

redauzhang changed discussion status to closed Jun 7, 2023
