Can I run this on text-generation-webui? I want to stream the token generation

by asach - opened Jun 6, 2023

Discussion

asach

Jun 6, 2023

Please provide some instructions to run this. I really appreciate your work and help.

Toaster496

Jun 6, 2023

I think it is https://github.com/oobabooga/text-generation-webui?

ehartford

Quixi AI org Jun 7, 2023

I was able to run it on oobabooga using 2x 3090.

Install oobabooga.
Download TheBloke's 4-bit GPTQ into the 'models' directory.
Modify the following files:

modules/models.py ->
    config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=True)
modules/AutoGPTQ_loader.py ->
    # Define the params for AutoGPTQForCausalLM.from_quantized
    params = {
        ...
        "trust_remote_code": True,
        ...
    }

Run ooba: python server.py --listen --model_type llama --wbits 4 --groupsize -1 --auto-devices
In the Models tab, select WizardLM-Uncensored-Falcon-40b.
If it doesn't load, choose 4-bit and reload.
In the instructions tab, choose the prompt instruct-wizardlm.
Ask your question. It's slow, but it works. The answers are spectacular.
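
For anyone who wants streamed tokens outside the web UI, here is a minimal sketch using the Transformers `TextIteratorStreamer`. It is not part of the steps above and has not been run against this model. The helper names `build_prompt` and `stream_reply` are hypothetical, and the prompt layout (instruction, blank line, then `### Response:`) is an assumption based on the `### Response:` marker seen in this thread, so check the model card for the exact template. The `model` and `tokenizer` arguments are assumed to be already loaded with `trust_remote_code=True`.

```python
from threading import Thread


def build_prompt(instruction: str) -> str:
    # ASSUMPTION: the prompt ends with a "### Response:" marker; the exact
    # template for this model may differ, so verify it on the model card.
    return f"{instruction.strip()}\n\n### Response:"


def stream_reply(model, tokenizer, instruction: str, max_new_tokens: int = 256):
    """Yield decoded text chunks while the model is still generating."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import TextIteratorStreamer

    # Some Falcon tokenizers return token_type_ids, which generate() may
    # reject, so they are left out here.
    inputs = tokenizer(
        build_prompt(instruction),
        return_tensors="pt",
        return_token_type_ids=False,
    ).to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    # generate() blocks until it finishes, so it runs in a worker thread
    # while this generator reads chunks from the streamer.
    worker = Thread(
        target=model.generate,
        kwargs=dict(inputs, streamer=streamer, max_new_tokens=max_new_tokens),
    )
    worker.start()
    for chunk in streamer:
        yield chunk
    worker.join()


# Usage, once model and tokenizer are loaded:
# for chunk in stream_reply(model, tokenizer, "Why is the sky blue?"):
#     print(chunk, end="", flush=True)
```

Running `generate()` in a separate thread is what lets the caller print text as it arrives instead of waiting for the full answer.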

asach

Jun 7, 2023

Thanks for the reply! Loading it in 4-bit gives this error. I have made the same changes, and the config is on RunPod:

2 x NVIDIA L40
64 vCPU, 500 GB RAM

jacohend

Jun 15, 2023

I got it loaded with your instructions, but I get a nonsense response to the prompt:

```
### Response:DayGenVerEvEvEv
```

Any advice?

merlinjim

Jul 17, 2023

Any plans for an uncensored version of the instruct-trained Falcon 40b?

ehartford

Quixi AI org Jul 17, 2023

I plan to train Dolphin on Falcon 40b, which I expect will be much better than falcon-40b-instruct.

Hoioi

Jul 17, 2023

> I plan to train Dolphin on Falcon 40b, which I expect will be much better than falcon-40b-instruct.

What is your estimate for the release date of this model? Will it be 13b?

asach

Jul 18, 2023

Best model I have tried for reasoning questions. Thank you!
