Update README.md
README.md CHANGED
@@ -25,7 +25,7 @@ The table below lists the 8B/8B-Chat model that has completed training on 1.1T t
 
 | Model Name | Description | #Param |Huggingface |
 |----------------|-------------------------------------------------|----------|-------------|
-| **OpenMoE-8B(1.1T)** | 8B MoE with comparable FLOPs of a
+| **OpenMoE-8B(1.1T)** | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b) |
 | **OpenMoE-8B-Chat (1.1T+SFT)** | OpenMoE-8B-1.1T supervised finetuned on the [WildChat GPT-4 Subset](https://huggingface.co/datasets/allenai/WildChat-nontoxic) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-chat) |
 
 
@@ -34,11 +34,11 @@ Besides, we also provide all our intermediate checkpoints(base, 8B, 34B) for res
 | Model Name | Description | #Param |Huggingface |
 |----------------|-------------------------------------------------|----------|-------------|
 | **OpenMoE-34B-200B** | 34B MoE with comparable FLOPs of a 7B LLaMA(No SFT) |34B |[Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |
-| OpenMoE-8B-200B | 8B MoE with comparable FLOPs of a
-| OpenMoE-8B-400B | 8B MoE with comparable FLOPs of a
-| OpenMoE-8B-600B | 8B MoE with comparable FLOPs of a
-| OpenMoE-8B-800B | 8B MoE with comparable FLOPs of a
-| OpenMoE-8B-1T | 8B MoE with comparable FLOPs of a
+| OpenMoE-8B-200B | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-200B) |
+| OpenMoE-8B-400B | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-400B) |
+| OpenMoE-8B-600B | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-600B) |
+| OpenMoE-8B-800B | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-800B) |
+| OpenMoE-8B-1T | 8B MoE with comparable FLOPs of a 2B LLaMA(No SFT) |8B |[Link](https://huggingface.co/OrionZheng/openmoe-8b-1T) |
 | OpenMoE-base(128B) | A small MoE model for debugging only |637M |[Link](https://huggingface.co/OrionZheng/openmoe-base) |
 | OpenLLaMA-base(128B) | A dense counter-part of OpenMoE-base |310M |[Link](https://huggingface.co/fuzhao/OpenLLaMA_Base) |
 
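The updated table pairs each intermediate 8B checkpoint with a Hub repository. As a rough illustration of how a reader might pick one by the number of training tokens, here is a minimal sketch. The repository IDs are copied from the "Huggingface" column of the new table rows; the helper names `checkpoint_repo` and `load_checkpoint` are hypothetical, and the use of `trust_remote_code=True` is an assumption that these repositories ship custom modeling code. It is not taken from the README itself.

```python
# Repository IDs copied from the "Huggingface" column of the updated table.
INTERMEDIATE_8B_CHECKPOINTS = {
    "200B": "OrionZheng/openmoe-8b-200B",
    "400B": "OrionZheng/openmoe-8b-400B",
    "600B": "OrionZheng/openmoe-8b-600B",
    "800B": "OrionZheng/openmoe-8b-800B",
    "1T": "OrionZheng/openmoe-8b-1T",
}


def checkpoint_repo(tokens_seen: str) -> str:
    """Return the Hub repository ID for an intermediate 8B checkpoint (hypothetical helper)."""
    try:
        return INTERMEDIATE_8B_CHECKPOINTS[tokens_seen]
    except KeyError:
        known = ", ".join(INTERMEDIATE_8B_CHECKPOINTS)
        raise ValueError(
            f"unknown checkpoint {tokens_seen!r}; expected one of: {known}"
        ) from None


def load_checkpoint(tokens_seen: str):
    """Download and load one checkpoint (hypothetical helper).

    Requires the `transformers` package and network access, so the import
    is deferred until the function is actually called.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = checkpoint_repo(tokens_seen)
    # Assumption: the repository needs trust_remote_code=True to load.
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    return tokenizer, model
```

Calling `checkpoint_repo("600B")` only resolves the repository name, while `load_checkpoint("600B")` would actually download the weights, so the latter is best run on a machine with enough disk space and memory for an 8B-parameter model.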