Instructions to use oscar128372/Nova-0.5-e3-7B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use oscar128372/Nova-0.5-e3-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="oscar128372/Nova-0.5-e3-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("oscar128372/Nova-0.5-e3-7B")
model = AutoModelForCausalLM.from_pretrained("oscar128372/Nova-0.5-e3-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use oscar128372/Nova-0.5-e3-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "oscar128372/Nova-0.5-e3-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oscar128372/Nova-0.5-e3-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/oscar128372/Nova-0.5-e3-7B

SGLang

How to use oscar128372/Nova-0.5-e3-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "oscar128372/Nova-0.5-e3-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oscar128372/Nova-0.5-e3-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "oscar128372/Nova-0.5-e3-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "oscar128372/Nova-0.5-e3-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use oscar128372/Nova-0.5-e3-7B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for oscar128372/Nova-0.5-e3-7B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for oscar128372/Nova-0.5-e3-7B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for oscar128372/Nova-0.5-e3-7B to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="oscar128372/Nova-0.5-e3-7B",
    max_seq_length=2048,
)

Docker Model Runner
How to use oscar128372/Nova-0.5-e3-7B with Docker Model Runner:
```
docker model run hf.co/oscar128372/Nova-0.5-e3-7B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Nova 0.5 e3

This model shows an interesting emergent property.

If asked a simple prompt like "What is 4 x 2?", it'll say 8. Simple, right?

Well, after some tests, I found that when it is given a more complicated problem like "Solve x^2 + x = 8.", something interesting happens:

<|begin_of_thought|>
**Analysis:**
The equation is a quadratic equation in standard form \(x^2 + x - 8 = 0\). Factoring directly seems challenging, so substitution and factoring of a perfect square trinomial is attempted.

**First:**
Rewrite the equation by adding 1 to both sides:
\(x^2 + x + 1 = 9\).

**Second:**
Recognize that \(x^2 + x + 1\) can be written as \((x + \frac{1}{2})^2 - (\frac{1}{2})^2 + 1\), simplifying to \((x + \frac{1}{2})^2 - \frac{1}{4} + 1 = (x + \frac{1}{2})^2 + \frac{3}{4}\).

**Next:**
Set this equal to 9:
\((x + \frac{1}{2})^2 + \frac{3}{4} = 9\).

**Reflection:**
Subtract \(\frac{3}{4}\) from both sides:
\((x + \frac{1}{2})^2 = 9 - \frac{3}{4} = \frac{36}{4} - \frac{3}{4} = \frac{33}{4}\).

**Finally:**
Take the square root of both sides:
\(x + \frac{1}{2} = \pm \sqrt{\frac{33}{4}} = \pm \frac{\sqrt{33}}{2}\).

**Summarizing:**
Solving for \(x\):
\(x = -\frac{1}{2} \pm \frac{\sqrt{33}}{2} = \frac{-1 \pm \sqrt{33}}{2}\).
The solutions are:
\(\boxed{\frac{-1 + \sqrt{33}}{2}, \frac{-1 - \sqrt{33}}{2}}\).
<|end_of_thought|>

<|begin_of_solution|>
**Solution:**
Starting with \(x^2 + x = 8\), rewrite it as \(x^2 + x + 1 = 9\). Completing the square on the left side:

1. Rewrite \(x^2 + x + 1\) as \((x + \frac{1}{2})^2 - (\frac{1}{2})^2 + 1 = (x + \frac{1}{2})^2 - \frac{1}{4} + 1 = (x + \frac{1}{2})^2 + \frac{3}{4}\).
2. Set equal to 9: \((x + \frac{1}{2})^2 + \frac{3}{4} = 9\).
3. Subtract \(\frac{3}{4}\): \((x + \frac{1}{2})^2 = \frac{33}{4}\).
4. Take square roots: \(x + \frac{1}{2} = \pm \frac{\sqrt{33}}{2}\).
5. Solve for \(x\): \(x = -\frac{1}{2} \pm \frac{\sqrt{33}}{2}\).

Thus, the solutions are \(\boxed{\frac{-1 + \sqrt{33}}{2}, \frac{-1 - \sqrt{33}}{2}}\).
<|end_of_solution|>

As you can see, it begins to reason. Not only that, it got it correct!
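
The boxed answer can be checked independently of the model. Below is a minimal sketch using only the Python standard library; `solve_quadratic` is a hypothetical helper written for this check, not part of the model or of any library.

```python
import math


def solve_quadratic(a: float, b: float, c: float) -> tuple[float, float]:
    """Return the real roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


# x^2 + x = 8 is x^2 + x - 8 = 0, so a = 1, b = 1, c = -8.
roots = solve_quadratic(1, 1, -8)

# Substitute each root back into the original equation.
for x in roots:
    assert math.isclose(x * x + x, 8.0)

# Compare against the model's boxed answer, (-1 ± sqrt(33)) / 2.
boxed = ((-1 + math.sqrt(33)) / 2, (-1 - math.sqrt(33)) / 2)
assert all(math.isclose(r, b) for r, b in zip(roots, boxed))

print(roots)  # roughly 2.372 and -3.372
```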

Of course, this reasoning is not as good as that of Nova-0.5-r1-7B, but it still shows promise!

How was this achieved?

Believe it or not, this was by complete accident. I accidentally mixed in a reasoning dataset along with other simpler datasets! Oops...

Turns out, the reasoning dataset taught the model to reason only on more complex, multi-step prompts.
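
Because the model reasons on some prompts and not others, code that consumes its replies has to handle both tagged and untagged text. Below is a minimal sketch, assuming the tags appear in the decoded text exactly as in the sample output above. If your tokenizer treats them as special tokens, decoding with `skip_special_tokens=True` may strip them. `split_response` is a hypothetical helper name.

```python
import re

# Tag names are taken from the sample output shown earlier.
THOUGHT_RE = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)


def split_response(text: str) -> dict:
    """Split a reply into its reasoning and its final answer.

    Replies without tags (e.g. "8") are returned whole as the answer.
    """
    thought = THOUGHT_RE.search(text)
    solution = SOLUTION_RE.search(text)
    return {
        "thought": thought.group(1).strip() if thought else None,
        "solution": solution.group(1).strip() if solution else None,
        "answer": solution.group(1).strip() if solution else text.strip(),
    }


reply = (
    "<|begin_of_thought|>\nComplete the square.\n<|end_of_thought|>\n\n"
    "<|begin_of_solution|>\nx = (-1 ± √33)/2\n<|end_of_solution|>"
)
parts = split_response(reply)
print(parts["thought"])   # Complete the square.
print(parts["answer"])    # x = (-1 ± √33)/2
```

A reply's `thought` being `None` is then a simple signal that the model answered directly.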

What you NEED to know before using:

Number 1, do not load it in 4-bit. I was so confused about why it wasn't working. Loading it in 4-bit removes the reasoning entirely and makes it behave like an actual base model, so if you don't want reasoning, that works, I guess? I don't really know why this happens.
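
One way to guard against this by accident is to check the loading arguments before they reach `from_pretrained`. Below is a minimal sketch, assuming the usual Transformers conventions: a `load_in_4bit` keyword, or a `quantization_config` object (such as `BitsAndBytesConfig`) with a `load_in_4bit` attribute. `uses_4bit` is a hypothetical helper, and the final `from_pretrained` call is left commented out so the snippet runs without downloading the model.

```python
def uses_4bit(load_kwargs: dict) -> bool:
    """Return True if these from_pretrained keyword arguments request 4-bit loading."""
    if load_kwargs.get("load_in_4bit"):
        return True
    quant_config = load_kwargs.get("quantization_config")
    if isinstance(quant_config, dict):
        return bool(quant_config.get("load_in_4bit"))
    # Config objects such as BitsAndBytesConfig expose load_in_4bit as an attribute.
    return bool(getattr(quant_config, "load_in_4bit", False))


# Keep the checkpoint's own precision and request no quantization.
load_kwargs = {"torch_dtype": "auto"}

if uses_4bit(load_kwargs):
    raise ValueError("4-bit loading disables the reasoning behaviour of this model")

# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained("oscar128372/Nova-0.5-e3-7B", **load_kwargs)
```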

Number 2, use the ChatML template:

<|im_start|>system
{}<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}

I have only tested the model with the "You are a helpful assistant." system prompt, so other system prompts may produce incorrect or unexpected results.
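
If you keep conversations as a list of role/content messages, the template can be filled in programmatically. Below is a minimal sketch, assuming the standard ChatML layout (one `<|im_start|>role ... <|im_end|>` block per message); `to_chatml` is a hypothetical helper, and multi-turn conversations are an assumption here, since only the single-turn form with the system prompt above has been tested.

```python
def to_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render role/content messages as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve x^2 + x = 8."},
])
print(prompt)
```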

How to run the model?

Below is a basic example to load and run Nova 0.5 e3 using Python and the Hugging Face transformers library. Make sure you have the required dependencies installed (transformers, torch, etc.).

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the tokenizer and model from Hugging Face
model_name = "oscar128372/Nova-0.5-e3-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Set up the ChatML prompt (no leading newline before the first <|im_start|>)
chatml_prompt = """<|im_start|>system
{}<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
"""

# Example system prompt
system_prompt = "You are a helpful assistant."

# Example prompt
prompt = "Solve x^2 + x = 8."

# Tokenize input
inputs = tokenizer(
    [chatml_prompt.format(system_prompt, prompt)],
    return_tensors="pt",
).to(device)

# Generate response
outputs = model.generate(
    **inputs,
    # Caps generated tokens only (prompt excluded). Keep this high for reasoning; otherwise keep it low.
    max_new_tokens=1024,
)

# Decode and print the result
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Expected output: A reasoned solution, e.g., "x = (-1 ± √33)/2"