---
base_model:
- Qwen/Qwen3-4B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: other
license_name: anvdl-1.0
license_link: https://huggingface.co/apexion-ai/Nous-V1-8B/blob/main/LICENSE.md
language:
- en
- fr
- pt
- de
- ro
- sv
- da
- bg
- ru
- cs
- el
- uk
- es
- nl
- sk
- hr
- pl
- lt
- nb
- nn
- fa
- sl
- gu
- lv
- it
- oc
- ne
- mr
- be
- sr
- lb
- vec
- as
- cy
- szl
- ast
- hne
- awa
- mai
- bho
- sd
- ga
- fo
- hi
- pa
- bn
- or
- tg
- yi
- lmo
- lij
- scn
- fur
- sc
- gl
- ca
- is
- sq
- li
- prs
- af
- mk
- si
- ur
- mag
- bs
- hy
- zh
- yue
- my
- ar
- he
- mt
- id
- ms
- tl
- ceb
- jv
- su
- min
- ban
- pag
- ilo
- war
- ta
- te
- kn
- ml
- tr
- az
- uz
- kk
- ba
- tt
- th
- lo
- fi
- et
- hu
- vi
- km
- ja
- ko
- ka
- eu
- ht
- pap
- kea
- tpi
- sw
---

![Header](./Nous-V1-Banner.png)

# Nous-V1 4B

## Overview

**Nous-V1 4B** is a 4-billion-parameter language model developed by Apexion AI, based on the architecture of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). Designed for versatility across diverse NLP tasks, Nous-V1 4B delivers strong performance in conversational AI, knowledge reasoning, code generation, and content creation.

**Key Features:**

- **⚡ Efficient 4B Parameter Scale:** Balances model capability with practical deployment on modern hardware
- **🧠 Enhanced Contextual Understanding:** Supports a 128k-token context window, enabling complex multi-turn conversations and document analysis
- **🌐 Multilingual & Multi-domain:** Trained on a diverse dataset for broad language and domain coverage
- **🤖 Instruction-Following & Adaptability:** Fine-tuned to respond accurately and adaptively across tasks
- **🚀 Optimized Inference:** Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications

---

## Why Choose Nous-V1 4B?

While larger models can offer more raw power, Nous-V1 4B strikes a practical balance: it is optimized for deployment efficiency without significant compromise on language understanding or generation quality. It is ideal for applications requiring:

- Real-time conversational agents
- Code completion and programming assistance
- Content generation and summarization
- Multilingual natural language understanding

---

## 🖥️ How to Run Locally

You can integrate Nous-V1 4B via the Hugging Face Transformers library or deploy it on popular serving platforms.

### Using Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "apexion-ai/Nous-1-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

### Deployment Options

- Compatible with [vLLM](https://github.com/vllm-project/vllm) for efficient serving (see the sketch after the sampling parameters below)
- Works with [llama.cpp](https://github.com/ggerganov/llama.cpp) for lightweight inference

---

## Recommended Sampling Parameters

```yaml
Temperature: 0.7
Top-p: 0.9
Top-k: 40
Min-p: 0.0
```
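
These values can be passed directly to `model.generate` in Transformers. A minimal sketch, assuming `model`, `tokenizer`, and `model_inputs` from the example above are already in scope and a recent `transformers` release that supports `min_p`:

```python
# Apply the recommended sampling parameters with Transformers.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,   # sampling must be enabled for temperature/top-p/top-k/min-p to take effect
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    min_p=0.0,        # requires a transformers version with min-p support
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    generated_ids[0][model_inputs.input_ids.shape[1]:],
    skip_special_tokens=True
))
```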
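For serving with vLLM (listed under Deployment Options above), the same values map onto `SamplingParams`. A minimal offline-inference sketch, assuming a recent vLLM release that provides `LLM.chat` and the repository id used in the Transformers example:

```python
from vllm import LLM, SamplingParams

# Load the model for offline batched inference (assumes sufficient GPU memory).
llm = LLM(model="apexion-ai/Nous-1-4B")

# Mirror the recommended sampling parameters from the card.
params = SamplingParams(temperature=0.7, top_p=0.9, top_k=40, min_p=0.0, max_tokens=512)

# Chat-style generation; vLLM applies the model's chat template automatically.
outputs = llm.chat(
    [{"role": "user", "content": "Give me a short introduction to large language models."}],
    params,
)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible HTTP server instead of offline inference, recent vLLM versions expose the same model via `vllm serve apexion-ai/Nous-1-4B`.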

---

## FAQ

- **Q:** Can I fine-tune Nous-V1 4B on my custom data?
  **A:** Yes, the model supports fine-tuning workflows via the Hugging Face Trainer or custom scripts.

- **Q:** What hardware is recommended?
  **A:** NVIDIA GPUs with at least 16 GB of VRAM (e.g., A100, RTX 3090) are optimal for inference and fine-tuning.

- **Q:** Is the model safe to use in production?
  **A:** Nous-V1 4B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.

---

## 📄 Citation

```bibtex
@misc{apexion2025nousv14b,
  title={Nous-V1 4B: Efficient Large Language Model for Versatile NLP Applications},
  author={Apexion AI Team},
  year={2025},
  url={https://huggingface.co/apexion-ai/Nous-V1-4B}
}
```

---

*Nous-V1 4B: Powering practical AI applications with intelligent language understanding.*