Nous-V1 4B

Overview

Nous-V1 4B is a 4-billion-parameter language model developed by Apexion AI, based on the Qwen3-4B architecture. Designed for versatility across diverse NLP tasks, it targets conversational AI, knowledge reasoning, code generation, and content creation.

Key Features:

⚡ Efficient 4B Parameter Scale: Balances model capability with practical deployment on modern hardware
🧠 Enhanced Contextual Understanding: Supports a 128k-token context window, enabling complex multi-turn conversations and document analysis
🌐 Multilingual & Multi-domain: Trained on a diverse dataset for broad language and domain coverage
🤖 Instruction-Following & Adaptability: Fine-tuned to respond accurately and adaptively across tasks
🚀 Optimized Inference: Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications

Why Choose Nous-V1 4B?

While larger models can offer more raw power, Nous-V1 4B strikes a practical balance — optimized for deployment efficiency without significant compromise on language understanding or generation quality. It’s ideal for applications requiring:

Real-time conversational agents
Code completion and programming assistance
Content generation and summarization
Multilingual natural language understanding

🖥️ How to Run Locally

You can easily integrate Nous-V1 4B via the Hugging Face Transformers library or deploy it on popular serving platforms.

Using Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "apexion-ai/Nous-1-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
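# keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the output at the last </think> token (id 151668)
try:
    # reverse search: index just past the last occurrence of 151668
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat everything as the final answer
    index = 0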

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
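
The parsing step above can be factored into a small helper that is easy to check without loading the model. This is a minimal sketch: the function name `split_thinking` is hypothetical, and the token ID 151668 for `</think>` is taken from the example above.

```python
def split_thinking(output_ids, end_think_id=151668):
    """Split generated token IDs at the last </think> token.

    Returns (thinking_ids, answer_ids). If the token is absent,
    thinking_ids is empty and everything is treated as the answer.
    """
    try:
        # Index just past the last occurrence of end_think_id.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy IDs: 151668 stands in for </think>.
thinking_ids, answer_ids = split_thinking([11, 12, 151668, 21, 22])
print(thinking_ids)  # [11, 12, 151668]
print(answer_ids)    # [21, 22]
```

Each half can then be passed to `tokenizer.decode(..., skip_special_tokens=True)` as in the example above.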

Deployment Options

Compatible with vLLM for efficient serving
Works with llama.cpp for lightweight inference
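
As one possible way to use the vLLM option, the sketch below builds a chat request for vLLM's OpenAI-compatible server using only the Python standard library. It assumes a server was started separately (for example with `vllm serve apexion-ai/Nous-1-4B`) and is listening on `http://localhost:8000`. The helper names `build_chat_request` and `send_chat_request`, the URL, and the `max_tokens` default are illustrative assumptions, not part of the model card.

```python
import json
import urllib.request

MODEL_NAME = "apexion-ai/Nous-1-4B"
# Assumed local endpoint of a vLLM OpenAI-compatible server.
SERVER_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt, max_tokens=512, **sampling):
    """Build the JSON payload for a single-turn chat completion."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    payload.update(sampling)  # e.g. temperature=0.7, top_p=0.9
    return payload


def send_chat_request(payload, url=SERVER_URL):
    """POST the payload and return the reply text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request(
    "Give me a short introduction to large language models.",
    temperature=0.7,
    top_p=0.9,
)
print(json.dumps(payload, indent=2))
# reply = send_chat_request(payload)  # uncomment once the server is running
```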

Recommended Sampling Parameters

Temperature: 0.7
Top-p: 0.9
Top-k: 40
Min-p: 0.0
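
One way to apply these values with the Transformers example is to pass them as keyword arguments to `model.generate`. The sketch below only builds the argument dictionary, so it runs without loading the model. `SAMPLING_KWARGS` is a hypothetical name, `do_sample=True` is added because the sampling parameters have no effect under greedy decoding, and `min_p` assumes a Transformers release recent enough to support it.

```python
# Recommended sampling parameters expressed as generate() keyword arguments.
SAMPLING_KWARGS = {
    "do_sample": True,   # enable sampling instead of greedy decoding
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "min_p": 0.0,
}

print(SAMPLING_KWARGS)

# Usage with the objects from the Transformers example (not executed here):
# generated_ids = model.generate(**model_inputs, max_new_tokens=32768, **SAMPLING_KWARGS)
```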

FAQ

Q: Can I fine-tune Nous-V1 4B on my custom data?
A: Yes, the model supports fine-tuning workflows via Hugging Face Trainer or custom scripts.

Q: What hardware is recommended?
A: NVIDIA GPUs with at least 16 GB of VRAM (e.g., A100, 3090) are optimal for inference and fine-tuning.

Q: Is the model safe to use for production?
A: Nous-V1 4B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.
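
As a rough check on the hardware recommendation, the memory taken by the weights alone can be estimated from the parameter count. The sketch below assumes a nominal 4 billion parameters and common storage widths. It ignores the KV cache, activations, and framework overhead, all of which add to the real footprint, especially at long context lengths. The function name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights only, in GiB."""
    return num_params * bytes_per_param / 1024**3


NOMINAL_PARAMS = 4e9  # assumption: nominal size implied by the "4B" name

for label, width in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(NOMINAL_PARAMS, width):.1f} GiB")
```

At 16-bit precision the weights come to roughly 7.5 GiB, which leaves headroom on a 16 GB card for the KV cache and activations.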

📄 Citation

@misc{apexion2025nousv14b,
  title={Nous-V1 4B: Efficient Large Language Model for Versatile NLP Applications},
  author={Apexion AI Team},
  year={2025},
  url={https://huggingface.co/apexion-ai/Nous-V1-4B}
}

Nous-V1 4B — Powering practical AI applications with intelligent language understanding.
