---
base_model:
- Qwen/Qwen3-4B
tags:
- text-generation-inference
- transformers
- unsloth
- qwen3
license: other
license_name: anvdl-1.0
license_link: https://huggingface.co/apexion-ai/Nous-V1-8B/blob/main/LICENSE.md
language:
- en
- fr
- pt
- de
- ro
- sv
- da
- bg
- ru
- cs
- el
- uk
- es
- nl
- sk
- hr
- pl
- lt
- nb
- nn
- fa
- sl
- gu
- lv
- it
- oc
- ne
- mr
- be
- sr
- lb
- vec
- as
- cy
- szl
- ast
- hne
- awa
- mai
- bho
- sd
- ga
- fo
- hi
- pa
- bn
- or
- tg
- yi
- lmo
- lij
- scn
- fur
- sc
- gl
- ca
- is
- sq
- li
- prs
- af
- mk
- si
- ur
- mag
- bs
- hy
- zh
- yue
- my
- ar
- he
- mt
- id
- ms
- tl
- ceb
- jv
- su
- min
- ban
- pag
- ilo
- war
- ta
- te
- kn
- ml
- tr
- az
- uz
- kk
- ba
- tt
- th
- lo
- fi
- et
- hu
- vi
- km
- ja
- ko
- ka
- eu
- ht
- pap
- kea
- tpi
- sw
---

![Header](./Nous-V1-Banner.png)

# Nous-V1 4B

## Overview

**Nous-V1 4B** is a 4-billion-parameter language model developed by Apexion AI, based on the architecture of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B). Designed for versatility across diverse NLP tasks, Nous-V1 4B delivers strong performance in conversational AI, knowledge reasoning, code generation, and content creation.

**Key Features:**

- **⚡ Efficient 4B Parameter Scale:** Balances model capability with practical deployment on modern hardware
- **🧠 Enhanced Contextual Understanding:** Supports a 128k-token context window, enabling complex multi-turn conversations and document analysis
- **🌐 Multilingual & Multi-domain:** Trained on a diverse dataset for broad language and domain coverage
- **🤖 Instruction-Following & Adaptability:** Fine-tuned to respond accurately and adaptively across tasks
- **🚀 Optimized Inference:** Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications

---

## Why Choose Nous-V1 4B?

While larger models can offer more raw power, Nous-V1 4B strikes a practical balance: it is optimized for deployment efficiency without significant compromise on language understanding or generation quality. It is ideal for applications requiring:

- Real-time conversational agents
- Code completion and programming assistance
- Content generation and summarization
- Multilingual natural language understanding

---

## 🖥️ How to Run Locally

You can integrate Nous-V1 4B via the Hugging Face Transformers library or deploy it on popular serving platforms.

### Using Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "apexion-ai/Nous-1-4B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

### Deployment Options

- Compatible with [vLLM](https://github.com/vllm-project/vllm) for efficient serving (see the sketch after the sampling parameters below)
- Works with [llama.cpp](https://github.com/ggerganov/llama.cpp) for lightweight inference

---

## Recommended Sampling Parameters

```yaml
Temperature: 0.7
Top-p: 0.9
Top-k: 40
Min-p: 0.0
```
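
These values can be passed directly to `model.generate` in Transformers. A minimal sketch, assuming `model`, `tokenizer`, and `model_inputs` from the example above are already in scope and a recent `transformers` release that supports `min_p`:

```python
# Apply the recommended sampling parameters with Transformers.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,   # sampling must be enabled for temperature/top-p/top-k/min-p to take effect
    temperature=0.7,
    top_p=0.9,
    top_k=40,
    min_p=0.0,        # requires a transformers version with min-p support
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    generated_ids[0][model_inputs.input_ids.shape[1]:],
    skip_special_tokens=True
))
```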
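For serving with vLLM (listed under Deployment Options above), the same values map onto `SamplingParams`. A minimal offline-inference sketch, assuming a recent vLLM release that provides `LLM.chat` and the repository id used in the Transformers example:

```python
from vllm import LLM, SamplingParams

# Load the model for offline batched inference (assumes sufficient GPU memory).
llm = LLM(model="apexion-ai/Nous-1-4B")

# Mirror the recommended sampling parameters from the card.
params = SamplingParams(temperature=0.7, top_p=0.9, top_k=40, min_p=0.0, max_tokens=512)

# Chat-style generation; vLLM applies the model's chat template automatically.
outputs = llm.chat(
    [{"role": "user", "content": "Give me a short introduction to large language models."}],
    params,
)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible HTTP server instead of offline inference, recent vLLM versions expose the same model via `vllm serve apexion-ai/Nous-1-4B`.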

---

## FAQ

- **Q:** Can I fine-tune Nous-V1 4B on my custom data?
  **A:** Yes, the model supports fine-tuning workflows via the Hugging Face Trainer or custom scripts.

- **Q:** What hardware is recommended?
  **A:** NVIDIA GPUs with at least 16 GB of VRAM (e.g., A100, RTX 3090) are optimal for inference and fine-tuning.

- **Q:** Is the model safe to use in production?
  **A:** Nous-V1 4B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.

---

## 📄 Citation

```bibtex
@misc{apexion2025nousv14b,
  title={Nous-V1 4B: Efficient Large Language Model for Versatile NLP Applications},
  author={Apexion AI Team},
  year={2025},
  url={https://huggingface.co/apexion-ai/Nous-V1-4B}
}
```

---

*Nous-V1 4B: Powering practical AI applications with intelligent language understanding.*