Stack 3.0 Omni Nexus

Mixture-of-Experts model for sovereign AI infrastructure

Stack 3.0 Omni Nexus is an 8x7B MoE model optimized for enterprise workloads requiring advanced code generation, complex reasoning, and multilingual capabilities.

📊 Benchmarks (vs Leading Models)

| Benchmark | Stack 3.0 Omni Nexus | Llama 3.1 70B | Mixtral 8x7B |
|---|---|---|---|
| HumanEval (pass@1) | 82.0% | 76.2% | 74.8% |
| MBPP (pass@1) | 78.5% | 72.1% | 70.3% |
| GSM8K (5-shot) | 91.2% | 89.5% | 88.1% |
| MMLU (5-shot) | 68.4% | 69.8% | 67.2% |
| CodeForces (rating) | 1842 | 1765 | 1721 |
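
For context, pass@1 here refers to the standard unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021). A minimal sketch of that estimator follows; the sample counts in the example are illustrative only, not this model's actual evaluation logs:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples generated per problem
    c: samples that pass all unit tests
    k: evaluation budget (k=1 for pass@1)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative: 200 samples per problem, 164 passing -> pass@1 = 0.82
print(pass_at_k(200, 164, 1))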

🎯 Performance

| Metric | Value |
|---|---|
| Active Params | ~14B (2 of 8 experts) |
| Total Params | ~56B |
| Context | 131,072 tokens (128K) |
| VRAM (Q4_K_M) | ~14 GB for weights (matches the Q4_K_M file size below), plus KV cache |
| Speed (A100) | ~45 tokens/s |

🚀 Quick Start

Python (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "my-ai-stack/Stack-3.0-Omni-Nexus"

# trust_remote_code is needed only if the repo ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half-precision weights; see the GGUF table for quantized options
    device_map="auto",          # shard layers across available GPUs/CPU
    trust_remote_code=True
)

prompt = "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# temperature has no effect unless sampling is enabled, so set do_sample=True
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
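
If the tokenizer ships a chat template (an assumption; check tokenizer_config.json in the repo), instruction-style prompts can reuse the objects above:

# Sketch: chat-style prompting, assuming the tokenizer defines a chat template.
messages = [
    {"role": "user", "content": "Explain top-2 expert routing in two sentences."},
]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    chat_out = model.generate(chat_inputs, max_new_tokens=256, do_sample=True, temperature=0.2)

print(tokenizer.decode(chat_out[0], skip_special_tokens=True))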

llama.cpp

# Download a GGUF file from: https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus/tree/main
# Note: newer llama.cpp builds ship this binary as llama-cli rather than main.
# -c 131072 allocates the full 128K context; lower it if memory is tight.
./main -m stack-3.0-omni-nexus-q4_k_m.gguf \
  -n 512 -t 8 -c 131072 --temp 0.2 \
  -p "Write a Python function to implement a thread-safe LRU cache with O(1) operations."

Ollama

ollama pull stack-3.0-omni-nexus
ollama run stack-3.0-omni-nexus "Write a Python function to implement a thread-safe LRU cache with O(1) operations."
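
Ollama also serves a local REST API on port 11434, so the pulled model can be called programmatically. A minimal non-streaming sketch using requests:

import requests

# Sketch: one-shot generation against Ollama's local REST API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "stack-3.0-omni-nexus",
        "prompt": "Write a Python function to implement a thread-safe LRU cache with O(1) operations.",
        "stream": False,
        "options": {"temperature": 0.2},
    },
    timeout=300,
)
print(resp.json()["response"])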

🤗 GGUF Variants (Download Counts)

| Quantization | File Size | Downloads | Use Case |
|---|---|---|---|
| FP16 | 56.0 GB | - | Research |
| Q8_0 | 28.0 GB | - | High quality |
| Q4_K_M | 14.0 GB | 1.38k | Balanced ⭐ |
| Q3_K_M | 10.0 GB | 190 | Low-end GPUs |
| Q2_K | 7.0 GB | - | Minimum VRAM |
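
To fetch a single quant rather than cloning the whole repo, huggingface_hub can download one file. A sketch, assuming the filename matches the llama.cpp example above:

from huggingface_hub import hf_hub_download

# Sketch: download only the Q4_K_M quant (~14 GB) from the Hub.
path = hf_hub_download(
    repo_id="my-ai-stack/Stack-3.0-Omni-Nexus",
    filename="stack-3.0-omni-nexus-q4_k_m.gguf",
)
print(path)  # local cache path; pass to llama.cpp via -m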

🏛️ Architecture

Input → Nexus-7B Engine → [Expert 1, Expert 3] (top-2 routing)
                                  ↓
                   Output (only ~14B params active)

  • Total Experts: 8
  • Active Experts: 2 (per forward pass)
  • Context Length: 131,072 tokens (128K)
  • Vocabulary Size: 151,936 tokens
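
As a rough illustration of the top-2 routing described above (not the model's released router code; the Top2MoE class and its dimensions are hypothetical), here is a minimal PyTorch sketch that selects 2 of 8 experts per token and mixes their outputs with renormalized gate weights:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Illustrative top-2 MoE layer: 8 experts, 2 active per token."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick 2 of 8 per token
        weights = F.softmax(weights, dim=-1)            # renormalize over the 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = Top2MoE(d_model=64, d_ff=256)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])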

🌍 Use Cases

| Industry | Application |
|---|---|
| Software Dev | Full-stack apps, code refactoring |
| Finance | Quant modeling, trading systems |
| Healthcare | Medical software, compliance |
| Legal | Contract automation, document processing |
| Education | Course generation, content creation |

⚠️ Limitations

  • Requires high-end GPU for FP16 inference
  • May need fine-tuning for specialized domains
  • Always verify generated code before production use

📁 Citation

@misc{stack-3.0-omni-nexus,
  author = {Walid Sobhi},
  title = {Stack 3.0 Omni Nexus: 8x7B Mixture-of-Experts Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/my-ai-stack/Stack-3.0-Omni-Nexus}
}

Built with ❤️ for sovereign AI infrastructure
Discord · GitHub · Website
