nexaml committed · Commit 4e20324 · verified · 1 Parent(s): 5d2e39c

Create README.md

Files changed (1): README.md (+158, -0)

---
tags:
- multimodal
- NPU
- On-device
- Snapdragon PC
- Android
license: other
license_name: nexa-research
license_link: LICENSE
pipeline_tag: any-to-any
---
<p align="center">
  <img alt="omnineural" src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/zRUnoWmw43fl9hrXHg0pE.png">
</p>

# **OmniNeural** — World’s First NPU-aware Multimodal Model


## **Overview**
**OmniNeural** is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands **text, images, and audio**, and runs across PCs, mobile devices, automotive systems, IoT, and robotics.

## Demos

### 📱 Mobile Phone NPU - Demo on Samsung S25 Ultra
The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running **natively on the Snapdragon NPU** for long battery life and low latency.

<video controls width="720" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/MOBILE_50MB.mp4"
  type="video/mp4"></video>

---

## ✨ PC NPU - Capability Highlights

<table>
<tr>
<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_demo_2_image.mov"></video>
<p align="center"><b>🖼️ Multi-Image Reasoning</b><br>Spot the differences across two images in a multi-round dialogue.</p>
</td>

<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Agent.mov"></video>
<p align="center"><b>🤖 Image + Text → Function Call</b><br>Snap a poster, add a text instruction, and the AI agent creates a calendar event (see the sketch after this table).</p>
</td>

<td width="33%">
<video controls width="100%" preload="metadata"
  src="https://huggingface.co/NexaAI/OmniNeural-4B/resolve/main/assets/PC_Demo_Audio.mov"></video>
<p align="center"><b>🎶 Multi-Audio Comparison</b><br>Tell the difference between two music clips, entirely locally.</p>
</td>
</tr>
</table>
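
The middle demo depends on the model emitting a structured function call that the host application then executes. The snippet below is a rough, hypothetical sketch of that flow: the tool name `create_calendar_event`, its fields, and the JSON layout are assumptions made for this illustration, not OmniNeural's actual tool-calling format (see the SDK documentation for the real interface).

```python
import json

# Hypothetical tool schema handed to the agent (NOT OmniNeural's official format).
CREATE_EVENT_TOOL = {
    "name": "create_calendar_event",
    "description": "Create a calendar event from details found in an image and/or text.",
    "parameters": {
        "title": "string",
        "start_time": "ISO-8601 datetime string",
        "location": "string",
    },
}

def dispatch(raw_model_output: str) -> None:
    """Parse a JSON function call emitted by the model and hand it to the host app."""
    call = json.loads(raw_model_output)
    if call.get("name") == "create_calendar_event":
        args = call.get("arguments", {})
        # A real app would invoke the platform calendar API here (e.g. an Android Intent).
        print(f"Creating event '{args['title']}' at {args['location']} on {args['start_time']}")

# Example of what the model might return after seeing the poster plus the
# instruction "add this talk to my calendar".
dispatch(json.dumps({
    "name": "create_calendar_event",
    "arguments": {
        "title": "Keynote: On-Device Multimodal AI",
        "start_time": "2025-09-12T09:00:00",
        "location": "Hall B",
    },
}))
```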


---

## **Key Features**
- **Multimodal Intelligence** – Processes **text, image, and audio** in a unified model for richer reasoning and perception.
- **NPU-Optimized Architecture** – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput, running **20% faster than non-NPU-aware models**.
- **Hardware-Aware Attention** – Attention patterns tuned for the NPU, lowering compute and memory demand.
- **Native Static Graph** – Supports variable-length multimodal inputs with stable, predictable latency (see the sketch after this list).
- **Performance Gains** – **9× faster audio processing** and **3.5× faster image processing** on NPUs compared to baseline encoders.
- **Privacy-First Inference** – All computation stays local: private, offline-capable, and cost-efficient.
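
To make the "native static graph" point concrete: an NPU graph is compiled for fixed tensor shapes, so variable-length inputs are typically mapped onto a small set of fixed sizes, for example by padding into length buckets with a validity mask. The snippet below is a minimal sketch of that general idea only; the bucket sizes, masking convention, and truncation rule are illustrative assumptions, not OmniNeural's actual preprocessing.

```python
import numpy as np

# Fixed sequence lengths the compiled static graph accepts (illustrative values).
BUCKETS = (128, 256, 512)

def pad_to_bucket(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pad (or truncate) a 1-D token array to the smallest bucket that fits it."""
    length = tokens.shape[0]
    # Fall back to the largest bucket and truncate if the input is longer.
    bucket = next((b for b in BUCKETS if b >= length), BUCKETS[-1])
    kept = min(length, bucket)

    padded = np.zeros(bucket, dtype=tokens.dtype)
    padded[:kept] = tokens[:kept]
    mask = np.zeros(bucket, dtype=bool)  # True where real tokens live
    mask[:kept] = True
    return padded, mask

padded, mask = pad_to_bucket(np.arange(200))
print(padded.shape, int(mask.sum()))  # (256,) 200: the shape is fixed, the content varies
```

Because every accepted shape is known when the graph is compiled, latency stays stable regardless of how long the actual input is.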

---

## **Performance / Benchmarks**
### Human Evaluation (vs. baselines)
- **Vision**: Wins or ties in ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- **Audio**: Clear lead over the baselines, well ahead of Gemma-3n and the Apple foundation model.
- **Text**: Matches or outperforms leading multimodal baselines.

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/vsrg43GxTOSAj7q_SI60o.png" width="1560" alt="Human evaluation chart" />
</p>

### Nexa Attention Speedups
- **9× faster** audio encoding (vs. the Whisper encoder).
- **3.5× faster** image encoding (vs. the SigLIP encoder).

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/1039SN5JBQkS04z4YnoIi.png" width="400" alt="Nexa attention speedup chart" />
</p>

---

## **Architecture Overview**
OmniNeural’s design is tightly coupled with NPU hardware:
- **NPU-friendly ops** (ReLU rather than GELU/SiLU).
- **Sparse and small tensor multiplications** for efficiency.
- **Convolutional layers** favored over linear layers for better NPU parallelization.
- **Hardware-aware attention** patterns to cut compute cost.
- **Static graph execution** for predictable latency (a toy sketch of these choices follows the list).
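
As a rough illustration of the bullets above (not OmniNeural's real layers; the sizes and layout are made up for this sketch), here is a toy PyTorch block that uses ReLU activations and 1×1 convolutions in place of GELU/SiLU and linear layers, operating on a statically shaped tensor:

```python
import torch
import torch.nn as nn

class NpuFriendlyBlock(nn.Module):
    """Toy block in the spirit of the design choices above (illustrative only)."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # A 1x1 convolution computes the same per-position mapping as a linear
        # layer, but convolution kernels tend to map well onto NPU compute units.
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()  # piecewise-linear, cheaper than GELU/SiLU on NPUs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, sequence) with a fixed, statically known shape.
        return self.conv2(self.act(self.conv1(x)))

block = NpuFriendlyBlock()
out = block(torch.randn(1, 256, 128))
print(out.shape)  # torch.Size([1, 256, 128])
```

The production model is of course far larger; the sketch only shows which primitive operations the design leans on.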


![image/png](https://cdn-uploads.huggingface.co/production/uploads/6618e0424dbef6bd3c72f89a/oINYbgXILJgTuKxKc1aO_.png)

---

## **Production Use Cases**

- **PC & Mobile** – On-device AI agents combine **voice, vision, and text** for natural, accurate responses.
  - Examples: Summarize slides into an email (PC), extract action items from chat (mobile).
  - Benefits: Private, offline, battery-efficient.

- **Automotive** – In-car assistants handle **voice control, cabin safety, and environment awareness**.
  - Examples: Detect risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
  - Benefits: Decisions run locally in milliseconds.

- **IoT & Robotics** – Multimodal sensing for **factories, AR/VR, drones, and robots**.
  - Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
  - Benefits: Works without network connectivity.

---

## How to use

Note: this version is for mobile (Android) only. See the documentation for usage instructions:

[Quickstart](https://docs.nexa.ai/nexa-sdk-android/quickstart#run-your-first-model)

---

## Links & Community

[![Discord](https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white)](https://discord.com/invite/nexa-ai)

[![X (Twitter) Follow](https://img.shields.io/badge/Follow-@nexa_ai-111?logo=x&logoColor=white)](https://x.com/nexa_ai)

[![Website](https://img.shields.io/badge/Website-nexa.ai-0A84FF)](https://nexa.ai)

- **Issues / Feedback:** Use the **HF Discussions** tab, or open an issue on our Discord or in the nexa-sdk GitHub repository.
- **Roadmap & updates:** Follow us on X and Discord.

> If you want to see more **NPU-first, multimodal** releases on HF, please give our model a like ❤️.

## Limitations
The current model is mainly optimized for English; support for other languages is planned as a next step.

---

## **Citation**

```bibtex
@misc{nexaai2025omnineural,
  title  = {OmniNeural: World’s First NPU-aware Multimodal Model},
  author = {Nexa AI},
  year   = {2025},
  url    = {https://huggingface.co/NexaAI/OmniNeural-4B},
}
```