FireRedVAD Stream-VAD (ONNX)
ONNX export of the Stream-VAD model from FireRedTeam/FireRedVAD for real-time streaming voice activity detection.
Model Details
| Property | Value |
|---|---|
| Architecture | DFSMN (Deep Feedforward Sequential Memory Network) |
| FSMN Blocks | 8 |
| Projection Dim | 128 |
| Initial Cache Length | 10 |
| Input Features | 80-dim log-mel filterbank (fbank) |
| Frame Length | 25ms (400 samples @ 16kHz) |
| Frame Shift | 10ms (160 samples @ 16kHz) |
| Output | Speech probability per frame (sigmoid, 0–1) |
| ONNX Opset | 17 |
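The frame length and frame shift above determine how many feature frames a chunk of audio yields. A minimal sketch of that arithmetic, assuming a frame is emitted only where a full 25ms window fits (no edge padding). The function name num_frames is illustrative and not part of the model's code:

```python
SAMPLE_RATE = 16_000
FRAME_LENGTH = 400  # 25 ms at 16 kHz
FRAME_SHIFT = 160   # 10 ms at 16 kHz


def num_frames(num_samples: int) -> int:
    """Number of full frames in num_samples, assuming no edge padding."""
    if num_samples < FRAME_LENGTH:
        return 0
    return 1 + (num_samples - FRAME_LENGTH) // FRAME_SHIFT


# One second of 16 kHz audio
print(num_frames(SAMPLE_RATE))
```

If your feature extractor pads the edges of each chunk, the count will differ, so check it against the actual shape of feat.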
Files
- firered_vad.onnx: The DFSMN model (2.3 MB)
- cmvn.json: CMVN normalization parameters (80-dim means + inverse stddevs)
- model_meta.json: Architecture metadata for runtime initialization
Input/Output Specification
Inputs
| Name | Shape | Description |
|---|---|---|
| feat | [1, num_frames, 80] | CMVN-normalized fbank features |
| cache_0 to cache_7 | [1, 128, cache_len] | Per-block FSMN streaming caches |
Outputs
| Name | Shape | Description |
|---|---|---|
| probs | [1, num_frames, 1] | Speech probability per frame |
| new_cache_0 to new_cache_7 | [1, 128, new_cache_len] | Updated caches (carry to next call) |
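The two tables imply a fixed layout: one feature tensor plus eight caches in, one probability tensor plus eight caches out. A small sketch of helpers that check those shapes before inference and repackage the outputs. The names check_inputs and split_outputs are hypothetical, and the output order (probs first, then new_cache_0 to new_cache_7) is assumed from the outputs table:

```python
import numpy as np

NUM_BLOCKS = 8   # FSMN blocks
PROJ_DIM = 128   # projection dim
FEAT_DIM = 80    # fbank bins


def check_inputs(feat, caches):
    """Validate shapes against the input table and return the feed dict."""
    if feat.ndim != 3 or feat.shape[0] != 1 or feat.shape[2] != FEAT_DIM:
        raise ValueError(f"feat must be [1, num_frames, {FEAT_DIM}], got {feat.shape}")
    for i in range(NUM_BLOCKS):
        cache = caches[f"cache_{i}"]
        if cache.ndim != 3 or cache.shape[:2] != (1, PROJ_DIM):
            raise ValueError(
                f"cache_{i} must be [1, {PROJ_DIM}, cache_len], got {cache.shape}"
            )
    return {"feat": feat.astype(np.float32), **caches}


def split_outputs(outputs):
    """Split a flat output list into probs and the caches for the next call."""
    probs = outputs[0]
    # Outputs are renamed to the input names so they can be fed back directly.
    new_caches = {f"cache_{i}": outputs[i + 1] for i in range(NUM_BLOCKS)}
    return probs, new_caches
```

Checking shapes up front turns a cryptic runtime error into a readable message when a feature extractor returns the wrong layout.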
Streaming Usage
Initialize caches as zeros with cache_len=10:
caches = {f"cache_{i}": np.zeros((1, 128, 10), dtype=np.float32) for i in range(8)}
For each audio chunk:
- Extract 80-dim fbank features (25ms window, 10ms shift, 16kHz)
- Apply CMVN normalization: (feature - mean) * inv_stddev
- Run inference with feat + current caches
- Carry new_cache_* outputs to the next call
- Speech probability > 0.5 indicates speech
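The CMVN step is a per-dimension affine transform. A minimal sketch in NumPy, assuming the means and inverse standard deviations have already been loaded from cmvn.json into 80-element arrays. The JSON key names are not documented here, so loading is left out, and cmvn_normalize is an illustrative name:

```python
import numpy as np


def cmvn_normalize(feat, mean, inv_stddev):
    """Apply (feature - mean) * inv_stddev along the last (80-dim) axis."""
    feat = np.asarray(feat, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)
    inv_stddev = np.asarray(inv_stddev, dtype=np.float32)
    if feat.shape[-1] != mean.shape[-1] or mean.shape != inv_stddev.shape:
        raise ValueError("feature dim and CMVN parameter dims must match")
    # Broadcasting applies the same per-bin statistics to every frame.
    return (feat - mean) * inv_stddev


# Toy values only; real statistics come from cmvn.json.
feat = np.full((1, 3, 80), 2.0, dtype=np.float32)
out = cmvn_normalize(feat, np.ones(80), np.full(80, 0.5))
print(out.shape, out.dtype)
```

Because the file stores inverse standard deviations, normalization is a multiply rather than a divide.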
import numpy as np
import onnxruntime as ort
sess = ort.InferenceSession("firered_vad.onnx")
# One zero-initialized cache per FSMN block, cache_len=10
caches = {f"cache_{i}": np.zeros((1, 128, 10), dtype=np.float32) for i in range(8)}
# For each chunk of audio (extract_fbank and apply_cmvn are placeholders, not defined here):
feat = extract_fbank(audio_chunk)  # [1, num_frames, 80], float32
feat = apply_cmvn(feat, cmvn)  # (feature - mean) * inv_stddev
inputs = {"feat": feat, **caches}
outputs = sess.run(None, inputs)
probs = outputs[0]  # [1, num_frames, 1]
# Carry the updated caches into the next call
for i in range(8):
    caches[f"cache_{i}"] = outputs[i + 1]
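Once per-frame probabilities are available, the 0.5 threshold and the 10ms frame shift map them to speech intervals. A sketch of that post-processing with no smoothing or hangover, both of which a production pipeline would likely add. The name probs_to_segments is illustrative:

```python
FRAME_SHIFT_MS = 10  # 10 ms frame shift


def probs_to_segments(probs, threshold=0.5, frame_shift_ms=FRAME_SHIFT_MS):
    """Return (start_ms, end_ms) spans where probability > threshold.

    probs is a flat sequence of per-frame speech probabilities.
    """
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # speech begins at this frame
        elif p <= threshold and start is not None:
            segments.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None  # speech ended before this frame
    if start is not None:  # speech runs to the end of the input
        segments.append((start * frame_shift_ms, len(probs) * frame_shift_ms))
    return segments


print(probs_to_segments([0.1, 0.9, 0.8, 0.2, 0.7]))  # [(10, 30), (40, 50)]
```

For the model output of shape [1, num_frames, 1], pass probs[0, :, 0]. When streaming, offset the frame indices by the number of frames already processed so that timestamps stay relative to the start of the stream.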
Export Script
This model was exported using scripts/export_firered_vad.py from the second-brain project. The script:
- Downloads the official PyTorch weights from FireRedTeam/FireRedVAD
- Wraps the model with flattened cache I/O for ONNX compatibility
- Exports with dynamic axes for variable-length streaming input
- Converts the Kaldi CMVN ark file to JSON
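For illustration, the dynamic axes for this I/O layout could be declared roughly as follows. This is a guess at the shape of the mapping, not the contents of the actual export script. The axis labels are taken from the shape columns in the input/output tables, and the dict follows the dynamic_axes format accepted by torch.onnx.export:

```python
NUM_BLOCKS = 8  # FSMN blocks, one cache each


def build_dynamic_axes(num_blocks=NUM_BLOCKS):
    """Map each tensor name to its variable-length axes (axis index -> label)."""
    axes = {
        "feat": {1: "num_frames"},
        "probs": {1: "num_frames"},
    }
    for i in range(num_blocks):
        axes[f"cache_{i}"] = {2: "cache_len"}
        axes[f"new_cache_{i}"] = {2: "new_cache_len"}
    return axes


dynamic_axes = build_dynamic_axes()
# Flattened I/O names: one entry per tensor instead of a nested cache list
input_names = ["feat"] + [f"cache_{i}" for i in range(NUM_BLOCKS)]
output_names = ["probs"] + [f"new_cache_{i}" for i in range(NUM_BLOCKS)]
```

Flattening the caches into individually named tensors is what lets each one be fed and fetched by name at inference time.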
License
Apache 2.0, following the original FireRedTeam/FireRedVAD license.