TitaNet-Large GGUF
GGUF conversion of nvidia/speakerverification_en_titanet_large (CC-BY-4.0) for native C/C++ inference via CrispASR.
Model
| Detail | Value |
|---|---|
| Architecture | TitaNet-Large: depthwise separable Conv1D encoder + ASP decoder |
| Parameters | 23M |
| Embedding dim | 192 (L2-normalized) |
| EER | 0.66% on VoxCeleb1-O cleaned |
| Input | 16 kHz mono PCM |
| GGUF size | ~45 MB (F16 weights, F32 batch-norm) |
| License | CC-BY-4.0 |
Architecture
Preprocessor: 16 kHz → 80-bin mel spectrogram (Hann window, n_fft=512, hop=160, win=400)
Encoder (Jasper-style):
Block 0 (prolog): DW-Conv(80, k=3) + PW-Conv(80→1024) + BN + SE + ReLU
Block 1: 3× DW-Sep-Conv(1024, k=7) + SE + residual + ReLU
Block 2: 3× DW-Sep-Conv(1024, k=11) + SE + residual + ReLU
Block 3: 3× DW-Sep-Conv(1024, k=15) + SE + residual + ReLU
Block 4 (epilog): DW-Conv(1024, k=1) + PW-Conv(1024→3072) + BN + SE + ReLU
Decoder:
ASP (Attentive Statistics Pooling): 3072 β 6144
BN + Linear: 6144 β 192
L2-normalize
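The decoder's shape changes (3072 → 6144 → 192) can be illustrated with a minimal NumPy sketch of attentive statistics pooling. This is not the CrispASR or NeMo implementation: the attention scores and the projection matrix below are random stand-ins for learned weights, the batch-norm step is omitted, and the function names are hypothetical. It only shows why pooling doubles the channel count (weighted mean and weighted standard deviation are concatenated) and that the final embedding has unit length.

```python
import numpy as np

def attentive_stats_pool(feats: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pool (channels, frames) features into a (2 * channels,) vector.

    feats:  encoder output, shape (C, T)
    scores: unnormalized attention scores, shape (C, T). In a trained model
            these come from a learned attention network; here they are an input.
    """
    # Softmax over the time axis gives per-channel attention weights.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)

    # Attention-weighted mean and standard deviation per channel.
    mean = (weights * feats).sum(axis=1)
    var = (weights * feats**2).sum(axis=1) - mean**2
    std = np.sqrt(np.clip(var, 1e-10, None))

    return np.concatenate([mean, std])  # shape (2C,)

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit Euclidean length."""
    return x / max(float(np.linalg.norm(x)), eps)

rng = np.random.default_rng(0)
C, T = 3072, 50                                           # channels, frames
feats = rng.standard_normal((C, T)).astype(np.float32)    # stand-in encoder output
scores = rng.standard_normal((C, T)).astype(np.float32)   # stand-in attention scores
pooled = attentive_stats_pool(feats, scores)              # 3072 -> 6144

# Stand-in for the learned BN + Linear projection (6144 -> 192).
proj = rng.standard_normal((192, 2 * C)).astype(np.float32) * 0.01
emb = l2_normalize(proj @ pooled)

print(pooled.shape, emb.shape)  # (6144,) (192,)
```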
Usage with CrispASR
Speaker enrollment
# Enroll a speaker from a reference audio clip
crispasr --enroll-speaker alice \
--speaker-db ./speakers \
-f alice_reference.wav
Speaker identification during transcription
# Transcribe with speaker identification
crispasr --backend parakeet \
--speaker-db ./speakers \
-f meeting.wav
# Output: (alice) Hello everyone...
Standalone embedding extraction
# Extract speaker embedding (test binary)
test-titanet titanet-large.gguf audio1.wav audio2.wav
# Prints cosine similarity matrix
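For reference, a cosine similarity matrix over a set of embeddings can be computed as in the NumPy sketch below. This is an illustration, not the code of the test binary; the function name is hypothetical. The embeddings are re-normalized here so the sketch also works on vectors that are not already unit length.

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the (N, N) matrix of pairwise cosine similarities.

    embeddings: shape (N, D), one embedding per row.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # For unit vectors the dot product equals the cosine similarity.
    return unit @ unit.T

# Toy 2-D vectors stand in for 192-dim speaker embeddings.
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(np.round(cosine_similarity_matrix(toy), 3))
```

Entry (i, j) compares file i with file j; the diagonal is 1, and higher off-diagonal values suggest the same speaker.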
Python
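from crispasr import TitaNet, SpeakerDB

# pcm_16k_float32: 16 kHz mono PCM samples as float32
with TitaNet("titanet-large.gguf") as model:
    emb = model.embed(pcm_16k_float32)  # 192-dim L2-normalized embedding

db = SpeakerDB("./speakers")
db.enroll("alice", emb)
# Match against enrolled speakers using a 0.7 score threshold
name, score = db.match(emb, threshold=0.7)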
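The snippet above takes `pcm_16k_float32` as given. One way to produce it from a WAV file, using only the standard library and NumPy, is sketched below. The helper name `load_pcm_16k_float32` is hypothetical, and scaling to the range [-1, 1) is an assumption about what the binding expects; check the CrispASR documentation before relying on it. The helper rejects files that are not 16 kHz, mono, 16-bit rather than resampling them.

```python
import wave

import numpy as np

def load_pcm_16k_float32(path: str) -> np.ndarray:
    """Read a 16 kHz mono 16-bit WAV file as float32 samples in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16 kHz audio, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())

    # WAV stores 16-bit samples as little-endian signed integers.
    samples = np.frombuffer(raw, dtype="<i2")
    # Assumption: the model binding expects floats scaled to [-1, 1).
    return samples.astype(np.float32) / 32768.0

# Usage (file name taken from the enrollment example):
# pcm_16k_float32 = load_pcm_16k_float32("alice_reference.wav")
```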
Conversion
python models/convert-titanet-to-gguf.py \
--input nvidia/speakerverification_en_titanet_large \
--output titanet-large.gguf
Verification
Encoder + decoder parity with the NeMo reference: cos = 0.999997 when the reference mel features are injected. End-to-end parity from raw audio: cos = 0.917; the gap is attributed to float32 STFT precision in the mel front-end.
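The STFT stage of the mel front-end uses the parameters listed under Architecture (n_fft=512, hop=160, win=400). A minimal NumPy sketch of that framing stage follows, only to make the shapes concrete. It is not the CrispASR or NeMo front-end: the function name is hypothetical, the symmetric Hann window from `np.hanning` is an assumption, and the mel filterbank, log, normalization, and any padding, dither, or pre-emphasis are omitted.

```python
import numpy as np

def stft_power(pcm, n_fft=512, hop=160, win=400):
    """Return the power spectrum per frame, shape (n_frames, n_fft // 2 + 1)."""
    pcm = np.asarray(pcm, dtype=np.float32)
    if len(pcm) < win:
        raise ValueError("need at least one full window of samples")
    # No edge padding: only frames that fit entirely inside the signal.
    n_frames = 1 + (len(pcm) - win) // hop
    # Row i holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(win)[None, :]
    frames = pcm[idx] * np.hanning(win).astype(np.float32)
    # rfft zero-pads each 400-sample frame to n_fft = 512 points.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.abs(spectrum) ** 2

one_second = np.zeros(16000, dtype=np.float32)
print(stft_power(one_second).shape)  # (98, 257)
```

At hop=160 and 16 kHz, each frame advances 10 ms, so one second of audio yields roughly 100 frames; the 257 frequency bins are then reduced to 80 mel bins by the filterbank.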
Citation
@inproceedings{koluguri2022titanet,
title={TitaNet: Neural Model for Speaker Representation with 1D Depth-wise Separable Convolutions and Global Context},
author={Koluguri, Nithin Rao and Park, Taejin and Ginsburg, Boris},
booktitle={ICASSP 2022},
year={2022}
}