---
license: mit
---

# SenseVoice.cpp Jetson Nano Binaries

**SenseVoice.cpp** is a high-performance, open-source C++ speech-to-text implementation aimed at edge devices. It leverages the [GGML](https://github.com/ggerganov/ggml) inference framework and supports multiple backends, including CUDA for GPU acceleration.

This repository hosts prebuilt binaries optimized for **NVIDIA Jetson Nano**, so you can skip the build step and start transcribing right away.

Original project: [https://github.com/lovemefan/SenseVoice.cpp](https://github.com/lovemefan/SenseVoice.cpp)

---

## ✨ Key Features

* **Multi-language ASR**: Supports Chinese (Mandarin), Cantonese, English, Japanese, and Korean.
* **Low latency**: Efficient inference with optional **flash-attn**.
* **Quantization**: Q3, Q4, Q5, Q6, Q8 quantized models to reduce memory footprint.
* **Flexible backends**:

  * CPU (all platforms)
  * CUDA (NVIDIA GPUs)
  * BLAS, Metal, Vulkan (upstream)
* **Voice Activity Detection (VAD)**: Built-in silence-based VAD parameters.
* **Inverse Text Normalization (ITN)**: Optionally output punctuation and formatted text.

*For full feature details (streaming mode, extra backends), see the [upstream documentation](https://github.com/lovemefan/SenseVoice.cpp/blob/main/README.md).*

---

## 📁 Deliverable Directory Structure

```bash
project-root/
├── bin/                         # Executables
│   ├── sense-voice-main         # Main ASR program
│   ├── sense-voice-quantize     # Model quantization utility
│   └── sense-voice-zcr-main     # Zero-Crossing Rate detection example
└── lib/                         # Libraries
    ├── libcommon.a              # Common static library
    ├── libggml-base.so          # GGML base operations
    ├── libggml-cpu.so           # GGML CPU support
    ├── libggml-cuda.so          # GGML CUDA support
    ├── libggml.so               # GGML core
    └── libsense-voice-core.a    # SenseVoice core
```

* **bin/**: Standalone executables for Jetson Nano.
* **lib/**: Static (`.a`) and shared (`.so`) libraries required at runtime.
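After cloning, it can help to confirm that every file in the tree above is actually present. The following is a minimal sketch, not part of the upstream project: the function name `check_layout` is hypothetical, and the file list is simply copied from the directory tree in this README.

```bash
# Hypothetical helper: report deliverables from the tree above that are missing under ROOT.
# Usage: check_layout [ROOT]   (ROOT defaults to the current directory)
check_layout() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    bin/sense-voice-main \
    bin/sense-voice-quantize \
    bin/sense-voice-zcr-main \
    lib/libcommon.a \
    lib/libggml-base.so \
    lib/libggml-cpu.so \
    lib/libggml-cuda.so \
    lib/libggml.so \
    lib/libsense-voice-core.a
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # The function succeeds (exit status 0) only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

From the repository root, `check_layout . && echo "layout OK"` runs the check. It only tests that the files exist; an unfetched Git LFS pointer file would still pass, so a suspiciously small `.so` file is worth a second look.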

---

## 🚀 Quick Deployment

Follow these steps to deploy and run on Ubuntu-based distributions (e.g., JetPack 4.5.1 on Jetson Nano):

### 1. Clone the Repo with Git LFS Support

If you haven't installed Git LFS yet, do so and initialize:

```bash
# Install Git LFS
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs

# Enable Git LFS for your user account (run once)
git lfs install
```

Clone the repository:

```bash
git clone https://huggingface.co//sensevoice-jetson-nano.git
cd sensevoice-jetson-nano
git lfs pull
```

### 2. Track Large Binary Files with Git LFS

Steps 2 and 3 apply only if you push binaries back to the repository; skip them if you only want to run the prebuilt binaries. Ensure large files (shared libraries) use LFS to avoid push errors:

```bash
git lfs track "lib/*.so"
git add .gitattributes
```

### 3. Uploading New Binaries

When you update or add new `.so` files in `lib/`, commit and push as usual:

```bash
git add lib/*.so
git commit -m "Add updated shared libraries via LFS"
git push
```

### 4. Make Binaries Executable

```bash
chmod +x bin/*
```

### 5. Install Shared Libraries System-wide

```bash
sudo mkdir -p /usr/local/lib/sensevoice
sudo cp lib/*.so /usr/local/lib/sensevoice/
echo "/usr/local/lib/sensevoice" | sudo tee /etc/ld.so.conf.d/sensevoice.conf
sudo ldconfig
```

Alternatively, set `LD_LIBRARY_PATH` for the current shell session (run from the repository root):

```bash
export LD_LIBRARY_PATH="$PWD/lib:$LD_LIBRARY_PATH"
```

### 6. Model Setup

Download or convert a GGUF model (e.g., `sense-voice-small-q4_k.gguf`):

```bash
# From Hugging Face
git clone https://huggingface.co/lovemefan/sense-voice-gguf.git models
```

### 7. Run Examples

#### Speech-to-Text (non-streaming)

```bash
bin/sense-voice-main \
  -m models/sense-voice-small-q4_k.gguf \
  -f input.wav \
  -t 4 \
  -l zh \
  --use-itn \
  --flash-attn
```

**Options**:

* `-t N` / `--threads N`: Number of decode threads (default: 4)
* `-l LANG` / `--language LANG`: `auto`, `zh`, `en`, `yue`, `ja`, `ko`
* `--min_speech_duration_ms`, `--max_speech_duration_ms`: VAD thresholds
* `--no-gpu` (`-ng`): Disable GPU
* `--use-itn` (`-itn`): Enable inverse text normalization
* `--flash-attn` (`-fa`): Enable Flash Attention decoder

#### Quantization Utility

```bash
bin/sense-voice-quantize \
  --input models/sense-voice-small.bin \
  --output models/sense-voice-small-q4_k.gguf \
  --type q4_k
```

Supported quant types: `q3`, `q4_k`, `q4_0`, `q5_0`, `q6_k`, `q8`.

#### Zero-Crossing Rate Demo

```bash
bin/sense-voice-zcr-main input.wav
```

*For streaming ASR or advanced examples, please refer to upstream's `sense-voice-stream` in the original repo.*

---

## 🛠 Compatibility

* **Hardware**: NVIDIA Jetson Nano
* **OS**: Ubuntu 18.04 / JetPack 4.5.1
* **CUDA**: 10.2
* **C++**: C++17

---

## 📜 License

MIT License; see [LICENSE](LICENSE) for details.

For comprehensive build instructions, extra examples, and advanced backend support, visit the [official SenseVoice.cpp documentation](https://github.com/lovemefan/SenseVoice.cpp/blob/main/docs/build.md).

Happy prototyping! 🎙️💕
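
For batch jobs, the non-streaming command from the Run Examples section can be wrapped in a loop over a directory of WAV files. The sketch below is illustrative only: the function name `transcribe_dir` and the environment variables `SENSE_VOICE_BIN` and `SENSE_VOICE_MODEL` are hypothetical, and it assumes `sense-voice-main` prints the transcript to standard output, which this README does not confirm. Only flags already listed above are used.

```bash
# Hypothetical helper: transcribe every .wav file in IN_DIR into OUT_DIR/<name>.txt.
# Assumption: the binary writes the transcript to standard output.
# Usage: transcribe_dir IN_DIR OUT_DIR
transcribe_dir() {
  local in_dir="$1"
  local out_dir="$2"
  # Both paths can be overridden through (hypothetical) environment variables.
  local bin="${SENSE_VOICE_BIN:-bin/sense-voice-main}"
  local model="${SENSE_VOICE_MODEL:-models/sense-voice-small-q4_k.gguf}"
  local count=0
  local wav
  mkdir -p "$out_dir"
  for wav in "$in_dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    if "$bin" -m "$model" -f "$wav" -t 4 -l auto --use-itn \
        > "$out_dir/$(basename "$wav" .wav).txt"; then
      count=$((count + 1))
    else
      echo "failed: $wav" >&2
    fi
  done
  # Print how many files were transcribed successfully.
  echo "$count"
}
```

Run from the repository root, `transcribe_dir recordings transcripts` would write one text file per recording and print the number of successful runs. If the binary also writes log lines to standard output, they would end up in the same files and need filtering.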