News

March 24, 2026
The proposal "AI Voice Support for Taxi Dispatch Operations," which uses VoiceCore, has reportedly won first place in Area 2, "Improving Customer Support Productivity," of the GENIAC-PRIZE (NEDO Prize Fund Utilization Program) competition "Development of AI Agents for Solving Social Issues Using Domestic Foundation Models, etc."
Congratulations!

March 18, 2026
A proposal using VoiceCore has reportedly advanced to the finals of the GENIAC competition category "Improving Customer Support Productivity."
Congratulations!

November 23, 2025
VoiceCore has been added to the domestic foundation model list of the GENIAC project (Ministry of Economy, Trade and Industry, NEDO).
As a result, corporations and organizations in Japan can now use VoiceCore to enter the GENIAC-PRIZE (NEDO Prize Fund Utilization Program), which offers a first-place prize of 50 million yen.
You can apply on your own, but we also offer commercial support for VoiceCore; if you are interested, please contact us here.

VoiceCore - Next-Gen Japanese Voice AI Agent model

VoiceCore is a Voice AI Agent model, available for commercial use, that enables AI to speak natural Japanese.
Unlike conventional TTS (Text to Speech) software, its goal is not to read text aloud exactly as written. It is designed to let AI communicate with humans by voice, and it can produce non-verbal sounds such as laughter and express emotions.

How to run

You can check the quality of VoiceCore's voice synthesis online at the following page:

VoiceCore online

You can use the following sample script to try the model for free on Colaboratory, provided by Google:

Sample script for Colab

A gguf format version is also provided for Mac and CPU environments.

For those with high-performance Nvidia or AMD GPUs, an 8-bit smoothquant version is available for vLLM, a high-speed inference tool.

A 4-bit gptq version, which compresses the model further and runs faster, is also available.

For other usage notes, how to specify a voice, and the design philosophy, please see "Basic usage of VoiceCore - A speech synthesis model for emotive AI agents".

Default voice providers and their licenses

Each voice can be used commercially, but usage restrictions and contact/credit obligations vary by provider.

The female voices are preview versions. They currently tend to pick up noise in the high-frequency range.

Voice Type	Commercial Use	Prohibited Content	Credit Required	Provider	License Details/Contact
amitaro_female (Cheerful girl)	⭕ Allowed (Post-use notification required)	❌ Adult/Gore ❌ Political/Religious ❌ Hate speech	✅ Required "Amitaro's Voice Material Studio"	Amitaro's Voice Material Studio	License Details / Contact
matsukaze_male (Refreshing male)	⭕ Allowed	No restrictions	✅ Required (CC-BY) Matsukaze	Matsukaze	License Details / Contact
naraku_female (Calm woman)	⭕ Allowed (Commercial use requires prior contact)	❌ Anti-social/Political/Religious ❌ Dignity-damaging acts	Personal use: ❌ Not required; Commercial use: ✅ Required "gokuraku yui"	VTuber Naraku Yui	License Details / Contact
shiguu_male (Mature boy)	⭕ Allowed (Commercial use requires prior contact)	❌ Dignity-damaging acts ❌ Political/Religious	✅ Required "Tokina Shigure (CV: Marukoro)"	Binzume/Marukoro	License Details / Terms of Use / Contact
sayoko_female (81-year-old woman)	⭕ Allowed	❌ Adult/Gore	✅ Required "Fusic Sayoko Voice Corpus"	Fusic/bandad	License Details / Contact
nekketsu_female (Hot-blooded heroine)	⭕ Allowed (Commercial use requires prior contact)	❌ Malicious use (fraudulent advertising, fake news/videos, discrimination/defamation, etc.)	✅ Optional: note "Kureha Miu" and that the voice is AI-generated	Kureha Miu	License Details / Contact
dahara1_male (General male)	⭕ Allowed	No restrictions	✅ Optional (apache2)	webbigdata	License Details / Contact

Commercial use contact: naraku, shiguu, and nekketsu require prior contact for commercial use. amitaro accepts notification after use.
Redistribution prohibited: Redistribution or sale of the voices as material is prohibited.
Modification: Processing and editing the audio is allowed.
Permission to use: All voice providers listed above have directly given us permission to use their voices in this model. Please note that this permission does not extend to every AI, model, or form of use.
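The obligations in the table above can be encoded as a small lookup, which may help when a pipeline switches between voices. The following is a minimal sketch: the names `VoiceTerms`, `VOICE_TERMS`, and `obligations` are hypothetical, the entries are transcribed from the table, and each provider's own license text remains the authoritative source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceTerms:
    commercial_contact: str      # "none", "prior", or "post" (from the table)
    credit: str                  # "always", "commercial", or "optional"
    credit_text: Optional[str]   # only filled in where credit is required


# Transcribed from the table above; check each provider's license before relying on it.
VOICE_TERMS = {
    "amitaro_female": VoiceTerms("post", "always", "Amitaro's Voice Material Studio"),
    "matsukaze_male": VoiceTerms("none", "always", "Matsukaze"),
    "naraku_female": VoiceTerms("prior", "commercial", "gokuraku yui"),
    "shiguu_male": VoiceTerms("prior", "always", "Tokina Shigure (CV: Marukoro)"),
    "sayoko_female": VoiceTerms("none", "always", "Fusic Sayoko Voice Corpus"),
    "nekketsu_female": VoiceTerms("prior", "optional", None),
    "dahara1_male": VoiceTerms("none", "optional", None),
}


def obligations(voice: str, commercial: bool) -> list[str]:
    """List contact and credit duties for one voice (content limits are not covered)."""
    terms = VOICE_TERMS[voice]
    todo = []
    if commercial and terms.commercial_contact == "prior":
        todo.append("contact the provider before use")
    if commercial and terms.commercial_contact == "post":
        todo.append("notify the provider after use")
    if terms.credit == "always" or (terms.credit == "commercial" and commercial):
        todo.append(f'credit "{terms.credit_text}"')
    return todo


if __name__ == "__main__":
    for name in VOICE_TERMS:
        print(name, obligations(name, commercial=True))
```

The sketch covers only the contact and credit columns; prohibited content and the redistribution rule still need to be checked by hand.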

How to finetune

Using the sample script below, you can finetune the model for free on Colaboratory, provided by Google, and try improving the model's English ability or adding your own voice.

Finetune sample script

Datasets and Research used/referenced

The following datasets and corpora were used or referenced during development. We would like to thank everyone who provided them.

Model Details

Model nickname: webbigdata/VoiceCore

Base model: Orpheus TTS (which uses the Llama architecture)

Model license: Depending on your interpretation, you may choose either the LLAMA 3.2 COMMUNITY LICENSE or the Apache License, Version 2.0. This is because Orpheus outputs its own custom audio tokens, so the model can be interpreted as a transformative/derivative work rather than as output of Llama 3.2. For clarity, both licenses permit commercial use.

This model was produced by applying continued pre-training and post-training to canopylabs/orpheus-3b-0.1-pretrained. Its official academic name will be decided when the paper is published.

Technical Specifications

Number of model parameters: 3.7 billion (3B class)
Estimated GPU memory required for bf16 inference: approx. 8 GB
Audio file sampling rate: 24 kHz
Measured throughput in a TensorRT-LLM environment:
RTX 4060 Ti (memory bandwidth: 288 GB/s): bf16 approx. 40 tokens/sec, fp8 approx. 65 tokens/sec
RTX 3090 (memory bandwidth: 936 GB/s): bf16 approx. 100 tokens/sec
Real-time conversation requires a throughput of 70 tokens/sec or more.
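The memory and throughput figures above can be sanity-checked with a little arithmetic. The sketch below rests on two simplifying assumptions that are not from the model card: that the weights dominate memory use (2 bytes per parameter for bf16, 1 byte for fp8), and that decoding is limited by memory bandwidth, so tokens/sec is roughly bandwidth divided by weight size. The function names are hypothetical.

```python
# Back-of-the-envelope checks for the figures listed above.
# Assumptions (not from the model card): weights dominate memory use, and
# decoding reads roughly every weight once per generated token.

PARAMS = 3.7e9           # parameter count stated in the model card
REALTIME_TPS = 70.0      # tokens/sec the card cites for real-time conversation

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}


def weight_gb(dtype: str, params: float = PARAMS) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def bandwidth_estimate_tps(bandwidth_gb_s: float, dtype: str) -> float:
    """Rough tokens/sec estimate if decoding is memory-bandwidth bound."""
    return bandwidth_gb_s / weight_gb(dtype)


def is_realtime(measured_tps: float) -> bool:
    """Compare a measured throughput with the card's 70 tokens/sec threshold."""
    return measured_tps >= REALTIME_TPS


if __name__ == "__main__":
    print(f"bf16 weights: {weight_gb('bf16'):.1f} GB")
    # (GPU, memory bandwidth in GB/s, dtype, measured tokens/sec from the card)
    for gpu, bw, dtype, measured in [
        ("RTX 4060 Ti", 288, "bf16", 40),
        ("RTX 4060 Ti", 288, "fp8", 65),
        ("RTX 3090", 936, "bf16", 100),
    ]:
        estimate = bandwidth_estimate_tps(bw, dtype)
        print(f"{gpu} {dtype}: estimate {estimate:.0f}, measured {measured}, "
              f"real-time: {is_realtime(measured)}")
```

Under these assumptions the bf16 weights come to 7.4 GB, which is consistent with the roughly 8 GB guideline once runtime overhead is added. The bandwidth-based estimates land in the same range as the measured values, and of the three measurements only the RTX 3090 bf16 result clears the 70 tokens/sec threshold.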

User Survey

We place great importance on user feedback.
Please use the Google form to share your impressions, the directions you would like to see in the future, examples of errors you have noticed, and requests for voices to be adopted as default voices.

For Business Users

For commercial support, custom development, or consulting related to this model, please contact us through our corporate website below.

Remaining TODO

Guide to running the model via MCP
Guides for running the model with a wider range of tools

Acknowledgment

To all researchers and enthusiasts of synthetic speech and all voice data providers: without their research results, data, and enthusiasm, this model could not have been completed. I was also greatly influenced and encouraged by data and knowledge that I did not use directly.

meta-llama/Llama-3.2-3B-Instruct
canopylabs/orpheus-tts
hubertsiuzdak/snac_24khz
unslothai/unsloth for providing a memory-efficient training method.
pytorch/torchtune for providing a variety of training methods.
Huggingface for storage.

Developer

Developed by: dahara1@webbigdata
Model type: text-to-audio generation
Language(s) (NLP): Japanese
Model: webbigdata/VoiceCore

BibTeX:

@misc{dahara2025VoiceCore,
  author       = {dahara1@webbigdata},
  title        = {VoiceCore - Next-Gen Japanese Voice AI Agent model},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/webbigdata/VoiceCore}},
  note         = {Accessed: 2025-07-18},
  abstract     = {This model is designed to enable AI to communicate with humans using voice, and is characterized by its ability to use non-verbal speech and express emotions.},
}
