Update README.md

README.md (changed)

---
language:
- en
tags:
- audio-text-to-audio-text
- speech-understanding
- audio
- chat
license: apache-2.0
datasets:
- custom
metrics:
- wer
- bleu
- AIR-Bench
---
<div align="center">
<h1>
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
</h1>
</div>

<p align="center">
<font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈⬛ GitHub</a> | <a href="https://arxiv.org/abs/XXXX.XXXX">📃 Paper</a> | <a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">📼 Online Demo</a></font>
</p>

## Model Description
EchoX is a speech-to-speech large language model that addresses the acoustic-semantic gap. By introducing **Echo Training**, EchoX integrates semantic and acoustic learning, mitigating the degradation of reasoning ability observed in existing speech-based LLMs. It is trained on only 10k hours of data while delivering state-of-the-art results in knowledge-based question answering and speech interaction tasks.

### Key Features
<div>
<ul>
<font size="3"><li>Mitigates Acoustic-Semantic Gap in Speech-to-Speech LLMs</li></font>
<font size="3"><li>Introduces Echo Training with a Novel Three-Stage Pipeline (S2T, T2C, Echo)</li></font>
<font size="3"><li>Trained on Only 10k Hours of Curated Data, Ensuring Efficiency</li></font>
<font size="3"><li>Achieves State-of-the-Art Performance in Knowledge-Based QA Benchmarks</li></font>
<font size="3"><li>Preserves Reasoning and Knowledge Abilities for Interactive Speech Tasks</li></font>
</ul>
</div>

## Usage
Load the EchoX model and run inference with your audio files as shown in the <a href="https://github.com/FreedomIntelligence/EchoX">GitHub repository</a>.
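
The snippet below is a minimal, illustrative sketch of what inference could look like, assuming the checkpoint can be loaded through Hugging Face `transformers` with `trust_remote_code=True`. The actual model class, processor keywords, repository id, and speech-token decoding are defined by the EchoX code on GitHub, so treat every name here as an assumption and follow the official scripts for real use.

```python
# Illustrative sketch only -- the authoritative loading/inference code lives in the EchoX GitHub repo.
# Assumptions: the repo id "FreedomIntelligence/EchoX" is correct, the checkpoint ships custom
# modeling code (hence trust_remote_code=True), and it accepts 16 kHz mono audio via an AutoProcessor.
import torch
import soundfile as sf
from transformers import AutoModel, AutoProcessor

model_id = "FreedomIntelligence/EchoX"  # assumed repo id; check the model card for the exact name

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).eval().to("cuda")

# Read a spoken question (mono, ideally 16 kHz; resample beforehand if needed)
audio, sampling_rate = sf.read("question.wav")

# Keyword names (audios=, sampling_rate=) follow common transformers audio processors
# and may differ in EchoX's own processor.
inputs = processor(audios=audio, sampling_rate=sampling_rate, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Depending on the released code, the output may contain text tokens, speech-unit tokens, or both;
# turning speech units back into a waveform requires the vocoder shipped with the repository.
print(output_ids)
```

For streaming speech output and the exact conversion of speech tokens into audio, defer to the inference scripts and demo code in the repository.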

# <span>📖 Citation</span>
```
@inproceedings{zhang2026echox,
  title={EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
  author={Zhang, Yuhao and Du, Yuhao and Dai, Zhanchen and others},
  booktitle={Proceedings of ICLR 2026},
  year={2026},
  url={https://arxiv.org/abs/XXXX.XXXX}
}
```