tzzte committed on
Commit 35b11cd · verified · 1 parent: 0482e66

Update README.md

Files changed (1): README.md +53 -3
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ tags:
+ - audio-text-to-audio-text
+ - speech-understanding
+ - audio
+ - chat
+ license: apache-2.0
+ datasets:
+ - custom
+ metrics:
+ - wer
+ - bleu
+ - AIR-Bench
+ ---
+ <div align="center">
+ <h1>
+ EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
+ </h1>
+ </div>
+
+ <p align="center">
+ <font size="3"><a href="https://github.com/FreedomIntelligence/EchoX">🐈‍⬛ GitHub</a>&nbsp;|&nbsp;<a href="https://arxiv.org/abs/XXXX.XXXX">📃 Paper</a>&nbsp;|&nbsp;<a href="https://huggingface.co/spaces/FreedomIntelligence/EchoX">📼 Online Demo</a>&nbsp;</font>
+ </p>
+
+ ## Model Description
+ EchoX is a speech-to-speech large language model (LLM) designed to mitigate the acoustic-semantic gap. It uses **Echo Training**, which integrates semantic and acoustic learning, to reduce the loss of reasoning ability observed in existing speech-based LLMs. Although it is trained on only 10k hours of data, EchoX achieves state-of-the-art results on knowledge-based question answering and speech interaction tasks.
+
+ ### Key Features
+ <div>
+ <ul>
+ <li><font size="3">Mitigates the Acoustic-Semantic Gap in Speech-to-Speech LLMs</font></li>
+ <li><font size="3">Introduces Echo Training with a Novel Three-Stage Pipeline (S2T, T2C, Echo)</font></li>
+ <li><font size="3">Trained on Only 10k Hours of Curated Data, Ensuring Efficiency</font></li>
+ <li><font size="3">Achieves State-of-the-Art Performance on Knowledge-Based QA Benchmarks</font></li>
+ <li><font size="3">Preserves Reasoning and Knowledge Abilities for Interactive Speech Tasks</font></li>
+ </ul>
+ </div>
+
+ ## Usage
+ To load the EchoX model and run inference on your own audio files, follow the instructions in the <a href="https://github.com/FreedomIntelligence/EchoX">GitHub repository</a>.
+
+ ## 📖 Citation
+ ```bibtex
+ @inproceedings{zhang2026echox,
+ title={EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs},
+ author={Zhang, Yuhao and Du, Yuhao and Dai, Zhanchen and others},
+ booktitle={Proceedings of ICLR 2026},
+ year={2026},
+ url={https://arxiv.org/abs/XXXX.XXXX}
+ }
+ ```
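The model card lists WER among its evaluation metrics. The following is an illustration only and is not the EchoX evaluation code. Word error rate is conventionally defined as the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch assumes plain whitespace tokenization with no casing or punctuation normalization. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling single-row Levenshtein table over words:
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Published evaluations usually normalize text first, so scores from this sketch may not match reported numbers.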