Spaces: tugrulkaya/audio-reasoning-explorer

tugrulkaya committed 21 days ago

Commit 8bb1b24 (verified) · 1 parent: d700258

Update README.md

Files changed (1)

README.md +86 -49

@@ -1,4 +1,3 @@
----
 title: Audio Reasoning & Step-Audio-R1 Explorer
 emoji: 🎧
 colorFrom: purple
@@ -10,82 +9,120 @@ pinned: false
 license: cc-by-4.0
 short_description: Interactive guide to audio reasoning and Step-Audio-R1 model
 tags:
-- audio
-- reasoning
-- multimodal
-- step-audio-r1
-- LALM
-- chain-of-thought
-- education
----
-# 🎧 Audio Reasoning & Step-Audio-R1 Explorer
-An interactive educational space exploring the groundbreaking concepts behind **audio reasoning** and the **Step-Audio-R1** model.
-## 🎯 What is Audio Reasoning?
-Audio reasoning is an AI model's ability to perform **deliberate, multi-step thinking processes** over audio inputs. This goes far beyond simple speech recognition (ASR) or audio classification.
-**Step-Audio-R1** is the first model to successfully unlock reasoning capabilities in the audio domain, solving the "inverted scaling anomaly" that plagued previous audio language models.
-## 🚀 Features of This Space
-| Tab | Content |
-|-----|---------|
-| 🏠 **Introduction** | Overview of audio reasoning and key achievements |
-| 🧠 **Reasoning Types** | Interactive explorer for 5 types of audio reasoning |
-| 🚫 **The Problem** | Understanding the inverted scaling anomaly |
-| 🔬 **MGRD Solution** | How Modality-Grounded Reasoning Distillation works |
-| 🏗️ **Architecture** | Step-Audio-R1 model architecture breakdown |
-| 📊 **Benchmarks** | Performance comparisons and results |
-| 🎮 **Interactive Demo** | Simulated audio reasoning examples |
-| 🚀 **Applications** | Real-world use cases |
-| 📚 **Resources** | Papers, code, and references |
-## 🔬 Key Innovation: MGRD
-**Modality-Grounded Reasoning Distillation (MGRD)** is the core innovation that makes Step-Audio-R1 work:
-```
 Text-based reasoning → Filter textual surrogates → Keep acoustic-grounded chains → Native Audio Think
-```
-This iterative process teaches the model to reason over **actual acoustic features** instead of text transcripts.
-## 📊 Performance
 Step-Audio-R1 achieves:
-- ✅ **Surpasses Gemini 2.5 Pro** on comprehensive audio benchmarks
-- ✅ **Comparable to Gemini 3 Pro** (state-of-the-art)
-- ✅ **First successful test-time compute scaling** for audio
-## 📚 Resources
-- 📄 [Step-Audio-R1 Paper](https://arxiv.org/abs/2511.15848)
-- 💻 [GitHub Repository](https://github.com/stepfun-ai/Step-Audio-R1)
-- 🤗 [HuggingFace Collection](https://huggingface.co/collections/stepfun-ai/step-audio-r1)
-- 🎯 [Official Demo](https://stepaudiollm.github.io/step-audio-r1/)
-## 👤 Author
-**Mehmet Tuğrul Kaya**
-- 🐙 GitHub: [@mtkaya](https://github.com/mtkaya)
-- 🤗 HuggingFace: [tugrulkaya](https://huggingface.co/tugrulkaya)
-## 📝 Citation
-```bibtex
 @article{stepaudioR1,
   title={Step-Audio-R1 Technical Report},
   author={Tian, Fei and others},
   journal={arXiv preprint arXiv:2511.15848},
   year={2025}
 }
-```
----
 <p align="center">
-  <b>🎧 Sound Speaks, AI Listens and Thinks 🧠</b>
 </p>

 title: Audio Reasoning & Step-Audio-R1 Explorer
 emoji: 🎧
 colorFrom: purple
 license: cc-by-4.0
 short_description: Interactive guide to audio reasoning and Step-Audio-R1 model
 tags:
+audio
+reasoning
+multimodal
+step-audio-r1
+LALM
+chain-of-thought
+education
+🎧 Audio Reasoning & Step-Audio-R1 Explorer
+An interactive educational space exploring the groundbreaking concepts behind audio reasoning and the Step-Audio-R1 model.
+🎯 What is Audio Reasoning?
+Audio reasoning is an AI model's ability to perform deliberate, multi-step thinking processes over audio inputs. This goes far beyond simple speech recognition (ASR) or audio classification.
+Step-Audio-R1 is the first model to successfully unlock reasoning capabilities in the audio domain, solving the "inverted scaling anomaly" that plagued previous audio language models.
+🚀 Features of This Space
+Tab
+Content
+🏠 Introduction
+Overview of audio reasoning and key achievements
+🧠 Reasoning Types
+Interactive explorer for 5 types of audio reasoning
+🚫 The Problem
+Understanding the inverted scaling anomaly
+🔬 MGRD Solution
+How Modality-Grounded Reasoning Distillation works
+🏗️ Architecture
+Step-Audio-R1 model architecture breakdown
+📊 Benchmarks
+Performance comparisons and results
+🎮 Interactive Demo
+Simulated audio reasoning examples
+🚀 Applications
+Real-world use cases
+📚 Resources
+Papers, code, and references
+🔬 Key Innovation: MGRD
+Modality-Grounded Reasoning Distillation (MGRD) is the core innovation that makes Step-Audio-R1 work:
 Text-based reasoning → Filter textual surrogates → Keep acoustic-grounded chains → Native Audio Think
+This iterative process teaches the model to reason over actual acoustic features instead of text transcripts.
+📊 Performance
 Step-Audio-R1 achieves:
+✅ Surpasses Gemini 2.5 Pro on comprehensive audio benchmarks
+✅ Comparable to Gemini 3 Pro (state-of-the-art)
+✅ First successful test-time compute scaling for audio
+📚 Resources
+📄 Step-Audio-R1 Paper
+💻 GitHub Repository
+🤗 HuggingFace Collection
+🎯 Official Demo
+👤 Author
+Mehmet Tuğrul Kaya
+🐙 GitHub: @mtkaya
+🤗 HuggingFace: tugrulkaya
+📝 Citation
 @article{stepaudioR1,
   title={Step-Audio-R1 Technical Report},
   author={Tian, Fei and others},
   journal={arXiv preprint arXiv:2511.15848},
   year={2025}
 }
 <p align="center">
+<b>🎧 Sound Speaks, AI Listens and Thinks 🧠</b>
 </p>
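
The MGRD flow summarized in the README (text-based reasoning → filter textual surrogates → keep acoustic-grounded chains) can be illustrated with a toy sketch. This is not the Step-Audio-R1 implementation: the `ReasoningChain` type, the cue lists, and the keyword heuristic are all hypothetical stand-ins chosen only to show what "filtering textual surrogates" might look like in its simplest form. The actual criteria are presumably far more involved.

```python
from dataclasses import dataclass

# Hypothetical cue lists, chosen only for illustration.
TEXTUAL_CUES = ("transcript", "the text says", "the words", "lyrics")
ACOUSTIC_CUES = ("pitch", "tempo", "timbre", "loudness", "rhythm", "background noise")


@dataclass
class ReasoningChain:
    """A candidate reasoning trace for one audio question (hypothetical type)."""
    question: str
    chain: str
    answer: str


def is_acoustically_grounded(candidate: ReasoningChain) -> bool:
    """Toy heuristic: keep chains that cite acoustic cues and do not lean on text."""
    text = candidate.chain.lower()
    mentions_acoustics = any(cue in text for cue in ACOUSTIC_CUES)
    leans_on_text = any(cue in text for cue in TEXTUAL_CUES)
    return mentions_acoustics and not leans_on_text


def filter_chains(candidates: list[ReasoningChain]) -> list[ReasoningChain]:
    """Drop textual surrogates; return only the acoustically grounded chains."""
    return [c for c in candidates if is_acoustically_grounded(c)]


candidates = [
    ReasoningChain("What emotion?", "The transcript says she is happy.", "happy"),
    ReasoningChain("What emotion?", "Rising pitch and a fast tempo suggest excitement.", "excited"),
]
print([c.answer for c in filter_chains(candidates)])  # prints ['excited']
```

In the iterative process the README describes, the chains that survive the filter would presumably feed a further round of training; that step is omitted here because the README gives no detail about it.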