update
README.md
CHANGED
@@ -12,8 +12,8 @@ tags:
 - zero-shot
 - audio-text
 ---
-# Mellow
-[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
+# Mellow: a small audio language model for reasoning
+[[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
 
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
 
@@ -96,5 +96,13 @@ With Mellow, we aim to showcase that small audio-language models can engage in r
 
 ## Citation
 ```
-
+@misc{mellow,
+title={Mellow: a small audio language model for reasoning},
+author={Soham Deshmukh and Satvik Dixit and Rita Singh and Bhiksha Raj},
+year={2025},
+eprint={2503.08540},
+archivePrefix={arXiv},
+primaryClass={cs.SD},
+url={https://arxiv.org/abs/2503.08540},
+}
 ```