soham97 committed on
Commit
a04bbe7
·
1 Parent(s): 1cf68be
Files changed (1)
  1. README.md +11 -3
README.md CHANGED
@@ -12,8 +12,8 @@ tags:
  - zero-shot
  - audio-text
  ---
- # Mellow
- [[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
+ # Mellow: a small audio language model for reasoning
+ [[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
 
  Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
 
@@ -96,5 +96,13 @@ With Mellow, we aim to showcase that small audio-language models can engage in r
 
  ## Citation
  ```
-
+ @misc{mellow,
+ title={Mellow: a small audio language model for reasoning},
+ author={Soham Deshmukh and Satvik Dixit and Rita Singh and Bhiksha Raj},
+ year={2025},
+ eprint={2503.08540},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD},
+ url={https://arxiv.org/abs/2503.08540},
+ }
  ```