Automatic Speech Recognition · Transformers · Safetensors · meralion2 · meralion · meralion-2 · custom_code

wz258 committed · Commit cdd2358 · verified · 1 parent: 894113d

Update README.md

Files changed (1): README.md (+5 −6)
README.md CHANGED

```diff
@@ -20,7 +20,6 @@ library_name: transformers
 tags:
 - meralion
 - meralion-2
-- meralion-audiollm-v2
 ---
 # 🎉 MERaLiON-2: [MERaLiON-2-10B](https://huggingface.co/MERaLiON/MERaLiON-2-10B) | [MERaLiON-2-10B-ASR](https://huggingface.co/MERaLiON/MERaLiON-2-10B-ASR) | [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B)
 
@@ -40,8 +39,8 @@ tags:
 ## 📝 Model Description:
 
 MERaLiON-2 is a family of Speech-Text Large Language Models tailored for **Singapore’s multilingual and multicultural landscape**, as well as the wider **Southeast Asian region**.
-The 10B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-9b](https://huggingface.co/google/gemma-2-9b) text decoder.
-The 3B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-2b](https://huggingface.co/google/gemma-2-9b) text decoder.
+The 10B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-9b-IT](https://huggingface.co/google/gemma-2-9b-it) text decoder.
+The 3B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-2b-IT](https://huggingface.co/google/gemma-2-2b-it) text decoder.
 The model is finetuned on **120,000 hours of speech and audio data** across **6 diverse tasks**.
 The model supports long-form audio inputs of up to 300 seconds (5 minutes) and is specifically adapted to handle the linguistic nuances, accents, and dialects commonly found across Singapore and neighboring countries.
 
@@ -856,13 +855,13 @@ To maximize throughput for long-form audio-text interactions, we support inferen
 
 ## ⚠️ Disclaimer
 
-The current MERaLiON-AudioLLM-V2 has not been specifically aligned for safety and may generate content that is inappropriate, offensive, or harmful. Developers and users are responsible for performing their own safety fine-tuning and implementing necessary security measures. The authors shall not be held liable for any claims, damages, or other liabilities arising from the use of the released models, weights, or code.
+The current MERaLiON-2 has not been specifically aligned for safety and may generate content that is inappropriate, offensive, or harmful. Developers and users are responsible for performing their own safety fine-tuning and implementing necessary security measures. The authors shall not be held liable for any claims, damages, or other liabilities arising from the use of the released models, weights, or code.
 
 ### Compute and Infrastructure
 
-MERaLiON-AudioLLM-V2 was trained on the [**ASPIRE 2A+**](https://help.nscc.sg/aspire2aplus/about/) Supercomputer Cluster, provided by [**National Supercomputing Centre (NSCC)**](https://www.nscc.sg/), Singapore. ASPIRE 2A+ cluster provides multiple H100 nodes, with each compute node equipped with 8 Nvidia H100 GPUs, 2 TB of RAM, and 30 TB of locally attached NVMe storage. These nodes are interconnected via a rail-optimised, full fat-tree topology, utilising 400 Gb/s NDR InfiniBand cables. Additionally, the cluster incorporates a 2.5 PB SSD-based Lustre file system, linked to the H100 nodes through high-speed InfiniBand connections.
+MERaLiON-2 was trained on the [**ASPIRE 2A+**](https://help.nscc.sg/aspire2aplus/about/) Supercomputer Cluster, provided by [**National Supercomputing Centre (NSCC)**](https://www.nscc.sg/), Singapore. ASPIRE 2A+ cluster provides multiple H100 nodes, with each compute node equipped with 8 Nvidia H100 GPUs, 2 TB of RAM, and 30 TB of locally attached NVMe storage. These nodes are interconnected via a rail-optimised, full fat-tree topology, utilising 400 Gb/s NDR InfiniBand cables. Additionally, the cluster incorporates a 2.5 PB SSD-based Lustre file system, linked to the H100 nodes through high-speed InfiniBand connections.
 
-With a global batch size of 768, we trained the current release of MERaLiON-AudioLLM-V2 for around 200k steps, which took around 2 days to complete using 16 nodes, 128 H100 GPUs.
+With a global batch size of 768, we trained the current release of MERaLiON-2 for around 200k steps, which took around 2 days to complete using 16 nodes, 128 H100 GPUs.
 
 ## 📚 Citation
 
```
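The model description in the diff states that inputs are limited to 300 seconds of audio. For longer recordings, a caller would need to segment the waveform before inference; the sketch below is an illustrative pre-processing helper (not part of the MERaLiON-2 codebase), assuming mono audio at the 16 kHz sample rate used by Whisper-family encoders:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper-family encoders consume 16 kHz mono audio
MAX_SECONDS = 300      # MERaLiON-2's documented long-form input limit

def chunk_audio(waveform: np.ndarray,
                max_seconds: int = MAX_SECONDS,
                sample_rate: int = SAMPLE_RATE) -> list:
    """Split a mono waveform into segments no longer than the model's limit."""
    max_samples = max_seconds * sample_rate
    return [waveform[i:i + max_samples]
            for i in range(0, len(waveform), max_samples)]

# Example: 12.5 minutes of audio -> chunks of 300 s, 300 s, and 150 s
audio = np.zeros(750 * SAMPLE_RATE, dtype=np.float32)
chunks = chunk_audio(audio)
```

Each chunk can then be passed to the model independently; how (or whether) to carry context across chunk boundaries is left to the application.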
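The compute note quotes a global batch size of 768 and around 200k training steps; multiplying the two gives the approximate number of training examples processed in this run (a back-of-the-envelope check on the quoted figures, nothing more):

```python
# Figures quoted in the README diff above (both approximate)
global_batch_size = 768
train_steps = 200_000

# Total examples processed = batch size x steps
examples_seen = global_batch_size * train_steps
print(f"{examples_seen:,}")  # 153,600,000
```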