Automatic Speech Recognition · Transformers · Safetensors · meralion2 · meralion · meralion-2 · custom_code

wz258 committed · Commit cdd2358 · verified · 1 parent: 894113d

Update README.md

Files changed (1): README.md (+5 −6)
README.md CHANGED

```diff
@@ -20,7 +20,6 @@ library_name: transformers
 tags:
 - meralion
 - meralion-2
-- meralion-audiollm-v2
 ---
 # 🎉 MERaLiON-2: [MERaLiON-2-10B](https://huggingface.co/MERaLiON/MERaLiON-2-10B) | [MERaLiON-2-10B-ASR](https://huggingface.co/MERaLiON/MERaLiON-2-10B-ASR) | [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B)
 
@@ -40,8 +39,8 @@ tags:
 ## 📝 Model Description:
 
 MERaLiON-2 is a family of Speech-Text Large Language Models tailored for **Singapore’s multilingual and multicultural landscape**, as well as the wider **Southeast Asian region**.
-The 10B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-9b](https://huggingface.co/google/gemma-2-9b) text decoder.
-The 3B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-2b](https://huggingface.co/google/gemma-2-9b) text decoder.
+The 10B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-9b-IT](https://huggingface.co/google/gemma-2-9b-it) text decoder.
+The 3B model integrates a localized [Whisper-Large-V3](https://huggingface.co/openai/whisper-large-v3) speech encoder with the [Gemma2-2b-IT](https://huggingface.co/google/gemma-2-2b-it) text decoder.
 The model is finetuned on **120,000 hours of speech and audio data** across **6 diverse tasks**.
 The model supports long-form audio inputs of up to 300 seconds (5 minutes) and is specifically adapted to handle the linguistic nuances, accents, and dialects commonly found across Singapore and neighboring countries.
 
@@ -856,13 +855,13 @@ To maximize throughput for long-form audio-text interactions, we support inferen
 
 ## ⚠️ Disclaimer
 
-The current MERaLiON-AudioLLM-V2 has not been specifically aligned for safety and may generate content that is inappropriate, offensive, or harmful. Developers and users are responsible for performing their own safety fine-tuning and implementing necessary security measures. The authors shall not be held liable for any claims, damages, or other liabilities arising from the use of the released models, weights, or code.
+The current MERaLiON-2 has not been specifically aligned for safety and may generate content that is inappropriate, offensive, or harmful. Developers and users are responsible for performing their own safety fine-tuning and implementing necessary security measures. The authors shall not be held liable for any claims, damages, or other liabilities arising from the use of the released models, weights, or code.
 
 ### Compute and Infrastructure
 
-MERaLiON-AudioLLM-V2 was trained on the [**ASPIRE 2A+**](https://help.nscc.sg/aspire2aplus/about/) Supercomputer Cluster, provided by [**National Supercomputing Centre (NSCC)**](https://www.nscc.sg/), Singapore. ASPIRE 2A+ cluster provides multiple H100 nodes, with each compute node equipped with 8 Nvidia H100 GPUs, 2 TB of RAM, and 30 TB of locally attached NVMe storage. These nodes are interconnected via a rail-optimised, full fat-tree topology, utilising 400 Gb/s NDR InfiniBand cables. Additionally, the cluster incorporates a 2.5 PB SSD-based Lustre file system, linked to the H100 nodes through high-speed InfiniBand connections.
+MERaLiON-2 was trained on the [**ASPIRE 2A+**](https://help.nscc.sg/aspire2aplus/about/) Supercomputer Cluster, provided by [**National Supercomputing Centre (NSCC)**](https://www.nscc.sg/), Singapore. ASPIRE 2A+ cluster provides multiple H100 nodes, with each compute node equipped with 8 Nvidia H100 GPUs, 2 TB of RAM, and 30 TB of locally attached NVMe storage. These nodes are interconnected via a rail-optimised, full fat-tree topology, utilising 400 Gb/s NDR InfiniBand cables. Additionally, the cluster incorporates a 2.5 PB SSD-based Lustre file system, linked to the H100 nodes through high-speed InfiniBand connections.
 
-With a global batch size of 768, we trained the current release of MERaLiON-AudioLLM-V2 for around 200k steps, which took around 2 days to complete using 16 nodes, 128 H100 GPUs.
+With a global batch size of 768, we trained the current release of MERaLiON-2 for around 200k steps, which took around 2 days to complete using 16 nodes, 128 H100 GPUs.
 
 ## 📚 Citation
 
```
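The model description in the diff states that inputs are limited to 300 seconds of audio. For longer recordings, a caller would need to segment the waveform before inference; the sketch below is an illustrative pre-processing helper (not part of the MERaLiON-2 codebase), assuming mono audio at the 16 kHz sample rate used by Whisper-family encoders:

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper-family encoders consume 16 kHz mono audio
MAX_SECONDS = 300      # MERaLiON-2's documented long-form input limit

def chunk_audio(waveform: np.ndarray,
                max_seconds: int = MAX_SECONDS,
                sample_rate: int = SAMPLE_RATE) -> list:
    """Split a mono waveform into segments no longer than the model's limit."""
    max_samples = max_seconds * sample_rate
    return [waveform[i:i + max_samples]
            for i in range(0, len(waveform), max_samples)]

# Example: 12.5 minutes of audio -> chunks of 300 s, 300 s, and 150 s
audio = np.zeros(750 * SAMPLE_RATE, dtype=np.float32)
chunks = chunk_audio(audio)
```

Each chunk can then be passed to the model independently; how (or whether) to carry context across chunk boundaries is left to the application.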
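The compute note quotes a global batch size of 768 and around 200k training steps; multiplying the two gives the approximate number of training examples processed in this run (a back-of-the-envelope check on the quoted figures, nothing more):

```python
# Figures quoted in the README diff above (both approximate)
global_batch_size = 768
train_steps = 200_000

# Total examples processed = batch size x steps
examples_seen = global_batch_size * train_steps
print(f"{examples_seen:,}")  # 153,600,000
```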