Background: USM Encoder extracted from Gemma 3n model
Gemma3n can process audio inputs. It does so by encoding audio with an encoder based on the Universal Speech Model (USM, https://arxiv.org/abs/2303.01037). The encoder produces roughly 6.5 frames per second, and each frame is a continuous embedding with a dimensionality of 1536.
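To give a feel for what this frame rate means in practice, here is a minimal sketch that estimates the embedding shape for a clip of a given duration. The helper name `estimate_embedding_shape` is hypothetical, and the frame rate is simply the figure quoted above. The real frame count depends on the encoder's subsampling, so treat the result as an approximation (the usage example in this card yields 63 frames for 10 seconds of audio).

```python
def estimate_embedding_shape(duration_s, frames_per_second=6.5, dim=1536, batch=1):
    """Rough estimate of the encoder output shape (batch, frames, dim).

    frames_per_second and dim are the figures quoted in this card;
    the actual frame count may differ by a few frames.
    """
    frames = round(duration_s * frames_per_second)
    return (batch, frames, dim)


# Estimated shape for a 10-second clip
shape = estimate_embedding_shape(10.0)
print(shape)
```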
This repo
To facilitate experimentation with this encoder, I've extracted the audio encoder weights from the full Gemma3n model so that the encoder can be used on its own. The weights come from this HF Gemma3n repo.
Some imports:
import torch
from transformers.models.gemma3n.feature_extraction_gemma3n import Gemma3nAudioFeatureExtractor
import sphn
import librosa
from transformers import Gemma3nAudioConfig, Gemma3nAudioEncoder
from huggingface_hub import hf_hub_download
Loading the model:
configuration = Gemma3nAudioConfig()
repo_id = "n0mad-0/gemma3n-usm-rip"
filename = "usm.th"
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
encoder = Gemma3nAudioEncoder(configuration).cuda()
encoder.load_state_dict(
    torch.load(model_path, weights_only=True, map_location='cuda')
)
encoder.eval()  # inference mode: disables dropout
Now we load the audio, build the feature extractor (which prepares mel spectrograms), and run the USM encoder:
feature_extractor = Gemma3nAudioFeatureExtractor() # operates on 30s chunks, expects 16_000 sampling rate
audio, sample_rate = sphn.read("bria.mp3") # audio has shape (channels, samples)
audio = librosa.resample(audio, orig_sr=sample_rate, target_sr=feature_extractor.sampling_rate)
audio = audio[:, : 10 * feature_extractor.sampling_rate] # keep the first 10 seconds
features = feature_extractor(audio)
audio_mel = torch.stack(
    [torch.from_numpy(x) for x in features['input_features']]
).cuda()
audio_mel_mask = torch.stack(
    [torch.from_numpy(x) for x in features['input_features_mask']]
).cuda()
# The feature extractor marks valid frames with True, while the encoder
# expects True for padded frames, hence the inverted mask.
with torch.no_grad():
    emb, mask = encoder(audio_mel, ~audio_mel_mask)
emb.shape # torch.Size([1, 63, 1536])
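If a single vector per clip is needed, the frame embeddings can be averaged over time while skipping padded frames. The sketch below is one way to do that and is not part of the original extraction code. The helper name `masked_mean_pool` is hypothetical, and it assumes the mask returned by the encoder uses the same convention as its input (True marks a padded frame). It runs on dummy tensors so it can be tried without the model.

```python
import torch


def masked_mean_pool(emb: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
    """Average frame embeddings over time, ignoring padded frames.

    emb:      (batch, frames, dim) float tensor
    pad_mask: (batch, frames) bool tensor, True where a frame is padding
              (assumed convention, check it against your encoder output)
    """
    valid = (~pad_mask).unsqueeze(-1).to(emb.dtype)  # (batch, frames, 1)
    summed = (emb * valid).sum(dim=1)                # (batch, dim)
    counts = valid.sum(dim=1).clamp(min=1.0)         # avoid division by zero
    return summed / counts


# Dummy data: 2 clips, 4 frames each, 3-dimensional embeddings.
dummy_emb = torch.ones(2, 4, 3)
dummy_emb[1, 2:] = 100.0  # values in padded frames, which must be ignored
dummy_pad_mask = torch.tensor(
    [[False, False, False, False],
     [False, False, True, True]]
)
pooled = masked_mean_pool(dummy_emb, dummy_pad_mask)
print(pooled.shape)
```

With the tensors from the example above, the call would be `masked_mean_pool(emb, mask)`, giving one 1536-dimensional vector per clip under the stated mask assumption.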
license: gemma