timit-asr/timit_asr
Fine-tuned English IPA phone-recognition model initialized from
utter-project/mHuBERT-147 and trained with a compact linear CTC head.
This repository contains the full fine-tuned model:
- 94.4M backbone parameters + 35k linear-head parameters

Training setup:
- utter-project/mHuBERT-147
- 4 encoder layers fine-tuned

Validation results from the fine-tuning run:
- PER = 0.1012
- PER = 0.2082

Notes:
- istomin9192/mHuBERT-147-ipa-head, with one extra CTC blank symbol at the last output index.

Minimal loading example:
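import json

import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

repo_id = "istomin9192/mHuBERT-147-ipa-linear-ctc-ft"
wav_file = "example.wav"  # placeholder: replace with the path to your audio file

# trust_remote_code=True lets transformers run the custom code shipped in the repository.
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# Map output indices to IPA phone symbols (expects ipa_map.json in the working directory).
with open("ipa_map.json", "r", encoding="utf-8") as f:
    id2phone = {int(k): v for k, v in json.load(f)["id2phone"].items()}

# Load the audio as 16 kHz mono.
wav, sr = librosa.load(wav_file, sr=16000, mono=True)
inputs = feature_extractor(wav, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]  # first (only) item in the batch

# Greedy CTC decoding: take the best class per frame, merge repeats, drop blanks.
pred_ids = logits.argmax(dim=-1).tolist()
blank_id = model.config.architecture["blank_id"]

phones = []
prev = blank_id
for pid in pred_ids:
    if pid != blank_id and pid != prev:
        phones.append(id2phone[pid])
    prev = pid  # update on every frame so a blank separates repeated phones

print(phones)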
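
To compare decoded output against reference transcriptions, a phone error rate (PER) can be computed from the edit distance between phone sequences. The sketch below is not the evaluation code behind the validation numbers above. It assumes the common corpus-level definition (total substitutions, insertions, and deletions divided by total reference length), and the helper names `ctc_greedy_collapse`, `edit_distance`, and `phone_error_rate` are hypothetical. It uses a toy inventory of three phones with the blank at the last index, mirroring the convention noted above.

```python
def ctc_greedy_collapse(frame_ids, blank_id):
    """Merge repeated frame labels, then drop blanks (same rule as the loop above)."""
    out = []
    prev = blank_id
    for pid in frame_ids:
        if pid != blank_id and pid != prev:
            out.append(pid)
        prev = pid
    return out


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(
                prev_row[j] + 1,             # deletion
                row[j - 1] + 1,              # insertion
                prev_row[j - 1] + (r != h),  # substitution or match
            ))
        prev_row = row
    return prev_row[-1]


def phone_error_rate(refs, hyps):
    """Assumed corpus-level PER: total edit errors / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy example: phones 0..2, blank = 3 (last index).
frames = [3, 0, 0, 3, 0, 1, 1, 3, 2]
hyp = ctc_greedy_collapse(frames, blank_id=3)
ref = [0, 1, 2]
print(hyp, phone_error_rate([ref], [hyp]))
```

In the toy example the blank between the two runs of `0` keeps them as separate phones, so the hypothesis has one insertion relative to the three-phone reference. With real data, `refs` and `hyps` would be lists of phone sequences, for instance the `phones` list produced by the loading example.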
Base model
utter-project/mHuBERT-147