Transcribe audio using NVIDIA Conformer and OpenAI Whisper
CLIP Model
Phoneme- and character-level assessment
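
The source does not say how the assessment is computed. One common way to score a transcript at the character or phoneme level is an edit-distance error rate against a reference, sketched below. The function names `edit_distance` and `error_rate` are hypothetical, and the choice of Levenshtein distance is an assumption for illustration, not something the source states.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed metric)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Character level: compare strings directly.
char_rate = error_rate("hello", "hallo")

# Phoneme level: compare lists of phoneme symbols (illustrative symbols only).
phone_rate = error_rate(["HH", "AH", "L", "OW"], ["HH", "EH", "L", "OW"])
```

Because the same function accepts either a string or a list of symbols, a single routine can cover both levels; only the tokenization of the transcript changes.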