What is the basic formula to get this to run?

#11
by ASmith171 - opened

I'm trying to run the canary-qwen-2.5b model using the NeMo framework and have been unable to get it running locally on my 16GB GPU. Is this model intended to run on consumer hardware, or is it optimized specifically for data center GPUs? When I run the transcribe_speech.py script with cuda=False, it fails because the cuda option expects an integer. My goal is to use this model in a personal project for transcribing YouTube videos. Any guidance, or an alternative script for local inference, would be greatly appreciated.

I have come a long way to this point and am about to give up on canary-qwen-2.5b, but would rather not. I'm running Ubuntu 22.04.5 with an RTX 4080 and a Ryzen 9. I couldn't get it to run on the GPU and now cannot get it to run on the CPU.

I've tried versions with Docker and without. I've been targeting a local application, but these challenges have me considering cloud-based services just so I can move on. Does anyone have a recipe to make this work? (lol).

Thank you.

Hi @ASmith171, Canary-Qwen requires a different script to run, as mentioned in the model card. Please try the following instead:

cd NeMo
python examples/speechlm2/salm_generate.py \
  pretrained_name=nvidia/canary-qwen-2.5b \
  inputs=input_manifest.json \
  output_manifest=generations.jsonl \
  batch_size=128 \
  user_prompt="Transcribe the following:"  # audio locator is added automatically at the end if not present

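Here, input_manifest.json follows the standard NeMo manifest convention: one JSON object per line, each describing one audio file. As a minimal sketch (audio_filepath and duration are the usual NeMo manifest fields; the paths and durations below are placeholders for your own files), you could generate it like this:

import json

# NeMo manifests are JSON Lines: one object per audio file.
entries = [
    {"audio_filepath": "/path/to/clip1.wav", "duration": 30.0},
    {"audio_filepath": "/path/to/clip2.wav", "duration": 12.5},
]
with open("input_manifest.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
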
The model can easily run on your GPU. If you hit out-of-memory errors on 16GB, lower batch_size (e.g., batch_size=8).

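If you'd rather call the model from Python inside your own project than through the CLI, the model card also shows direct use of the SALM class. A sketch along those lines (verify the exact API against the model card; speech.wav is a placeholder for a 16 kHz mono audio file):

from nemo.collections.speechlm2.models import SALM

# Downloads the checkpoint from Hugging Face on first use.
model = SALM.from_pretrained("nvidia/canary-qwen-2.5b")

# model.audio_locator_tag marks where the audio is spliced into the prompt.
answer_ids = model.generate(
    prompts=[
        [{
            "role": "user",
            "content": f"Transcribe the following: {model.audio_locator_tag}",
            "audio": ["speech.wav"],  # placeholder path to your audio
        }]
    ],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
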
You can also fork the HF space we have built and run it locally: https://huggingface.co/spaces/nvidia/canary-qwen-2.5b
