ValueError with multiple A100 GPUs

#28
by saireddy - opened

Is anyone else facing this issue with multiple A100 GPUs?
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}
I am using "auto" for device map, still hitting this issue

Google org

Hi @saireddy, could you please take a look at this similar issue? It seems to be a duplicate. Please let us know if the issue still persists. Thank you.

saireddy changed discussion status to closed

Hi @saireddy, you might find Impulse AI (https://www.impulselabs.ai/) useful. We make it easy to fine-tune and deploy open-source models. I know this isn't relevant to your problem above, but it might be simpler to fine-tune and deploy with us. Hopefully you find it helpful!

docs: https://docs.impulselabs.ai/introduction
python sdk: https://pypi.org/project/impulse-api-sdk-python/
