Eats up all RAM + 163GB Swap

#167

by LuvIsBadToTheBone - opened Jan 10, 2023

Discussion

LuvIsBadToTheBone

Jan 10, 2023

After the clone attempt failed, I tried:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom")

This eats all RAM + swap to 100% after the download has finished, then gets killed by ZSH.
Idk what to do anymore to get BLOOM running :(

borzunov

BigScience Workshop org Jan 18, 2023

You can try out Petals: https://colab.research.google.com/drive/1Ervk6HPNS6AYVr3xVdQnY5a-TjjmLCdQ?usp=sharing

Without Petals, you need 176+ GB GPU memory or RAM to run BLOOM at a decent speed.
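As a rough sanity check on that figure, here is a minimal sketch of the weight-memory arithmetic. It assumes roughly 176 billion parameters for BLOOM and counts only the weights (no activations, no framework overhead); the helper name `weight_memory_gb` is made up for illustration. Under those assumptions, 176 GB corresponds to about one byte per parameter (8-bit weights), while 16-bit weights would need about twice that.

```python
# Approximate BLOOM parameter count (assumption: ~176 billion).
PARAMS = 176_000_000_000

# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.0f} GB")
```

This is only an estimate of the storage for the weights; actual usage while loading can be higher, for example if a full copy of a checkpoint shard is held in memory alongside the model.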

LuvIsBadToTheBone

Jan 18, 2023

Well, even when I try with 374GB swap, ZSH still kills it because the above script occupies all memory.

ybelkada

BigScience Workshop org Jan 19, 2023

edited Jan 19, 2023

We should maybe add a git tag (let's call it "pytorch_only") pointing before the safetensors commit: 4d8e28c67403974b0f17a4ac5992e4ba0b0dbb6f, but I'm not sure if this will help - cc @julien-c @TimeRobber (maybe the safetensors weights will still be downloaded to the cache?)
Then you'll be able to load the model with:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModel.from_pretrained("bigscience/bloom", revision="pytorch_only")

TimeRobber

BigScience Workshop org Jan 19, 2023

Hmm, you can use huggingface_hub to download specific files (which I think from_pretrained already does). I think the issue is that from_pretrained also loads the weights into memory, so I think you need to either set meta as the device or offload the weights to disk using accelerate.
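The two options mentioned above could look roughly like the sketch below. It is not a tested recipe for BLOOM specifically: it assumes `transformers` and `accelerate` are installed, the function names and the `offload` directory are made up for illustration, and the heavy imports are done lazily so that nothing is downloaded until a function is actually called.

```python
def offload_kwargs(offload_dir: str = "offload") -> dict:
    """Keyword arguments asking from_pretrained to place weights automatically.

    device_map="auto" lets accelerate spread weights over GPU, CPU RAM and,
    if those are full, the disk folder given by offload_folder.
    """
    return {
        "device_map": "auto",
        "offload_folder": offload_dir,  # hypothetical directory name
        "torch_dtype": "auto",          # use the dtype stored with the checkpoint
    }


def build_empty_bloom(model_id: str = "bigscience/bloom"):
    """Option 1: create the model on the meta device (no weight memory used)."""
    from accelerate import init_empty_weights
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(model_id)
    with init_empty_weights():
        # Parameters are placeholders with shapes only, not real tensors.
        return AutoModelForCausalLM.from_config(config)


def load_bloom_with_offload(model_id: str = "bigscience/bloom",
                            offload_dir: str = "offload"):
    """Option 2: load real weights, offloading to disk what does not fit."""
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id, **offload_kwargs(offload_dir)
    )
```

A model built with `build_empty_bloom()` cannot run inference until real weights are loaded into it; it is mainly useful for inspecting the architecture or planning a device map. `load_bloom_with_offload()` should avoid filling RAM and swap, but generation from disk-offloaded weights is likely to be very slow.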
