Problem with local quantized deployment of the Baichuan large model
#34 · opened by Jason123321123
raise ValueError(
ValueError:
                        Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
                        the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
                        these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
                        device_map to from_pretrained. Check
                        https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
                        for more details.
This error is raised as soon as the deployed model is run. The core of the problem is that the model is being loaded onto the GPU, but because GPU memory is insufficient, some modules get dispatched to the CPU or to disk. This typically happens when you try to load a very large model and the GPU does not have enough RAM to hold it entirely.
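A minimal sketch of the workaround the error message points at: quantize the parts that fit on the GPU and let the overflowing modules stay in fp32 on the CPU via a custom `device_map`. The module names and the checkpoint name in the comments below are illustrative assumptions, not taken from the issue; inspect your own model (e.g. `model.hf_device_map`) to pick real ones.

```python
# Custom device_map: integer values are GPU indices, "cpu"/"disk" are
# offload targets. Module names here are hypothetical examples.
device_map = {
    "model.embed_tokens": 0,   # keep the embedding layer on GPU 0
    "model.layers": 0,         # transformer blocks on GPU 0
    "model.norm": "cpu",       # offload the final norm to CPU (stays fp32)
    "lm_head": "cpu",          # offload the output head to CPU
}

# In recent transformers versions the flag lives on BitsAndBytesConfig as
# llm_int8_enable_fp32_cpu_offload (the error text shows an older spelling).
# Uncommenting the block below would download and load the model, so it is
# left as a commented sketch; the checkpoint name is an assumption:
#
#     from transformers import AutoModelForCausalLM, BitsAndBytesConfig
#     quant_config = BitsAndBytesConfig(
#         load_in_8bit=True,
#         llm_int8_enable_fp32_cpu_offload=True,
#     )
#     model = AutoModelForCausalLM.from_pretrained(
#         "baichuan-inc/Baichuan-13B-Chat",
#         quantization_config=quant_config,
#         device_map=device_map,
#         trust_remote_code=True,
#     )

# Sanity check: every target is either a GPU index or an offload location.
assert all(v in ("cpu", "disk") or isinstance(v, int)
           for v in device_map.values())
```

If even CPU offload is not enough, replacing `"cpu"` with `"disk"` (together with an `offload_folder` argument to `from_pretrained`) pushes those modules to disk instead, at a further speed cost.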
