CublasLt error when running in 8bit
#4
by TahirC · opened
Using the same code as provided in README.md, on Kaggle with a P100 GPU. The model loads into VRAM properly, but it fails when we call:
```python
question = 'Hello, who are you?'
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')
```
```
RuntimeError: cublasLt ran into an error!
    shapeA=torch.Size([4096, 4096]), shapeB=torch.Size([51, 4096]), shapeC=(51, 4096)
    (lda, ldb, ldc)=(c_int(4096), c_int(4096), c_int(4096))
    (m, n, k)=(c_int(4096), c_int(51), c_int(4096))
```
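A likely cause (an assumption, not confirmed in this issue): the P100 is compute capability 6.0, and the cublasLt int8 matmul path used by bitsandbytes' 8-bit inference relies on int8 tensor cores that are only available on compute capability 7.5+ (Turing or newer) GPUs. A minimal pre-flight check along these lines (the helper name `supports_int8_matmul` is hypothetical):

```python
import torch

def supports_int8_matmul(device_index=0):
    # Assumption: the int8 cublasLt path needs compute capability >= 7.5
    # (Turing or newer). The Kaggle P100 reports 6.0, so this returns False
    # there, which would explain the cublasLt RuntimeError above.
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability(device_index)
    return (major, minor) >= (7, 5)

if not supports_int8_matmul():
    print("This GPU likely cannot run 8-bit (LLM.int8) inference; "
          "try a T4/A100 runtime or load the model without load_in_8bit.")
```

If the check fails, switching the Kaggle accelerator to a T4 (compute capability 7.5) or loading the model in fp16 instead of 8-bit would be worth trying.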
