CublasLt error when running in 8bit

#4
by TahirC - opened

Using the same code as provided in README.md, on Kaggle with a P100 GPU. The model loads properly into VRAM, but it fails when we call:

question = 'Hello, who are you?'
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f'User: {question}\nAssistant: {response}')

RuntimeError: cublasLt ran into an error!
shapeA=torch.Size([4096, 4096]), shapeB=torch.Size([51, 4096]), shapeC=(51, 4096)
(lda, ldb, ldc)=(c_int(4096), c_int(4096), c_int(4096))
(m, n, k)=(c_int(4096), c_int(51), c_int(4096))
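To make the error dump easier to read: the shapes describe a matrix multiply C = B · Aᵀ, where A is a (4096, 4096) weight matrix and B is a (51, 4096) activation batch (51 tokens for the prompt). A small sketch (my own illustration, not from the library) showing how the (m, n, k) values and the output shape follow from shapeA and shapeB:

```python
def gemm_output_shape(shape_a, shape_b):
    """Shape of C = B @ A^T for shape_a = (m, k) weights
    and shape_b = (n, k) activations; returns (n, m)."""
    m, k = shape_a
    n, k2 = shape_b
    assert k == k2, "inner (k) dimensions must match"
    return (n, m)

# Matches the values in the error message:
# shapeA=(4096, 4096), shapeB=(51, 4096) -> shapeC=(51, 4096),
# i.e. (m, n, k) = (4096, 51, 4096).
print(gemm_output_shape((4096, 4096), (51, 4096)))  # (51, 4096)
```

So the shapes themselves are consistent; the failure happens inside cublasLt when it executes this int8 matmul, which typically points at the GPU/kernel support rather than a shape mismatch.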
