I've tried the q8 version and noticed that with the webgpu device the result is different from wasm. This doesn't happen with fp32.
webgpu
wasm
Is this a known limitation? What's the problem? (I haven't tested other quantized versions.)
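
In case it helps to put a number on the difference, here is a minimal sketch of how the two outputs could be compared. It assumes the raw outputs (e.g. logits or embeddings) from the webgpu and wasm runs are available as flat numeric arrays for the same input. `compareOutputs` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: compares two flat numeric outputs of equal length,
// e.g. one produced with the webgpu device and one with wasm.
function compareOutputs(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let maxAbsDiff = 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    // Largest element-wise deviation between the two backends.
    maxAbsDiff = Math.max(maxAbsDiff, Math.abs(a[i] - b[i]));
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine similarity; NaN if either vector is all zeros.
  const cosine = dot / (Math.sqrt(normA) * Math.sqrt(normB));
  return { maxAbsDiff, cosine };
}
```

A cosine close to 1 with a small `maxAbsDiff` would point to ordinary rounding differences, while a large gap would suggest the q8 path is computed differently on one of the two backends.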