How to use the FP8 version?

#25 opened by ben1995

Hi,
I am currently using LTX Video to generate videos; it is the best model I have ever encountered.
But I have a problem: I have installed the FP8 kernels, yet I still cannot run the model.

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install packaging wheel ninja setuptools
pip install --no-build-isolation git+https://github.com/Lightricks/LTX-Video-Q8-Kernels.git
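
(Aside: before running the model it is worth checking that the kernel build actually imports and that the GPU can run FP8 at all; e4m3 kernels generally need compute capability 8.9 or higher, i.e. Ada Lovelace or Hopper. A minimal sanity check, assuming the package installs under the module name q8_kernels, which is inferred from the repo name and may differ from the real one:

import torch

# FP8 (e4m3) matmul kernels generally require compute capability >= 8.9
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))

# Assumed module name based on the repo name; adjust if the README says otherwise
import q8_kernels  # noqa: F401
print("q8_kernels imported OK")

If the import fails or the capability is below (8, 9), the FP8 path cannot work regardless of the pipeline setup.)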

The error I encountered is:

Moving models to cuda for inference (if not already there)...
Calling multi-scale pipeline (eff. HxW: 1024x768, Frames: 57 -> Padded: 57) on cuda
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 626, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 350, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 2235, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1746, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 917, in wrapper
    response = f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 917, in wrapper
    response = f(*args, **kwargs)
  File "/root/ltx-video-distilled/app.py", line 304, in generate
    result_images_tensor = multi_scale_pipeline_obj(**multi_scale_call_kwargs).images
  File "/root/ltx-video-distilled/ltx_video/pipelines/pipeline_ltx_video.py", line 1859, in __call__
    result = self.video_pipeline(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/ltx-video-distilled/ltx_video/pipelines/pipeline_ltx_video.py", line 1197, in __call__
    noise_pred = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/ltx-video-distilled/ltx_video/models/transformers/transformer3d.py", line 478, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/ltx-video-distilled/ltx_video/models/transformers/attention.py", line 255, in forward
    attn_output = self.attn1(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/ltx-video-distilled/ltx_video/models/transformers/attention.py", line 710, in forward
    return self.processor(
  File "/root/ltx-video-distilled/ltx_video/models/transformers/attention.py", line 997, in __call__
    query = attn.to_q(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: self and mat2 must have the same dtype, but got BFloat16 and Float8_e4m3fn
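
For context, the error means the FP8-quantized to_q weight reached the stock F.linear path while the activations were still bfloat16, i.e. the FP8 kernels were never patched into the transformer's linear layers. A minimal, LTX-independent sketch in plain PyTorch that reproduces the same dtype mismatch, plus the brute-force upcast workaround (which works but gives up the FP8 memory savings):

import torch
import torch.nn.functional as F

# FP8-quantized weight, as stored in an FP8 checkpoint
w = torch.randn(64, 64, device="cuda").to(torch.float8_e4m3fn)
# bfloat16 activations, as produced by the rest of the pipeline
x = torch.randn(2, 64, device="cuda", dtype=torch.bfloat16)

try:
    F.linear(x, w)  # stock matmul path: both dtypes must match
except RuntimeError as e:
    print(e)  # "self and mat2 must have the same dtype, ..."

# Workaround when the FP8 kernels are not applied: upcast the weight
out = F.linear(x, w.to(torch.bfloat16))
print(out.dtype)  # torch.bfloat16

So the question is really how to make the app apply the FP8 kernels to the transformer (or use the matching FP8 pipeline configuration) instead of falling back to the standard linear layers.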