Anisora V3 Model Quantization with FP8

This repository provides a script for quantizing the Alibaba Wan model (AniSora V3) to the FP8 (float8) format, which reduces model size and can improve inference speed on hardware with FP8 support.

Features

  • FP8 Quantization: Implements custom FP8 quantization logic, supporting the E4M3 and E5M2 formats (see the sketch after this list).
  • Wan Model Integration: Designed to work with the WanModel architecture, specifically targeting its attention and feed-forward layers.
  • safetensors Export: Saves the quantized model weights in the safetensors format for efficient loading.
  • ComfyUI Support: Can be used with WanVideoWrapper (latest version) and ComfyUI native WanVideo nodes.
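
The script's internal quantization logic is not reproduced here, but the general scaled-cast pattern behind FP8 quantization can be sketched as follows. This is a minimal illustration only: the helper name fp8_quantize, the per-tensor scaling scheme, and the constants below are assumptions, not the repository's actual API.

import torch

# Largest finite values representable in each FP8 format.
FP8_MAX = {
    torch.float8_e4m3fn: 448.0,    # E4M3
    torch.float8_e5m2: 57344.0,    # E5M2
}

def fp8_quantize(weight: torch.Tensor, fp8_dtype=torch.float8_e5m2):
    """Scale a tensor into the FP8 range, cast it, and return the
    quantized tensor plus the scale needed to dequantize later."""
    amax = weight.abs().max().clamp(min=1e-12)
    scale = FP8_MAX[fp8_dtype] / amax
    quantized = (weight.float() * scale).to(fp8_dtype)
    return quantized, scale

Dequantization is simply the inverse: quantized.float() / scale.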

Setup

Please refer to the WanVideo repository for environment setup.

Run

An estimated 64 GB of RAM is required to run this code.

from safetensors.torch import save_file
from wan_2_1_fp8_quantizer import FP8Quantizer, WanModel

# Load the AniSora V3.1 checkpoint in its original precision.
model = WanModel.from_pretrained("Anisora_V3_1/V3.1")

# Quantize the state dict to FP8 (targets the attention and feed-forward layers).
quantizer = FP8Quantizer()
state_dict = model.state_dict()
quantized_state_dict = quantizer.apply_quantization(state_dict)

# Export the quantized weights in safetensors format.
save_file(quantized_state_dict, "Anisora_V3_1_fp8_e5m2.safetensors")
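
To sanity-check the export, the file can be reloaded with safetensors and the tensor dtypes inspected. This is an illustrative snippet; the key names printed depend on the WanModel state dict.

from safetensors.torch import load_file

quantized = load_file("Anisora_V3_1_fp8_e5m2.safetensors")
for name, tensor in list(quantized.items())[:5]:
    print(name, tensor.dtype, tuple(tensor.shape))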