Anisora V3 Model Quantization with FP8
This repository provides a script for quantizing the AniSora V3 model (built on Alibaba's Wan model) to FP8 (float8) format. Quantization reduces the model size and can improve inference speed on hardware that supports FP8.
Features
- FP8 Quantization: Implements custom FP8 quantization logic, supporting the E4M3 and E5M2 formats (a rough sketch is shown after this list).
- Wan Model Integration: Designed to work with the `WanModel` architecture, specifically targeting its attention and feed-forward layers.
- `safetensors` Export: Saves the quantized model weights in the `safetensors` format for efficient loading.
- ComfyUI Support: Can be used with WanVideoWrapper (latest version) and the ComfyUI native WanVideo nodes.
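As a rough illustration of per-tensor scaled FP8 quantization, the sketch below converts a weight tensor to float8 with a single scale factor. This is only an assumption about how the conversion works; the repository's `FP8Quantizer` may use a different scaling granularity, clamping scheme, or layer selection.

```python
import torch

def quantize_to_fp8(weight: torch.Tensor, fp8_dtype=torch.float8_e5m2):
    """Quantize a weight tensor to FP8 with a single per-tensor scale (illustrative only)."""
    # Scale so the largest magnitude maps onto the representable FP8 range.
    scale = weight.abs().max().clamp(min=1e-12) / torch.finfo(fp8_dtype).max
    fp8_weight = (weight / scale).to(fp8_dtype)
    # The scale is kept alongside the weight so it can be dequantized later:
    # dequantized = fp8_weight.to(torch.float32) * scale
    return fp8_weight, scale.float()
```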
Setup
Please refer to the WanVideo repository for setting up the environment.
Run
An estimated 64 GB of memory (RAM) is required to run this code.
from wan_2_1_fp8_quantizer import FP8Quantizer, WanModel
from safetensors.torch import save_file

# Load the full-precision AniSora V3 checkpoint.
model = WanModel.from_pretrained("Anisora_V3_1/V3.1")

# Quantize the state dict to FP8.
quantizer = FP8Quantizer()
state_dict = model.state_dict()
quantized_state_dict = quantizer.apply_quantization(state_dict)

# Export the quantized weights in safetensors format.
save_file(quantized_state_dict, "Anisora_V3_1_fp8_e5m2.safetensors")
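To sanity-check the export, the saved file can be reloaded and the tensor dtypes inspected. This is a minimal sketch; the tensor names depend on the `WanModel` layer layout and on the output filename used above.

```python
from safetensors.torch import load_file

# Reload the exported checkpoint and inspect a few tensors.
quantized = load_file("Anisora_V3_1_fp8_e5m2.safetensors")
for name, tensor in list(quantized.items())[:5]:
    print(name, tensor.dtype, tuple(tensor.shape))
```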