---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: >-
  https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md
language:
- en
base_model:
- black-forest-labs/FLUX.1-Kontext-dev
pipeline_tag: image-to-image
tags:
- gguf-node
- gguf-connector
widget:
- text: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- text: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- text: add a hat to the pig
  output:
    url: samples/hat.webp
- src: samples/pig.png
  prompt: add a hat to the pig
  output:
    url: samples/hat.webp
---
# **gguf quantized version of kontext**
- drag **kontext** to > `./ComfyUI/models/diffusion_models`
- drag **clip-l, t5xxl** to > `./ComfyUI/models/text_encoders`
- drag **pig** to > `./ComfyUI/models/vae`
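- for reference, the folder tree should end up roughly like this (the kontext and vae file names below are placeholders, use whichever quant you downloaded; the encoder names match files linked in this card):
```
ComfyUI/
└── models/
    ├── diffusion_models/
    │   └── flux1-kontext-dev-q4_0.gguf   # kontext (example name; use your quant)
    ├── text_encoders/
    │   ├── clip_l_v2_fp32-f16.gguf       # clip-l
    │   └── t5xxl_fp16-q4_0.gguf          # t5xxl
    └── vae/
        └── pig_vae.gguf                  # pig vae (example name)
```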
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext.png)
<Gallery />
- no safetensors needed anymore; the whole stack is gguf (model + encoder + vae)
- the full gguf set works on gguf-node (see the last item in the reference section at the very end)
- get more **t5xxl** gguf encoders either [here](https://huggingface.co/calcuis/pig-encoder/tree/main) or [here](https://huggingface.co/chatpig/t5-v1_1-xxl-encoder-fp32-gguf/tree/main)
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-t2i.png)
## **extra: scaled safetensors (alternative 1)**
- get the all-in-one checkpoint [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/checkpoints/flux1-knotext-dev_fp8_e4m3fn.safetensors) (model, clip encoders and vae embedded)
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-ckpt.png)
- another option: get the multi-matrix scaled fp8 from comfyui [here](https://huggingface.co/Comfy-Org/flux1-kontext-dev_ComfyUI/blob/main/split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors) or the e4m3fn fp8 [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/diffusion_models/flux1-dev-kontext_fp8_e4m3fn.safetensors), with separate scaled [clip-l](https://huggingface.co/chatpig/encoder/blob/main/clip_l_fp8_e4m3fn.safetensors), [t5xxl](https://huggingface.co/chatpig/encoder/blob/main/t5xxl_fp8_e4m3fn.safetensors) and [vae](https://huggingface.co/connector/pig-1k/blob/main/vae/pig_flux_vae_fp16.safetensors)
## **run it with diffusers🧨 (alternative 2)**
- you may need the latest diffusers (git version) for `FluxKontextPipeline` to work; upgrade your diffusers with:
```
pip install git+https://github.com/huggingface/diffusers.git
```
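- a quick way to confirm the installed build is new enough (the import fails on older releases):
```
python -c "from diffusers import FluxKontextPipeline"
```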
- see example inference below:
```py
import torch
from transformers import T5EncoderModel
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# load the quantized t5xxl text encoder straight from the gguf file in this repo
text_encoder = T5EncoderModel.from_pretrained(
    "calcuis/kontext-gguf",
    gguf_file="t5xxl_fp16-q4_0.gguf",
    torch_dtype=torch.bfloat16,
)

# build the kontext pipeline around the quantized encoder
pipe = FluxKontextPipeline.from_pretrained(
    "calcuis/kontext-gguf",
    text_encoder_2=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

# kontext is image-to-image: give it a reference image plus an edit instruction
input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
image = pipe(
    image=input_image,
    prompt="Add a hat to the cat",
    guidance_scale=2.5,
).images[0]
image.save("output.png")
```
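- the example above keeps the transformer itself in plain bf16; recent diffusers builds can also load the diffusion model straight from gguf via `GGUFQuantizationConfig`. a minimal sketch continuing from the code above (the gguf file name is a placeholder, substitute an actual quant from this repo):
```py
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# load a quantized kontext transformer directly from a gguf file
# (file name below is an example; pick an actual quant from this repo)
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/calcuis/kontext-gguf/blob/main/flux1-kontext-dev-q4_0.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# rebuild the pipeline with both the quantized transformer and text encoder
pipe = FluxKontextPipeline.from_pretrained(
    "calcuis/kontext-gguf",
    transformer=transformer,
    text_encoder_2=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")
```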
- tip: if your machine doesn't have enough vram, the suggestion is to run it with gguf-node via comfyui (the setup at the top); otherwise expect a long wait as inference falls back to a slow offload mode; when staying in diffusers, cpu offload can help, as sketched below
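- a minimal sketch of that offload option, using standard diffusers helpers (trades speed for vram):
```py
# instead of pipe.to("cuda"), let diffusers shuttle submodules to the gpu on demand
pipe.enable_model_cpu_offload()

# even lower vram, even slower: offload layer by layer
# pipe.enable_sequential_cpu_offload()
```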
## **run it with gguf-connector (alternative 3)**
- simply execute the command below in console/terminal
```
ggc k2
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k2.png)
- note: on first launch it will pull the required model file(s) from this repo into the local cache automatically; after that you can opt to run it entirely offline, i.e., from the local URL http://127.0.0.1:7860 with the lazy webui
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1a.png)
- or use the version with the bot lora embedded:
```
ggc k1
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1b.png)
- new plushie style
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1.png)
## **additional: lora conversion via gguf-connector**
- convert a lora from base to unet format, e.g., [plushie](https://huggingface.co/fal/Plushie-Kontext-Dev-LoRA/blob/main/plushie-kontext-dev-lora.safetensors), so it can be used in comfyui as well
```
ggc la
```
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-lora.png)
- the same command swaps the lora back (unet to base; auto-detection logic applied), so it can be used for diffusers inference again (a quick sketch follows the screenshot below)
```
ggc la
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1d.png)
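- for the diffusers route, a base-format lora attaches with the standard `load_lora_weights` call; a minimal sketch reusing the pipeline from alternative 2 (repo and file name taken from the plushie link above, prompt is just illustrative):
```py
# attach the base-format plushie lora to the kontext pipeline
pipe.load_lora_weights(
    "fal/Plushie-Kontext-Dev-LoRA",
    weight_name="plushie-kontext-dev-lora.safetensors",
)

# run an edit with the lora active
image = pipe(
    image=input_image,
    prompt="Convert the cat to a plushie",
    guidance_scale=2.5,
).images[0]
image.save("plushie.png")
```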
### **update**
- [clip-l-v2](https://huggingface.co/calcuis/pig-encoder/blob/main/clip_l_v2_fp32-f16.gguf): adds the previously missing tensor `text_projection.weight`
- kontext-v2: `s-quant` and `k-quant`; everything except the single and double blocks stays in `f32`
  - pros: loads faster (no dequant needed for those tensors); also
    1) avoids key-breaking issues, since some inference engines only dequant the blocks;
    2) works on non-cuda machines, most of which cannot run `bf16` tensors
  - cons: slightly larger file size
- kontext-v3: `i-quant` attempt (upgrade your node to the latest version for full quant support)
- kontext-v4: `t-quant`; runnable (extremely fast); for speed-test/experimental purposes
|rank|quant|s/it|loading speed|
|----|--------|---------|----------------|
| 1 | q2_k | 6.40±.7 |🐖💨💨💨💨💨💨|
| 2 | q4_0 | 8.58±.5 |🐖🐖💨💨💨💨💨|
| 3 | q4_1 | 9.12±.5 |🐖🐖🐖💨💨💨💨|
| 4 | q8_0 | 9.45±.3 |🐖🐖🐖🐖💨💨💨|
| 5 | q3_k | 9.50±.3 |🐖🐖🐖🐖💨💨💨|
| 6 | q5_0 | 10.48±.5|🐖🐖🐖🐖🐖💨💨|
| 7 | iq4_nl | 10.55±.5|🐖🐖🐖🐖🐖💨💨|
| 8 | q5_1 | 10.65±.5|🐖🐖🐖🐖🐖💨💨|
| 9 | iq4_xs | 11.45±.7|🐖🐖🐖🐖🐖🐖💨|
| 10| iq3_s | 11.62±.9|🐢🐢🐢🐢🐢🐢💨|
| 11| iq3_xxs| 12.08±.9|🐢🐢🐢🐢🐢🐢🐢|
not all quants were included in this initial test (*tested on an entry-level laptop gpu only; if you have a high-end card, you might find q8_0 running surprisingly faster than the others); feel free to test the rest yourself. interestingly, loading time does not align with file size: it depends on the dequant complexity of each format, and may vary from model to model
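a minimal sketch for checking how a given file is actually quantized (e.g., to verify the v2 note above that everything outside the single/double blocks stays in `f32`), using the `gguf` python package from the llama.cpp project; the file name is a placeholder:
```py
from collections import Counter
from gguf import GGUFReader

# open a local gguf file (path below is an example)
reader = GGUFReader("flux1-kontext-dev-q4_0.gguf")

# tally tensors per quant type
counts = Counter(t.tensor_type.name for t in reader.tensors)
for qtype, n in counts.most_common():
    print(f"{qtype}: {n} tensors")

# list the tensors that are not f32 (in v2 these should be the single/double blocks)
for t in reader.tensors:
    if t.tensor_type.name != "F32":
        print(t.name, t.tensor_type.name)
```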
### **reference**
- base model from [black-forest-labs](https://huggingface.co/black-forest-labs)
- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
- gguf-connector ([pypi](https://pypi.org/project/gguf-connector))
- gguf-node ([pypi](https://pypi.org/project/gguf-node)|[repo](https://github.com/calcuis/gguf)|[pack](https://github.com/calcuis/gguf/releases))