Zuellni/Z-Image-LoRAs

Zuellni committed about 23 hours ago

Commit 7d4a399 (verified) · 1 parent: 755ca22

Update README.md

Files changed (1)

README.md +15 -9


@@ -31,43 +31,49 @@ widget:
   text: oil painting
 ---
 <Gallery />
-<h3>Notes</h3>
 Various experiments, some decent, some bad. Trained on publicly accessible images with AI Toolkit on default settings with unquantized model and encoder. The best checkpoint should be the last one but mileage may vary. Trigger words are not necessary but can be used to make the effect stronger.
 <hr>
-<h4>Digital Art</h4>
 26 x 768px images, mostly close-ups, 4000 steps. I forgot to change the bucket size to 768px and trained them on 1024px, no idea if that affected anything. Booru-style captions but joined and without numerals, e.g., "girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair".
 <br>
 <b>Surprisingly good</b> at all resolutions despite the small size but the fingers can get somewhat strange, the originals have some rather creative gestures and poses.
 <img src="images/digital_art_wide.png">
 <hr>
-<h4>Digital Illustration</h4>
 25 x 1024px images, mostly close-ups, 1024px buckets, 4000 steps. Booru-style captions as base but rewritten to be more natural, e.g., "a girl with long, blonde hair" instead of "1girl, long hair, blonde hair".
 <br>
 <b>Pretty good</b> but the originals tend to gravitate towards a certain size in the chest area and it can be difficult to change that aspect.
 <img src="images/digital_illustration_wide.png">
 <hr>
-<h4>Digital Painting</h4>
 v1 - 144 x 2048px images, 1024px and 1536px buckets, 4000 steps. Simple captions generated with <a href="https://huggingface.co/gokaygokay/Florence-2-Flux-Large">Florence-2</a>.
 <br>
 <b>Pretty bad</b> - details look sharp but the style is inconsistent.
-<br>
-<br>
 v2 - 72 x 2048px images, 1024px buckets, 7000 steps. Very detailed captions generated with <a href="https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated">Qwen3-VL-32B</a>.
 <br>
 <b>Decent</b> - details look slightly worse than v1 but the style is much more consistent. I'd probably use even fewer images for v3 but train with 1536px buckets and simpler captions.
 <img src="images/digital_painting_wide.png">
 <hr>
-<h4>Game Art</h4>
 270 x 1024px images extracted from a certain cute and funny game, 1024px buckets, 5000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Very bad</b> - way too many images, looks like it averaged the style instead of learning it. Definitely fewer images and simpler captions for v2.
 <hr>
-<h4>Ink Art</h4>
 82 x 2048px images, 1024px and 1536px buckets, 4000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Bad</b> - very inconsistent at applying the style, seems to randomly work better at higher resolutions. Same recommendations as above.
 <hr>
-<h4>Oil Painting</h4>
 60 x 2048px images, 1024px and 1536px buckets, 4000 steps, simple Florence-2 captions.
 <br>
 <b>Good</b> - no real complaints. I tried training a v2 with more detailed captions but it didn't change much. The style seems very easy for the model to grasp.

   text: oil painting
 ---
 <Gallery />
+### Notes
 Various experiments, some decent, some bad. Trained on publicly accessible images with AI Toolkit on default settings with unquantized model and encoder. The best checkpoint should be the last one but mileage may vary. Trigger words are not necessary but can be used to make the effect stronger.
 <hr>
+#### Digital Art
 26 x 768px images, mostly close-ups, 4000 steps. I forgot to change the bucket size to 768px and trained them on 1024px, no idea if that affected anything. Booru-style captions but joined and without numerals, e.g., "girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair".
 <br>
 <b>Surprisingly good</b> at all resolutions despite the small size but the fingers can get somewhat strange, the originals have some rather creative gestures and poses.
 <img src="images/digital_art_wide.png">
 <hr>
+#### Digital Illustration
 25 x 1024px images, mostly close-ups, 1024px buckets, 4000 steps. Booru-style captions as base but rewritten to be more natural, e.g., "a girl with long, blonde hair" instead of "1girl, long hair, blonde hair".
 <br>
 <b>Pretty good</b> but the originals tend to gravitate towards a certain size in the chest area and it can be difficult to change that aspect.
 <img src="images/digital_illustration_wide.png">
 <hr>
+#### Digital Painting
 v1 - 144 x 2048px images, 1024px and 1536px buckets, 4000 steps. Simple captions generated with <a href="https://huggingface.co/gokaygokay/Florence-2-Flux-Large">Florence-2</a>.
 <br>
 <b>Pretty bad</b> - details look sharp but the style is inconsistent.
 v2 - 72 x 2048px images, 1024px buckets, 7000 steps. Very detailed captions generated with <a href="https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated">Qwen3-VL-32B</a>.
 <br>
 <b>Decent</b> - details look slightly worse than v1 but the style is much more consistent. I'd probably use even fewer images for v3 but train with 1536px buckets and simpler captions.
 <img src="images/digital_painting_wide.png">
 <hr>
+#### Game Art
 270 x 1024px images extracted from a certain cute and funny game, 1024px buckets, 5000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Very bad</b> - way too many images, looks like it averaged the style instead of learning it. Definitely fewer images and simpler captions for v2.
 <hr>
+#### Ink Art
 82 x 2048px images, 1024px and 1536px buckets, 4000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Bad</b> - very inconsistent at applying the style, seems to randomly work better at higher resolutions. Same recommendations as above.
 <hr>
+#### Oil Painting
 60 x 2048px images, 1024px and 1536px buckets, 4000 steps, simple Florence-2 captions.
 <br>
 <b>Good</b> - no real complaints. I tried training a v2 with more detailed captions but it didn't change much. The style seems very easy for the model to grasp.
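
The Digital Art notes describe captions as Booru-style but "joined and without numerals". A minimal sketch of what that transformation could look like follows. It is an illustration only, not the script used for these LoRAs: the function name `normalize_booru_caption`, the list of count tags, and the rule of merging two-word tags that share a final noun are all assumptions.

```python
import re

# Hypothetical helper: the README only gives two examples of the caption
# style, so the rules below are guesses that reproduce those examples.
COUNT_TAG = re.compile(r"^\d+\+?((?:girl|boy|other)s?)$")


def normalize_booru_caption(caption: str) -> str:
    """Strip count numerals and join two-word tags that share a noun."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    # "1girl" -> "girl", "2boys" -> "boys"; other tags are left unchanged.
    tags = [COUNT_TAG.sub(r"\1", t) for t in tags]

    modifiers: dict[str, list[str]] = {}  # noun -> modifiers, in order seen
    slots: list[tuple[str, str]] = []     # output order: ("noun"|"tag", value)
    for tag in tags:
        words = tag.split()
        if len(words) == 2:
            modifier, noun = words
            if noun not in modifiers:
                modifiers[noun] = []
                slots.append(("noun", noun))
            modifiers[noun].append(modifier)
        else:
            slots.append(("tag", tag))

    parts = []
    for kind, value in slots:
        if kind == "noun":
            # "long hair" + "blonde hair" -> "long blonde hair"
            parts.append(" ".join(modifiers[value] + [value]))
        else:
            parts.append(value)
    return ", ".join(parts)


print(normalize_booru_caption("1girl, long hair, blonde hair"))
# girl, long blonde hair
```

Tags with one word or with three or more words pass through untouched, so a real dataset would likely still need manual review after a pass like this.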