Update README.md
README.md
CHANGED
@@ -31,43 +31,49 @@ widget:
 text: oil painting
 ---
 <Gallery />
-
+
+### Notes
 Various experiments, some decent, some bad. Trained on publicly accessible images with AI Toolkit on default settings with unquantized model and encoder. The best checkpoint should be the last one but mileage may vary. Trigger words are not necessary but can be used to make the effect stronger.
 <hr>
-
+
+#### Digital Art
 26 x 768px images, mostly close-ups, 4000 steps. I forgot to change the bucket size to 768px and trained them on 1024px, no idea if that affected anything. Booru-style captions but joined and without numerals, e.g., "girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair".
 <br>
 <b>Surprisingly good</b> at all resolutions despite the small size but the fingers can get somewhat strange, the originals have some rather creative gestures and poses.
 <img src="images/digital_art_wide.png">
 <hr>
-
+
+#### Digital Illustration
 25 x 1024px images, mostly close-ups, 1024px buckets, 4000 steps. Booru-style captions as base but rewritten to be more natural, e.g., "a girl with long, blonde hair" instead of "1girl, long hair, blonde hair".
 <br>
 <b>Pretty good</b> but the originals tend to gravitate towards a certain size in the chest area and it can be difficult to change that aspect.
 <img src="images/digital_illustration_wide.png">
 <hr>
-
+
+#### Digital Painting
 v1 - 144 x 2048px images, 1024px and 1536px buckets, 4000 steps. Simple captions generated with <a href="https://huggingface.co/gokaygokay/Florence-2-Flux-Large">Florence-2</a>.
 <br>
 <b>Pretty bad</b> - details look sharp but the style is inconsistent.
-
-<br>
+
 v2 - 72 x 2048px images, 1024px buckets, 7000 steps. Very detailed captions generated with <a href="https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-32B-Instruct-abliterated">Qwen3-VL-32B</a>.
 <br>
 <b>Decent</b> - details look slightly worse than v1 but the style is much more consistent. I'd probably use even fewer images for v3 but train with 1536px buckets and simpler captions.
 <img src="images/digital_painting_wide.png">
 <hr>
-
+
+#### Game Art
 270 x 1024px images extracted from a certain cute and funny game, 1024px buckets, 5000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Very bad</b> - way too many images, looks like it averaged the style instead of learning it. Definitely fewer images and simpler captions for v2.
 <hr>
-
+
+#### Ink Art
 82 x 2048px images, 1024px and 1536px buckets, 4000 steps, very detailed Qwen3-VL-32B captions.
 <br>
 <b>Bad</b> - very inconsistent at applying the style, seems to randomly work better at higher resolutions. Same recommendations as above.
 <hr>
-
+
+#### Oil Painting
 60 x 2048px images, 1024px and 1536px buckets, 4000 steps, simple Florence-2 captions.
 <br>
 <b>Good</b> - no real complaints. I tried training a v2 with more detailed captions but it didn't change much. The style seems very easy for the model to grasp.
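
For readers who want to try the caption simplification described for the 26-image set ("girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair"), here is a minimal Python sketch. It is only an illustration: the README does not say how the captions were actually edited, so the function name `simplify_booru_caption` and the merge rule (join adjacent tags that end in the same word) are assumptions, not the author's method.

```python
import re


def simplify_booru_caption(caption: str) -> str:
    """Hypothetical helper: drop leading counts and merge adjacent tags
    that share the same final word ("long hair, blonde hair")."""
    tags = [t.strip() for t in caption.split(",")]
    # Remove a leading numeral such as the "1" in "1girl".
    tags = [re.sub(r"^\d+", "", t) for t in tags]
    tags = [t for t in tags if t]  # discard tags that ended up empty

    merged = []
    for tag in tags:
        words = tag.split()
        if merged:
            prev = merged[-1].split()
            # Merge only multi-word tags whose last word matches,
            # e.g. "long hair" + "blonde hair" -> "long blonde hair".
            if len(prev) > 1 and len(words) > 1 and prev[-1] == words[-1]:
                merged[-1] = " ".join(prev[:-1] + words)
                continue
        merged.append(tag)
    return ", ".join(merged)


print(simplify_booru_caption("1girl, long hair, blonde hair"))
# girl, long blonde hair
```

The rule is deliberately narrow: it only merges neighbouring tags, so unrelated tags that happen to share a word elsewhere in the caption are left alone. Real tag lists would likely need manual review after a pass like this.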