Prompt
Asuka Langley sitting cross-legged on a large, high-backed armchair. She is wearing a glossy, reflective red plugsuit. The plush armchair is upholstered in dark fabric. She is in an elegantly decorated room with a warm, glowing fireplace.

Various experiments, some decent, some bad. Trained on publicly accessible images using AI Toolkit with default settings and the unquantized model and encoder. The last checkpoint should be the best one, but your mileage may vary. Trigger words are not necessary but can be used to make the effect stronger.

digital art
26 * 768px images, mostly close-ups. I forgot to change the bucket size to 768px and trained them with 1024px buckets instead; no idea whether that affected anything. Booru-style captions, but with tags joined and numerals removed, e.g., "girl" instead of "1girl", "long blonde hair" instead of "long hair, blonde hair".
Surprisingly good at all resolutions despite the small size, but the fingers can get somewhat strange, probably because the originals had some rather creative gestures and poses.
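
The caption cleanup described above (dropping the count prefix and merging tags that end in the same word) could be sketched roughly like this. It is a hypothetical helper, not the script actually used for this dataset; the function names and the merge-by-last-word rule are assumptions for illustration only.

```python
import re

# Leading count prefix on booru tags such as "1girl", "2boys", "6+girls".
# The lookahead keeps purely numeric tags (e.g. a year) untouched.
COUNT_PREFIX = re.compile(r"^\d+\+?(?=[a-z])")


def strip_counts(tags):
    """Drop the numeric count prefix from each tag: "1girl" -> "girl"."""
    return [COUNT_PREFIX.sub("", tag) for tag in tags]


def join_shared_noun(tags):
    """Merge tags that end in the same word, keeping first-seen order."""
    modifiers = {}  # final word -> modifier phrases seen for it
    for tag in tags:
        words = tag.split()
        head = words[-1]
        modifiers.setdefault(head, [])
        prefix = " ".join(words[:-1])
        if prefix:
            modifiers[head].append(prefix)
    return [" ".join(mods + [head]) for head, mods in modifiers.items()]


def normalize_caption(caption):
    """Turn a comma-separated booru caption into the joined, numeral-free form."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    return ", ".join(join_shared_noun(strip_counts(tags)))


print(normalize_caption("1girl, long hair, blonde hair"))
```

Merging purely by the last word is crude (it would also fuse unrelated tags that happen to end in the same word), so real captions would likely still need a manual pass.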

digital illustration
25 * 1024px images, mostly close-ups, 1024px buckets. Booru-style captions as base but rewritten to be more natural, e.g., "a girl with long, blonde hair" instead of "1girl, long hair, blonde hair".
Pretty good but the originals tended to gravitate towards a certain size in the chest area and it can be difficult to change that aspect.

digital painting
144 * 2048px images, 1024px and 1536px buckets. Short captions generated with some old LLM.
Pretty bad - details look sharp but the style is inconsistent.
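
For reference, a rough sketch of how a source image might be fitted into a resolution bucket: scale it to about resolution * resolution pixels while keeping the aspect ratio, then round each side down to a multiple of 64. The function name, the rounding step and the rule itself are assumptions for illustration; AI Toolkit's actual bucketing may work differently.

```python
import math


def bucket_size(width, height, resolution, step=64):
    """Scale (width, height) to roughly resolution**2 pixels, keeping aspect ratio.

    Each side is rounded down to a multiple of `step` (assumed to be 64 here).
    """
    scale = math.sqrt(resolution * resolution / (width * height))
    bucket_w = max(step, int(width * scale) // step * step)
    bucket_h = max(step, int(height * scale) // step * step)
    return bucket_w, bucket_h


# Compare how a square and a 4:3 source image land in the two bucket sizes.
for resolution in (1024, 1536):
    print(resolution, bucket_size(2048, 2048, resolution))
    print(resolution, bucket_size(2048, 1536, resolution))
```

Under these assumptions a 2048px image is always downscaled, so the 1536px bucket keeps noticeably more of the original detail than the 1024px one.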

digital painting v2
72 * 2048px images, 1024px buckets. Very long captions generated with Qwen3-VL-32B-Instruct.
Decent - details look slightly worse than v1 but the style is much more consistent. I'd probably use even fewer images for v3 but train with 1536px buckets and shorter captions.

game art
270 * 1024px images extracted from a certain cute and funny game, 1024px buckets, very long Qwen3-VL-32B captions.
Very bad - way too many images; it looks like the model averaged the style instead of learning it. For v2 I'd definitely use fewer images and maybe shorter captions.

ink art
82 * 2048px images, 1024px and 1536px buckets, very long Qwen3-VL-32B captions.
Bad - very inconsistent at applying the style, though it seems to work better at higher resolutions. Same recommendations as for game art: fewer images and maybe shorter captions.

oil painting
60 * 2048px images, 1024px and 1536px buckets. Short captions generated with some old LLM.
Good - no real complaints. I tried training a v2 with longer captions but it didn't change much. The style seems very easy for the model to grasp.

Model tree for Zuellni/Z-Image-LoRAs
Base model: Tongyi-MAI/Z-Image-Turbo
Adapter: this model
