Sketch → Semi-Realism Digital Art LoRA
LoRA for FireRed-Image-Edit-1.1 (Qwen-Image DiT architecture) that converts line sketches into semi-realism digital art in a specific stylistic register.
Style trigger
Use t01nspcstyle in prompts. Example:
turn this sketch into t01nspcstyle digital art, <scene description>
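A small helper can keep the trigger word and phrasing identical to the training captions. This is only a sketch: `build_prompt` and `TRIGGER` are hypothetical names introduced here, not part of any library.

```python
TRIGGER = "t01nspcstyle"  # style trigger word from this model card


def build_prompt(scene: str) -> str:
    """Return a prompt in the same form as the training captions."""
    scene = scene.strip()
    if not scene:
        raise ValueError("scene description must not be empty")
    return f"turn this sketch into {TRIGGER} digital art, {scene}"


print(build_prompt("a fox sitting on a mossy rock at dusk"))
```

Keeping the fixed prefix unchanged and varying only the scene description mirrors how the captions were written.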
Base model
- Transformer / DiT: FireRedTeam/FireRed-Image-Edit-1.1
- Text encoder / VAE: Qwen/Qwen-Image
- Processor: Qwen/Qwen-Image-Edit (processor/)
- Trainer: DiffSynth-Studio (v2.0.9)
Dataset
257 pairs of sketch → digital art. Each caption has the form
turn this sketch into t01nspcstyle digital art, <scene description>.
Data columns:
| Column | Meaning |
|---|---|
| image | Target (digital art) |
| edit_image | Input (sketch) |
| prompt | Instruction |
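To make the column layout concrete, here is a minimal sketch that builds one record and serializes it as CSV. The file paths and the helper names `make_row` and `rows_to_csv` are hypothetical, and whether the trainer reads exactly this CSV layout is an assumption; only the three column names come from the table above.

```python
import csv
import io

COLUMNS = ["image", "edit_image", "prompt"]


def make_row(target_path: str, sketch_path: str, scene: str) -> dict:
    """One training pair: target art, input sketch, and instruction."""
    return {
        "image": target_path,       # target (digital art)
        "edit_image": sketch_path,  # input (sketch)
        "prompt": f"turn this sketch into t01nspcstyle digital art, {scene}",
    }


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical paths, for illustration only.
rows = [make_row("art/0001.png", "sketch/0001.png", "a knight resting under a tree")]
print(rows_to_csv(rows))
```

Because every prompt contains a comma, the `csv` module quotes that field automatically, which hand-built strings would not.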
Training parameters
| Parameter | Value | Notes |
|---|---|---|
| lora_rank | 32 | |
| lora_base_model | dit | LoRA adapts the transformer only |
| lora_target_modules | q/k/v proj, mlp.net.2, mod.1 | (full list in training_config.json) |
| learning_rate | 1e-4 | AdamW default betas |
| num_epochs | 4 | |
| dataset_repeat | 25 | 6 425 steps per epoch (257 × 25) |
| Total steps | ~25 700 | |
| max_pixels | 1 478 656 | 1216², divisible by VAE stride |
| mixed_precision | bf16 | |
| use_gradient_checkpointing | true | required on 80 GB VRAM |
| save_steps | 2000 | intermediate checkpoints |
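The step counts and the pixel budget in the table follow from simple arithmetic, checked below. The batch size of 1 is an assumption (it is what makes 257 × 25 equal the stated steps per epoch), and the exact VAE stride is not stated here, so the snippet only checks a few power-of-two candidates.

```python
num_pairs = 257
dataset_repeat = 25
num_epochs = 4
batch_size = 1  # assumption: one sample per optimizer step

steps_per_epoch = num_pairs * dataset_repeat // batch_size
total_steps = steps_per_epoch * num_epochs

side = 1216
max_pixels = side * side

print(steps_per_epoch, total_steps, max_pixels)  # 6425 25700 1478656

# 1216 = 64 * 19, so it is divisible by every power-of-two stride up to 64.
for stride in (8, 16, 32, 64):
    assert side % stride == 0
```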
Hardware
- GPU: NVIDIA H100 80GB HBM3 (SXM)
- Throughput: ~3.6 s/it
- Per-epoch time: ~6h 25m
- Peak VRAM: ~67 GB
- Peak power: ~670 W
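The per-epoch time can be cross-checked against the throughput figure. The sketch below assumes constant throughput for the whole run; the total is an estimate derived here, not a measured value.

```python
steps_per_epoch = 6425   # 257 pairs x 25 repeats
seconds_per_step = 3.6   # reported throughput
num_epochs = 4


def fmt(seconds: float) -> str:
    """Format a duration as hours and minutes, dropping leftover seconds."""
    hours, rem = divmod(int(round(seconds)), 3600)
    return f"{hours}h {rem // 60:02d}m"


epoch_seconds = steps_per_epoch * seconds_per_step
print(fmt(epoch_seconds))               # 6h 25m
print(fmt(epoch_seconds * num_epochs))  # 25h 42m
```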
Checkpoints
Intermediate checkpoints are written every 2000 steps. Use the earliest one that looks good; later checkpoints risk overfitting to the style, since 257 images is a small dataset for rank 32.
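To see where each intermediate checkpoint falls in the schedule, the steps can be listed with the fraction of training they represent. This is a sketch using the numbers from this card; whether a final checkpoint is also written at the last step is not stated here, so only the periodic saves are listed.

```python
save_steps = 2000
steps_per_epoch = 6425
total_steps = 25700

# Periodic saves only: 2000, 4000, ..., up to the last multiple within the run.
checkpoint_steps = list(range(save_steps, total_steps + 1, save_steps))

for step in checkpoint_steps:
    epochs_seen = step / steps_per_epoch
    print(f"step {step:>5}: ~{epochs_seen:.2f} epochs")

print(len(checkpoint_steps), "intermediate checkpoints")
```

Comparing outputs on the same held-out sketch, starting from the earliest checkpoint, is one way to apply the advice above.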
Inference (DiffSynth-Studio)
import torch
from PIL import Image
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="FireRedTeam/FireRed-Image-Edit-1.1",
                    origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors"),  # DiT
        ModelConfig(model_id="Qwen/Qwen-Image",
                    origin_file_pattern="text_encoder/model*.safetensors"),  # text encoder
        ModelConfig(model_id="Qwen/Qwen-Image",
                    origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),  # VAE
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
    processor_config=ModelConfig(model_id="Qwen/Qwen-Image-Edit", origin_file_pattern="processor/"),
)
pipe.load_lora(pipe.dit, "step-XXXX.safetensors", alpha=1.0)  # attach the LoRA to the transformer
sketch = Image.open("my_sketch.png")
image = pipe(
    prompt="turn this sketch into t01nspcstyle digital art, <scene>",
    edit_image=[sketch],  # must be a list, even for one image
    num_inference_steps=30,
    cfg_scale=4.0,
)
image.save("out.png")
IMPORTANT: FireRed expects edit_image as a list, even with a single
image (edit_image=[img]), not edit_image=img.
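A small guard can make the list requirement hard to get wrong. `as_image_list` is a hypothetical helper, not part of DiffSynth-Studio; it wraps a single image in a list and leaves lists and tuples as they are.

```python
def as_image_list(images):
    """Return edit images as a list, wrapping a single image if needed."""
    if images is None:
        raise ValueError("edit_image is required")
    if isinstance(images, (list, tuple)):
        return list(images)
    return [images]


# Strings stand in for PIL images here, just to show the behavior.
print(as_image_list("sketch"))         # ['sketch']
print(as_image_list(["a", "b"]))       # ['a', 'b']
```

Under that assumption, a call would look like `pipe(prompt=..., edit_image=as_image_list(sketch), ...)`.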
License
Apache-2.0, inherited from both FireRed-Image-Edit-1.1 and Qwen-Image.
Model tree for Zaytron40k/firered-sketch2art-lora
Base model
FireRedTeam/FireRed-Image-Edit-1.1