---
license: apache-2.0
datasets:
  - peteromallet/high-quality-midjouney-srefs
base_model:
  - Qwen/Qwen-Image-Edit
tags:
  - image
  - editing
  - lora
  - scene-generation
  - qwen
pipeline_tag: image-to-image
library_name: diffusers
---

# QwenEdit InScene LoRAs (Beta)

## Model Description

InScene and InScene Annotate are a pair of LoRA fine-tunes for QwenEdit that enhance its ability to generate images based on scene references. These models work together to provide flexible scene-based image generation with optional annotation support.

### InScene

InScene is the main model: it generates images based on the scene composition and layout of a reference image. It is trained on pairs of different shots within the same scene, along with prompts describing the desired output. Its goal is to create entirely new shots within a scene while maintaining character consistency and scene coherence.

InScene is intentionally biased towards creating completely new shots rather than minor edits. This design choice overcomes Qwen-Image-Edit's internal bias toward making small, conservative edits, enabling more dramatic scene transformations while preserving the characters and overall scene identity.

![InScene sample outputs](inscene-samples.png)

### InScene Annotate

InScene Annotate is trained on images with green rectangles drawn over specific regions, and learns to generate images showing the subject within the marked area. Rather than zooming in precisely on the rectangle, it is trained to flexibly interpret instructions about what is inside that area, capturing the subject, context, and framing in a natural, composed way rather than as a strict crop.

![InScene Annotate sample outputs](inscene-annotate-samples.png)
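Annotations of this kind are easy to produce programmatically. Here is a minimal Pillow sketch; the filename, box coordinates, and line width are placeholder assumptions, not values from the training pipeline:

```python
from PIL import Image, ImageDraw

# Draw a green rectangle over the region of interest,
# matching the style of the training annotations
ref = Image.open("scene_reference.png").convert("RGB")  # assumed local file
draw = ImageDraw.Draw(ref)
draw.rectangle((120, 80, 420, 360), outline=(0, 255, 0), width=6)  # placeholder box
ref.save("annotated_reference.png")
```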

InScene and InScene Annotate are currently in beta.

## How to Use

### InScene

To use the base InScene model, start your prompt with `Make an image in this scene of`, then describe what you want to generate.

For example: `Make an image in this scene of a bustling city street at night.`

### InScene Annotate

For the Annotate variant, use annotated reference images and start your prompt with `Based on this annotated scene, create`.

For example: `Based on this annotated scene, create a winter landscape with snow-covered mountains.`

### Use with diffusers

**InScene:**

```python
import torch
from diffusers import QwenImageEditPipeline

# Load the base Qwen-Image-Edit pipeline
pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Apply the InScene LoRA weights
pipe.load_lora_weights("peteromallet/Qwen-Image-Edit-InScene", weight_name="InScene-0.7.safetensors")
```
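Loading the LoRA is enough to start generating. A minimal end-to-end call, continuing the snippet above, might look like the following; the input filename and sampler settings (`true_cfg_scale`, `num_inference_steps`, seed) are illustrative assumptions drawn from typical Qwen-Image-Edit usage, not tuned values:

```python
from PIL import Image

# Reference image whose scene should be preserved (assumed local file)
scene = Image.open("scene_reference.png").convert("RGB")

result = pipe(
    image=scene,
    prompt="Make an image in this scene of a bustling city street at night",
    negative_prompt=" ",
    true_cfg_scale=4.0,        # assumed setting, tune to taste
    num_inference_steps=50,    # assumed setting
    generator=torch.manual_seed(0),
).images[0]
result.save("inscene_output.png")
```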

**InScene Annotate:**

```python
import torch
from diffusers import QwenImageEditPipeline

# Load the base Qwen-Image-Edit pipeline
pipe = QwenImageEditPipeline.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Apply the InScene Annotate LoRA weights
pipe.load_lora_weights("peteromallet/Qwen-Image-Edit-InScene", weight_name="InScene-Annotate-0.7.safetensors")
```
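With the Annotate LoRA loaded, you can feed it an annotated reference such as the one prepared in the Pillow sketch earlier. Again, the filename and sampler settings below are illustrative assumptions:

```python
from PIL import Image

# Annotated reference with a green rectangle over the target region
ref = Image.open("annotated_reference.png").convert("RGB")

result = pipe(
    image=ref,
    prompt="Based on this annotated scene, create a close-up of the subject in the marked area",
    negative_prompt=" ",
    true_cfg_scale=4.0,        # assumed setting
    num_inference_steps=50,    # assumed setting
    generator=torch.manual_seed(0),
).images[0]
result.save("inscene_annotate_output.png")
```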

## Strengths & Weaknesses

The models excel at:

- Capturing scene composition and spatial layout from reference images
- Maintaining consistent scene structure while varying content
- Understanding spatial relationships between elements
- Strong prompt adherence with scene-aware generation
- (Annotate) Precise control using annotated references

The models may struggle with:

- Very complex multi-layered scenes with numerous elements
- Extremely abstract or non-traditional scene compositions
- Fine-grained details that conflict with the reference scene layout
- Occasional depth perception issues

## Training Data

The InScene and InScene Annotate LoRAs were trained on a curated dataset of high-quality Midjourney style references, with additional scene-focused annotations for the Annotate variant. The dataset emphasizes diverse scene compositions and spatial relationships.

You can find the public dataset used for training here: [peteromallet/high-quality-midjouney-srefs](https://huggingface.co/datasets/peteromallet/high-quality-midjouney-srefs)

## Links