srishticrai/unet-flickr8k

stable-diffusion

srishticrai committed on Apr 22

Commit 46a5af8 · verified · 1 Parent(s): 58c757a

Update README.md

Files changed (1)

README.md +32 -7

README.md CHANGED

@@ -47,19 +47,44 @@ This model contains a **fine-tuned U-Net** from the `CompVis/stable-diffusion-v1
 This U-Net can be loaded into a standard Stable Diffusion pipeline to enhance image generation on descriptive prompts:
 ```python
-from diffusers import StableDiffusionPipeline, UNet2DConditionModel
 import torch
-# Load fine-tuned U-Net
-unet = UNet2DConditionModel.from_pretrained("srishticrai/unet-flickr8k")
-# Load pipeline with original components + fine-tuned U-Net
 pipe = StableDiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
-    unet=unet,
     torch_dtype=torch.float16
-).to("cuda")
 # Generate image
-image = pipe("A child blowing bubbles in a park at sunset").images[0]
 image.show()

 This U-Net can be loaded into a standard Stable Diffusion pipeline to enhance image generation on descriptive prompts:
 ```python
+from diffusers import StableDiffusionPipeline, UNet2DConditionModel, AutoencoderKL, DDPMScheduler
+from transformers import CLIPTextModel
 import torch
+import matplotlib.pyplot as plt
+device = "cuda"  # assumed target device for the components loaded below; the previous snippet called .to("cuda")
+print("Loading VAE and text encoder from base SD...")
+vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", torch_dtype=torch.float16).to(device)
+text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.float16).to(device)
+# Load fine-tuned UNet from Hugging Face
+print("Loading fine-tuned UNet from Hugging Face (srishticrai/unet-flickr8k)...")
+fine_tuned_unet = UNet2DConditionModel.from_pretrained(
+    "srishticrai/unet-flickr8k",
+    torch_dtype=torch.float16
+).to(device)
+# Rebuild the pipeline
 pipe = StableDiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
+    unet=fine_tuned_unet,
+    vae=vae,
+    text_encoder=text_encoder,
     torch_dtype=torch.float16
+).to(device)
+pipe.set_progress_bar_config(disable=False)
+pipe.enable_attention_slicing()
+# Ask for prompt
+prompt = input("Enter a prompt to generate an image: ")
 # Generate image
+image = pipe(
+    prompt,
+    guidance_scale=10.0,
+    num_inference_steps=50
+).images[0]
 image.show()
+```
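
The updated snippet moves every component to `device` and loads it in `torch.float16`. A minimal sketch of one way to choose both values is shown below. The helper name `pick_device_and_dtype` is hypothetical and not part of the README, and the float32 fallback on CPU is an assumption, made because half precision is often slow or unsupported for some CPU operations.

```python
from typing import Tuple


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, dtype name) for loading the pipeline components.

    Hypothetical helper: half precision on a GPU, full precision on CPU.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


if __name__ == "__main__":
    try:
        import torch  # only needed to query the hardware
        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False
    device, dtype_name = pick_device_and_dtype(cuda_available)
    print(device, dtype_name)
```

With PyTorch installed, `getattr(torch, dtype_name)` gives the value to pass as `torch_dtype`, and `device` can be passed to each `.to(...)` call.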