---
license: mit
language:
- en
base_model:
- CompVis/stable-diffusion-v1-4
pipeline_tag: text-to-image
library_name: diffusers
tags:
- stable-diffusion
- fine-tuned
- unet
- flickr8k
- generative-ai
- text-to-image
---

# 🖼️ Fine-Tuned U-Net (Flickr8k) — Stable Diffusion

This repository contains a **fine-tuned U-Net** from the `CompVis/stable-diffusion-v1-4` Stable Diffusion pipeline, trained on natural English captions from the **Flickr8k** dataset. It is intended to improve generation quality on everyday, human-centered prompts: people performing actions, common objects, and outdoor scenes of the kind Flickr8k captions describe.

> ✅ Only the U-Net was fine-tuned. The VAE, tokenizer, and text encoder are unchanged from the base model.

---

## 📊 Training Details

- **Base model**: [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)
- **Fine-tuned on**: [Flickr8k (Kaggle)](https://www.kaggle.com/datasets/adityajn105/flickr8k)
- **Components fine-tuned**: `unet` only
- **Frozen**: text encoder, VAE, and tokenizer
- **Epochs**: 10
- **Learning rate**: 1e-6
- **Batch size**: 1
- **Gradient accumulation steps**: 16 (effective batch size 16)
- **Image resolution**: 256×256
- **Training set size**: 1,000 image-caption pairs
- **Mixed precision**: FP16
- **Hardware**: Kaggle GPU (Tesla T4, 16 GB VRAM)
- **Seed**: 42
- **Checkpointing**: every 200 steps

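These settings follow the standard latent-diffusion recipe: encode each image with the frozen VAE, add noise at a random timestep, and train the U-Net to predict that noise. The sketch below is a minimal, illustrative reconstruction of that loop, not the exact training script: `dataloader` is a hypothetical iterator over (image, tokenized-caption) batches, and the FP16 autocasting, seeding, and checkpoint-saving steps are omitted for brevity.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel

model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"

# Load the frozen components and the trainable U-Net
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Freeze everything except the U-Net
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
unet.train()

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-6)
accum_steps = 16  # batch size 1 -> effective batch size 16

for step, (pixel_values, input_ids) in enumerate(dataloader):  # hypothetical dataloader
    pixel_values = pixel_values.to(device)  # (1, 3, 256, 256), normalized to [-1, 1]
    input_ids = input_ids.to(device)        # caption tokenized with the CLIP tokenizer

    # Encode the image to latents and add noise at a random timestep
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    encoder_hidden_states = text_encoder(input_ids)[0]

    # Noise-prediction (epsilon) objective with gradient accumulation
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
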
---

## 🧠 Usage

Load the fine-tuned U-Net into a standard Stable Diffusion pipeline; the VAE, tokenizer, and text encoder come from the original base model:

```python
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
import torch

# Load the fine-tuned U-Net in half precision to match the pipeline dtype
unet = UNet2DConditionModel.from_pretrained(
    "srishticrai/unet-flickr8k",
    torch_dtype=torch.float16,
)

# Build the pipeline from the base model, swapping in the fine-tuned U-Net
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    unet=unet,
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image
image = pipe("A child blowing bubbles in a park at sunset").images[0]
image.show()
```
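
For reproducible outputs, pass a seeded `torch.Generator` to the pipeline. The snippet below is a usage sketch; the prompt and sampler settings are illustrative, not values from the original card.

```python
# Optional: deterministic generation with a fixed seed
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A dog catching a frisbee on the beach",
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save("sample.png")
```

On memory-constrained GPUs (such as the T4 class used for training), `pipe.enable_attention_slicing()` trades a little speed for a lower peak VRAM footprint.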