LPX55 committed on
Commit a5723a0 · 1 Parent(s): 56e596b

Add Gradio interface for multi-model diffusion and text generation tasks, including model loading/unloading functionality and shared state management. Introduce new tabs for text and diffusion models, enhancing user interaction and modularity.
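Each tab lives in its own module under `pipeline_tabs/` and exposes a single function that receives the shared `model_cache` dict and the `unload_all_models` callback, so new modalities can be plugged into the main app without touching its core. A minimal sketch of how a hypothetical extra tab could follow the same pattern (the `audio_tab` name and Whisper model are illustrative only, not part of this commit):

```python
import gradio as gr
from transformers import pipeline

def audio_tab(model_cache, unload_all_models):
    # Hypothetical example tab mirroring text_tab/diffusion_tab from this commit.
    def load_model():
        unload_all_models()  # keep only one model resident at a time
        model_cache["audio"] = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
        return "Audio model loaded!"

    def run(audio_path):
        if "audio" not in model_cache:
            return "Audio model not loaded!"
        return model_cache["audio"](audio_path)["text"]

    with gr.Tab("Speech to Text"):
        status = gr.Markdown("Model not loaded.")
        load_btn = gr.Button("Load Audio Model")
        audio_in = gr.Audio(type="filepath", label="Audio")
        run_btn = gr.Button("Transcribe")
        output = gr.Textbox(label="Transcript")
        load_btn.click(load_model, None, status)
        run_btn.click(run, audio_in, output)
```

Registering such a tab in `app_mm.py` would then be a one-line call, e.g. `audio_tab(model_cache, unload_all_models)` inside the `gr.Tabs()` block.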

app_mm.py ADDED
@@ -0,0 +1,31 @@
+ import gradio as gr
+ import torch
+ import gc
+ import json
+ from pipeline_tabs.text_tab import text_tab
+ from pipeline_tabs.diffusion_tab import diffusion_tab
+
+ model_cache = {}
+
+ def unload_all_models():
+     model_cache.clear()
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ with gr.Blocks() as demo:
+     with gr.Tabs():
+         text_tab(model_cache, unload_all_models)
+         diffusion_tab(model_cache, unload_all_models)
+     # Shared state display
+     def pretty_json():
+         return json.dumps(list(model_cache.keys()), indent=2, ensure_ascii=False)
+     state_box = gr.Textbox(label="Loaded Models", lines=4, interactive=False, value=pretty_json())
+     # Refresh the display on page load; use the button below after loading/unloading models
+     demo.load(fn=pretty_json, inputs=None, outputs=state_box)
+
+     # Button to refresh the loaded-model list on demand
+     refresh_btn = gr.Button("Refresh Model State")
+     refresh_btn.click(fn=pretty_json, inputs=None, outputs=state_box)
+
+ demo.launch()
auto-diffuser.md ADDED
@@ -0,0 +1,232 @@
+ You are an expert in optimizing diffusers library code for different hardware configurations.
+
+ NOTE: This system includes curated optimization knowledge from HuggingFace documentation.
+
+ TASK: Generate optimized Python code for running a diffusion model with the following specifications:
+ - Model: LPX55/FLUX.1-merged_lightning_v2
+ - Prompt: "A cat holding a sign that says hello world"
+ - Image size: 768x1152
+ - Inference steps: 8
+
+ HARDWARE SPECIFICATIONS:
+ - Platform: Linux (manual_input)
+ - CPU Cores: 8
+ - CUDA Available: False
+ - MPS Available: False
+ - Optimization Profile: balanced
+ - GPU: Custom GPU (20.0 GB VRAM)
+
+ OPTIMIZATION KNOWLEDGE BASE:
+
+ # DIFFUSERS OPTIMIZATION TECHNIQUES
+
+ ## Memory Optimization Techniques
+
+ ### 1. Model CPU Offloading
+ Use `enable_model_cpu_offload()` to move models between GPU and CPU automatically:
+ ```python
+ pipe.enable_model_cpu_offload()
+ ```
+ - Saves significant VRAM by keeping only active models on GPU
+ - Automatic management, no manual intervention needed
+ - Compatible with all pipelines
+
+ ### 2. Sequential CPU Offloading
+ Use `enable_sequential_cpu_offload()` for more aggressive memory saving:
+ ```python
+ pipe.enable_sequential_cpu_offload()
+ ```
+ - More memory efficient than model offloading
+ - Moves models to CPU after each forward pass
+ - Best for very limited VRAM scenarios
+
+ ### 3. Attention Slicing
+ Use `enable_attention_slicing()` to reduce memory during attention computation:
+ ```python
+ pipe.enable_attention_slicing()
+ # or specify slice size
+ pipe.enable_attention_slicing("max") # maximum slicing
+ pipe.enable_attention_slicing(1) # slice_size = 1
+ ```
+ - Trades compute time for memory
+ - Most effective for high-resolution images
+ - Can be combined with other techniques
+
+ ### 4. VAE Slicing
+ Use `enable_vae_slicing()` for large batch processing:
+ ```python
+ pipe.enable_vae_slicing()
+ ```
+ - Decodes images one at a time instead of all at once
+ - Essential for batch sizes > 4
+ - Minimal performance impact on single images
+
+ ### 5. VAE Tiling
+ Use `enable_vae_tiling()` for high-resolution image generation:
+ ```python
+ pipe.enable_vae_tiling()
+ ```
+ - Enables 4K+ image generation on 8GB VRAM
+ - Splits images into overlapping tiles
+ - Automatically disabled for 512x512 or smaller images
+
+ ### 6. Memory Efficient Attention (xFormers)
+ Use `enable_xformers_memory_efficient_attention()` if xFormers is installed:
+ ```python
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+ - Significantly reduces memory usage and improves speed
+ - Requires xformers library installation
+ - Compatible with most models
+
+ ## Performance Optimization Techniques
+
+ ### 1. Half Precision (FP16/BF16)
+ Use lower precision for better memory and speed:
+ ```python
+ # FP16 (widely supported)
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+
+ # BF16 (better numerical stability, newer hardware)
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ ```
+ - FP16: Halves memory usage, widely supported
+ - BF16: Better numerical stability, requires newer GPUs
+ - Essential for most optimization scenarios
+
+ ### 2. Torch Compile (PyTorch 2.0+)
+ Use `torch.compile()` for significant speed improvements:
+ ```python
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ # For some models, compile VAE too:
+ pipe.vae.decode = torch.compile(pipe.vae.decode, mode="reduce-overhead", fullgraph=True)
+ ```
+ - 5-50% speed improvement
+ - Requires PyTorch 2.0+
+ - First run is slower due to compilation
+
+ ### 3. Fast Schedulers
+ Use faster schedulers for fewer steps:
+ ```python
+ from diffusers import LMSDiscreteScheduler, UniPCMultistepScheduler
+
+ # LMS Scheduler (good quality, fast)
+ pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
+
+ # UniPC Scheduler (fastest)
+ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
+ ```
+
+ ## Hardware-Specific Optimizations
+
+ ### NVIDIA GPU Optimizations
+ ```python
+ # Enable Tensor Cores
+ torch.backends.cudnn.benchmark = True
+
+ # Optimal data type for NVIDIA
+ torch_dtype = torch.float16 # or torch.bfloat16 for RTX 30/40 series
+ ```
+
+ ### Apple Silicon (MPS) Optimizations
+ ```python
+ # Use MPS device
+ device = "mps" if torch.backends.mps.is_available() else "cpu"
+ pipe = pipe.to(device)
+
+ # Recommended dtype for Apple Silicon
+ torch_dtype = torch.bfloat16 # Better than float16 on Apple Silicon
+
+ # Attention slicing often helps on MPS
+ pipe.enable_attention_slicing()
+ ```
+
+ ### CPU Optimizations
+ ```python
+ # Use float32 for CPU
+ torch_dtype = torch.float32
+
+ # Enable optimized attention
+ pipe.enable_attention_slicing()
+ ```
+
+ ## Model-Specific Guidelines
+
+ ### FLUX Models
+ - Do NOT use guidance_scale parameter (not needed for FLUX)
+ - Use 4-8 inference steps maximum
+ - BF16 dtype recommended
+ - Enable attention slicing for memory optimization
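+
+ Put together, a minimal FLUX call consistent with these guidelines might look like the following sketch (model ID, resolution, and step count taken from the task specification above; illustrative only):
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained(
+     "LPX55/FLUX.1-merged_lightning_v2", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_attention_slicing()  # memory optimization per the FLUX guidelines
+ image = pipe(
+     prompt="A cat holding a sign that says hello world",
+     width=768,
+     height=1152,
+     num_inference_steps=8,  # FLUX works best with 4-8 steps
+     # guidance_scale intentionally omitted (not needed for FLUX)
+ ).images[0]
+ ```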
+
+ ### Stable Diffusion XL
+ - Enable attention slicing for high resolutions
+ - Use the refiner model sparingly to save memory
+ - Consider VAE tiling for >1024px images
+
+ ### Stable Diffusion 1.5/2.1
+ - Very memory-efficient base models
+ - Can often run without optimizations on 8GB+ VRAM
+ - Enable VAE slicing for batch processing
+
+ ## Memory Usage Estimation
+ - FLUX.1: ~24GB for full precision, ~12GB for FP16
+ - SDXL: ~7GB for FP16, ~14GB for FP32
+ - SD 1.5: ~2GB for FP16, ~4GB for FP32
+
+ ## Optimization Combinations by VRAM
+
+ ### 24GB+ VRAM (High-end)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ pipe = pipe.to("cuda")
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ ```
+
+ ### 12-24GB VRAM (Mid-range)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+
+ ### 8-12GB VRAM (Entry-level)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe.enable_sequential_cpu_offload()
+ pipe.enable_attention_slicing()
+ pipe.enable_vae_slicing()
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+
+ ### <8GB VRAM (Low-end)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe.enable_sequential_cpu_offload()
+ pipe.enable_attention_slicing("max")
+ pipe.enable_vae_slicing()
+ pipe.enable_vae_tiling()
+ ```
+
+
+ IMPORTANT: For FLUX.1-schnell models, do NOT include the guidance_scale parameter, as it is not needed.
+
+ Using the OPTIMIZATION KNOWLEDGE BASE above, generate Python code that:
+
+ 1. **Selects the best optimization techniques** for the specific hardware profile
+ 2. **Applies appropriate memory optimizations** based on available VRAM
+ 3. **Uses optimal data types** for the target hardware:
+    - User-specified dtype (if provided): use exactly as specified
+    - Apple Silicon (MPS): prefer torch.bfloat16
+    - NVIDIA GPUs: prefer torch.float16 or torch.bfloat16
+    - CPU only: use torch.float32
+ 4. **Implements hardware-specific optimizations** (CUDA, MPS, CPU)
+ 5. **Follows model-specific guidelines** (e.g., FLUX guidance_scale handling)
+
+ IMPORTANT GUIDELINES:
+ - Reference the OPTIMIZATION KNOWLEDGE BASE to select appropriate techniques
+ - Include all necessary imports
+ - Add brief comments explaining optimization choices
+ - Generate compact, production-ready code
+ - Inline values where possible for concise code
+ - Generate ONLY the Python code, with no explanations before or after the code block
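Taken together, the kind of answer this prompt is meant to elicit for the CPU-only profile above might look like the sketch below (illustrative only, assembled from the knowledge base; not actual model output, and the `output.png` filename is an arbitrary choice):

```python
import torch
from diffusers import DiffusionPipeline

# CPU-only profile: float32 dtype, attention slicing for the 768x1152 resolution
pipe = DiffusionPipeline.from_pretrained(
    "LPX55/FLUX.1-merged_lightning_v2", torch_dtype=torch.float32
)
pipe = pipe.to("cpu")
pipe.enable_attention_slicing()

image = pipe(
    prompt="A cat holding a sign that says hello world",
    width=768,
    height=1152,
    num_inference_steps=8,
    # guidance_scale omitted per the FLUX guideline
).images[0]
image.save("output.png")
```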
pipeline_tabs/app_diffusion.py ADDED
@@ -0,0 +1,60 @@
+ import gradio as gr
+ import torch
+ from diffusers import DiffusionPipeline
+ import gc
+
+ # Shared state for model cache
+ model_cache = {}
+
+ def load_flux_model():
+     model_id = "LPX55/FLUX.1-merged_lightning_v2"
+     pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
+     pipe = pipe.to("cpu")
+     pipe.enable_attention_slicing()
+     return pipe
+
+ def unload_flux_model():
+     if "flux" in model_cache:
+         del model_cache["flux"]
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ def run_flux(prompt, width, height, steps):
+     if "flux" not in model_cache:
+         return None, "Model not loaded!"
+     pipe = model_cache["flux"]
+     image = pipe(
+         prompt=prompt,
+         width=width,
+         height=height,
+         num_inference_steps=steps,
+     ).images[0]
+     return image, "Success!"
+
+ with gr.Blocks() as demo:
+     with gr.Tab("FLUX Diffusion"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="A cat holding a sign that says hello world")
+         width = gr.Slider(256, 1536, value=768, step=64, label="Width")
+         height = gr.Slider(256, 1536, value=1152, step=64, label="Height")
+         steps = gr.Slider(1, 50, value=8, step=1, label="Inference Steps")
+         run_btn = gr.Button("Generate Image")
+         output_img = gr.Image(label="Output Image")
+         output_msg = gr.Textbox(label="Status", interactive=False)
+
+         def do_load():
+             model_cache["flux"] = load_flux_model()
+             return "Model loaded!"
+
+         def do_unload():
+             unload_flux_model()
+             return "Model unloaded!"
+
+         load_btn.click(do_load, None, status)
+         unload_btn.click(do_unload, None, status)
+         run_btn.click(run_flux, [prompt, width, height, steps], [output_img, output_msg])
+
+ demo.launch()
pipeline_tabs/app_task.py ADDED
@@ -0,0 +1,95 @@
+ import gradio as gr
+ import torch
+ from transformers import pipeline
+ import gc
+ import json
+
+ # Define available models/tasks
+ MODEL_CONFIGS = [
+     {
+         "name": "Text Generation (GPT-2)",
+         "task": "text-generation",
+         "model": "gpt2",
+         "input_type": "text",
+         "output_type": "text"
+     },
+     {
+         "name": "Image Classification (ViT)",
+         "task": "image-classification",
+         "model": "google/vit-base-patch16-224",
+         "input_type": "image",
+         "output_type": "label"
+     },
+     # Add more models/tasks as needed
+ ]
+
+ # Model cache for lazy loading
+ model_cache = {}
+
+ def load_model(task, model_name):
+     # Use device_map="auto" or device=0 for GPU if available
+     return pipeline(task, model=model_name, device=-1)
+
+ def unload_model(model_key):
+     if model_key in model_cache:
+         del model_cache[model_key]
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ with gr.Blocks() as demo:
+     gr.Markdown("# Multi-Model, Multi-Task Gradio Demo\n_Switch between models and tasks in one Space!_")
+     # Shared state for the demo (created inside Blocks so it is attached to this app)
+     shared_state = gr.State({"active_model": None, "last_result": None})
+     tab_names = [m["name"] for m in MODEL_CONFIGS]
+     with gr.Tabs() as tabs:
+         tab_blocks = []
+         for i, config in enumerate(MODEL_CONFIGS):
+             with gr.Tab(config["name"]):
+                 status = gr.Markdown(f"**Model:** {config['model']}<br>**Task:** {config['task']}")
+                 load_btn = gr.Button("Load Model")
+                 unload_btn = gr.Button("Unload Model")
+                 if config["input_type"] == "text":
+                     input_comp = gr.Textbox(label="Input Text")
+                 elif config["input_type"] == "image":
+                     input_comp = gr.Image(label="Input Image", type="pil")  # transformers pipelines accept PIL images
+                 else:
+                     input_comp = gr.Textbox(label="Input")
+                 run_btn = gr.Button("Run Model")
+                 output_comp = gr.Textbox(label="Output", lines=4)
+                 model_key = f"{config['task']}|{config['model']}"
+
+                 # Bind model_key/config as default args so each tab's callbacks keep their own values
+                 def do_load(state, model_key=model_key, config=config):
+                     if model_key not in model_cache:
+                         model_cache[model_key] = load_model(config["task"], config["model"])
+                     state = dict(state)
+                     state["active_model"] = model_key
+                     return f"Loaded: {model_key}", state
+
+                 def do_unload(state, model_key=model_key):
+                     unload_model(model_key)
+                     state = dict(state)
+                     state["active_model"] = None
+                     return f"Unloaded: {model_key}", state
+
+                 def do_run(inp, state, model_key=model_key):
+                     if model_key not in model_cache:
+                         return "Model not loaded!", state
+                     pipe = model_cache[model_key]
+                     result = pipe(inp)
+                     state = dict(state)
+                     state["last_result"] = result
+                     return str(result), state
+
+                 load_btn.click(do_load, shared_state, [status, shared_state])
+                 unload_btn.click(do_unload, shared_state, [status, shared_state])
+                 run_btn.click(do_run, [input_comp, shared_state], [output_comp, shared_state])
+
+     # Shared state display
+     def pretty_json(state):
+         return json.dumps(state, indent=2, ensure_ascii=False)
+     shared_state_box = gr.Textbox(label="Shared State", lines=8, interactive=False)
+     shared_state.change(pretty_json, shared_state, shared_state_box)
+
+ demo.launch()
pipeline_tabs/diffusion_tab.py ADDED
@@ -0,0 +1,49 @@
+ import gradio as gr
+ import torch
+ from diffusers import DiffusionPipeline
+ import gc
+
+ def diffusion_tab(model_cache, unload_all_models):
+     def load_diffusion_model():
+         unload_all_models()
+         model_id = "LPX55/FLUX.1-merged_lightning_v2"
+         pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
+         pipe = pipe.to("cpu")
+         pipe.enable_attention_slicing()
+         model_cache["diffusion"] = pipe
+         return "Diffusion model loaded!"
+
+     def unload_diffusion_model():
+         if "diffusion" in model_cache:
+             del model_cache["diffusion"]
+         gc.collect()
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         return "Diffusion model unloaded!"
+
+     def run_diffusion(prompt, width, height, steps):
+         if "diffusion" not in model_cache:
+             return None, "Diffusion model not loaded!"
+         pipe = model_cache["diffusion"]
+         image = pipe(
+             prompt=prompt,
+             width=width,
+             height=height,
+             num_inference_steps=steps,
+         ).images[0]
+         return image, "Success!"
+
+     with gr.Tab("Diffusion"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Diffusion Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="A cat holding a sign that says hello world")
+         width = gr.Slider(256, 1536, value=768, step=64, label="Width")
+         height = gr.Slider(256, 1536, value=1152, step=64, label="Height")
+         steps = gr.Slider(1, 50, value=8, step=1, label="Inference Steps")
+         run_btn = gr.Button("Generate Image")
+         output_img = gr.Image(label="Output Image")
+         output_msg = gr.Textbox(label="Status", interactive=False)
+         load_btn.click(load_diffusion_model, None, status)
+         unload_btn.click(unload_diffusion_model, None, status)
+         run_btn.click(run_diffusion, [prompt, width, height, steps], [output_img, output_msg])
pipeline_tabs/text_tab.py ADDED
@@ -0,0 +1,29 @@
+ import gradio as gr
+ from transformers import pipeline
+
+ def text_tab(model_cache, unload_all_models):
+     def load_text_model():
+         unload_all_models()
+         model_cache["text"] = pipeline("text-generation", model="gpt2", device=-1)
+         return "Text model loaded!"
+
+     def unload_text_model():
+         if "text" in model_cache:
+             del model_cache["text"]
+         return "Text model unloaded!"
+
+     def run_text(prompt):
+         if "text" not in model_cache:
+             return "Text model not loaded!"
+         return model_cache["text"](prompt)[0]["generated_text"]
+
+     with gr.Tab("Text Generation"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Text Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="Hello world")
+         run_btn = gr.Button("Generate")
+         output = gr.Textbox(label="Output")
+         load_btn.click(load_text_model, None, status)
+         unload_btn.click(unload_text_model, None, status)
+         run_btn.click(run_text, prompt, output)
requirements.txt CHANGED
@@ -1,3 +1,6 @@
  gradio[mcp]
  numpy
- pandas
+ pandas
+ torch
+ transformers
+ diffusers