LPX55 committed on
Commit a5723a0 · 1 Parent(s): 56e596b

Add Gradio interface for multi-model diffusion and text generation tasks, including model loading/unloading functionality and shared state management. Introduce new tabs for text and diffusion models, enhancing user interaction and modularity.
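Each tab lives in its own module under `pipeline_tabs/` and exposes a single function that receives the shared `model_cache` dict and the `unload_all_models` callback, so new modalities can be plugged into the main app without touching its core. A minimal sketch of how a hypothetical extra tab could follow the same pattern (the `audio_tab` name and Whisper model are illustrative only, not part of this commit):

```python
import gradio as gr
from transformers import pipeline

def audio_tab(model_cache, unload_all_models):
    # Hypothetical example tab mirroring text_tab/diffusion_tab from this commit.
    def load_model():
        unload_all_models()  # keep only one model resident at a time
        model_cache["audio"] = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
        return "Audio model loaded!"

    def run(audio_path):
        if "audio" not in model_cache:
            return "Audio model not loaded!"
        return model_cache["audio"](audio_path)["text"]

    with gr.Tab("Speech to Text"):
        status = gr.Markdown("Model not loaded.")
        load_btn = gr.Button("Load Audio Model")
        audio_in = gr.Audio(type="filepath", label="Audio")
        run_btn = gr.Button("Transcribe")
        output = gr.Textbox(label="Transcript")
        load_btn.click(load_model, None, status)
        run_btn.click(run, audio_in, output)
```

Registering such a tab in `app_mm.py` would then be a one-line call, e.g. `audio_tab(model_cache, unload_all_models)` inside the `gr.Tabs()` block.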

app_mm.py ADDED
@@ -0,0 +1,31 @@
+ import gradio as gr
+ import torch
+ import gc
+ import json
+ from pipeline_tabs.text_tab import text_tab
+ from pipeline_tabs.diffusion_tab import diffusion_tab
+
+ model_cache = {}
+
+ def unload_all_models():
+     model_cache.clear()
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ with gr.Blocks() as demo:
+     with gr.Tabs():
+         text_tab(model_cache, unload_all_models)
+         diffusion_tab(model_cache, unload_all_models)
+     # Shared state display
+     def pretty_json():
+         return json.dumps(list(model_cache.keys()), indent=2, ensure_ascii=False)
+     state_box = gr.Textbox(label="Loaded Models", lines=4, interactive=False, value=pretty_json())
+     # Refresh the display on page load; use the button below after loading/unloading models
+     demo.load(fn=pretty_json, inputs=None, outputs=state_box)
+
+     # Button to refresh the loaded-model list on demand
+     refresh_btn = gr.Button("Refresh Model State")
+     refresh_btn.click(fn=pretty_json, inputs=None, outputs=state_box)
+
+ demo.launch()
auto-diffuser.md ADDED
@@ -0,0 +1,232 @@
+ You are an expert in optimizing diffusers library code for different hardware configurations.
+
+ NOTE: This system includes curated optimization knowledge from HuggingFace documentation.
+
+ TASK: Generate optimized Python code for running a diffusion model with the following specifications:
+ - Model: LPX55/FLUX.1-merged_lightning_v2
+ - Prompt: "A cat holding a sign that says hello world"
+ - Image size: 768x1152
+ - Inference steps: 8
+
+ HARDWARE SPECIFICATIONS:
+ - Platform: Linux (manual_input)
+ - CPU Cores: 8
+ - CUDA Available: False
+ - MPS Available: False
+ - Optimization Profile: balanced
+ - GPU: Custom GPU (20.0 GB VRAM)
+
+ OPTIMIZATION KNOWLEDGE BASE:
+
+ # DIFFUSERS OPTIMIZATION TECHNIQUES
+
+ ## Memory Optimization Techniques
+
+ ### 1. Model CPU Offloading
+ Use `enable_model_cpu_offload()` to move models between GPU and CPU automatically:
+ ```python
+ pipe.enable_model_cpu_offload()
+ ```
+ - Saves significant VRAM by keeping only active models on GPU
+ - Automatic management, no manual intervention needed
+ - Compatible with all pipelines
+
+ ### 2. Sequential CPU Offloading
+ Use `enable_sequential_cpu_offload()` for more aggressive memory saving:
+ ```python
+ pipe.enable_sequential_cpu_offload()
+ ```
+ - More memory efficient than model offloading
+ - Moves models to CPU after each forward pass
+ - Best for very limited VRAM scenarios
+
+ ### 3. Attention Slicing
+ Use `enable_attention_slicing()` to reduce memory during attention computation:
+ ```python
+ pipe.enable_attention_slicing()
+ # or specify slice size
+ pipe.enable_attention_slicing("max") # maximum slicing
+ pipe.enable_attention_slicing(1) # slice_size = 1
+ ```
+ - Trades compute time for memory
+ - Most effective for high-resolution images
+ - Can be combined with other techniques
+
+ ### 4. VAE Slicing
+ Use `enable_vae_slicing()` for large batch processing:
+ ```python
+ pipe.enable_vae_slicing()
+ ```
+ - Decodes images one at a time instead of all at once
+ - Essential for batch sizes > 4
+ - Minimal performance impact on single images
+
+ ### 5. VAE Tiling
+ Use `enable_vae_tiling()` for high-resolution image generation:
+ ```python
+ pipe.enable_vae_tiling()
+ ```
+ - Enables 4K+ image generation on 8GB VRAM
+ - Splits images into overlapping tiles
+ - Automatically disabled for 512x512 or smaller images
+
+ ### 6. Memory Efficient Attention (xFormers)
+ Use `enable_xformers_memory_efficient_attention()` if xFormers is installed:
+ ```python
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+ - Significantly reduces memory usage and improves speed
+ - Requires xformers library installation
+ - Compatible with most models
+
+ ## Performance Optimization Techniques
+
+ ### 1. Half Precision (FP16/BF16)
+ Use lower precision for better memory and speed:
+ ```python
+ # FP16 (widely supported)
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+
+ # BF16 (better numerical stability, newer hardware)
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ ```
+ - FP16: Halves memory usage, widely supported
+ - BF16: Better numerical stability, requires newer GPUs
+ - Essential for most optimization scenarios
+
+ ### 2. Torch Compile (PyTorch 2.0+)
+ Use `torch.compile()` for significant speed improvements:
+ ```python
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ # For some models, compile VAE too:
+ pipe.vae.decode = torch.compile(pipe.vae.decode, mode="reduce-overhead", fullgraph=True)
+ ```
+ - 5-50% speed improvement
+ - Requires PyTorch 2.0+
+ - First run is slower due to compilation
+
+ ### 3. Fast Schedulers
+ Use faster schedulers for fewer steps:
+ ```python
+ from diffusers import LMSDiscreteScheduler, UniPCMultistepScheduler
+
+ # LMS Scheduler (good quality, fast)
+ pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
+
+ # UniPC Scheduler (fastest)
+ pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
+ ```
+
+ ## Hardware-Specific Optimizations
+
+ ### NVIDIA GPU Optimizations
+ ```python
+ # Enable Tensor Cores
+ torch.backends.cudnn.benchmark = True
+
+ # Optimal data type for NVIDIA
+ torch_dtype = torch.float16 # or torch.bfloat16 for RTX 30/40 series
+ ```
+
+ ### Apple Silicon (MPS) Optimizations
+ ```python
+ # Use MPS device
+ device = "mps" if torch.backends.mps.is_available() else "cpu"
+ pipe = pipe.to(device)
+
+ # Recommended dtype for Apple Silicon
+ torch_dtype = torch.bfloat16 # Better than float16 on Apple Silicon
+
+ # Attention slicing often helps on MPS
+ pipe.enable_attention_slicing()
+ ```
+
+ ### CPU Optimizations
+ ```python
+ # Use float32 for CPU
+ torch_dtype = torch.float32
+
+ # Enable optimized attention
+ pipe.enable_attention_slicing()
+ ```
+
+ ## Model-Specific Guidelines
+
+ ### FLUX Models
+ - Do NOT use guidance_scale parameter (not needed for FLUX)
+ - Use 4-8 inference steps maximum
+ - BF16 dtype recommended
+ - Enable attention slicing for memory optimization
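+
+ Put together, a minimal FLUX call consistent with these guidelines might look like the following sketch (model ID, resolution, and step count taken from the task specification above; illustrative only):
+ ```python
+ import torch
+ from diffusers import DiffusionPipeline
+
+ pipe = DiffusionPipeline.from_pretrained(
+     "LPX55/FLUX.1-merged_lightning_v2", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_attention_slicing()  # memory optimization per the FLUX guidelines
+ image = pipe(
+     prompt="A cat holding a sign that says hello world",
+     width=768,
+     height=1152,
+     num_inference_steps=8,  # FLUX works best with 4-8 steps
+     # guidance_scale intentionally omitted (not needed for FLUX)
+ ).images[0]
+ ```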
+
+ ### Stable Diffusion XL
+ - Enable attention slicing for high resolutions
+ - Use the refiner model sparingly to save memory
+ - Consider VAE tiling for >1024px images
+
+ ### Stable Diffusion 1.5/2.1
+ - Very memory-efficient base models
+ - Can often run without optimizations on 8GB+ VRAM
+ - Enable VAE slicing for batch processing
+
+ ## Memory Usage Estimation
+ - FLUX.1: ~24GB for full precision, ~12GB for FP16
+ - SDXL: ~7GB for FP16, ~14GB for FP32
+ - SD 1.5: ~2GB for FP16, ~4GB for FP32
+
+ ## Optimization Combinations by VRAM
+
+ ### 24GB+ VRAM (High-end)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ pipe = pipe.to("cuda")
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ ```
+
+ ### 12-24GB VRAM (Mid-range)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+
+ ### 8-12GB VRAM (Entry-level)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe.enable_sequential_cpu_offload()
+ pipe.enable_attention_slicing()
+ pipe.enable_vae_slicing()
+ pipe.enable_xformers_memory_efficient_attention()
+ ```
+
+ ### <8GB VRAM (Low-end)
+ ```python
+ pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
+ pipe.enable_sequential_cpu_offload()
+ pipe.enable_attention_slicing("max")
+ pipe.enable_vae_slicing()
+ pipe.enable_vae_tiling()
+ ```
+
+
+ IMPORTANT: For FLUX.1-schnell models, do NOT include the guidance_scale parameter, as it is not needed.
+
+ Using the OPTIMIZATION KNOWLEDGE BASE above, generate Python code that:
+
+ 1. **Selects the best optimization techniques** for the specific hardware profile
+ 2. **Applies appropriate memory optimizations** based on available VRAM
+ 3. **Uses optimal data types** for the target hardware:
+    - User-specified dtype (if provided): use exactly as specified
+    - Apple Silicon (MPS): prefer torch.bfloat16
+    - NVIDIA GPUs: prefer torch.float16 or torch.bfloat16
+    - CPU only: use torch.float32
+ 4. **Implements hardware-specific optimizations** (CUDA, MPS, CPU)
+ 5. **Follows model-specific guidelines** (e.g., FLUX guidance_scale handling)
+
+ IMPORTANT GUIDELINES:
+ - Reference the OPTIMIZATION KNOWLEDGE BASE to select appropriate techniques
+ - Include all necessary imports
+ - Add brief comments explaining optimization choices
+ - Generate compact, production-ready code
+ - Inline values where possible for concise code
+ - Generate ONLY the Python code, with no explanations before or after the code block
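Taken together, the kind of answer this prompt is meant to elicit for the CPU-only profile above might look like the sketch below (illustrative only, assembled from the knowledge base; not actual model output, and the `output.png` filename is an arbitrary choice):

```python
import torch
from diffusers import DiffusionPipeline

# CPU-only profile: float32 dtype, attention slicing for the 768x1152 resolution
pipe = DiffusionPipeline.from_pretrained(
    "LPX55/FLUX.1-merged_lightning_v2", torch_dtype=torch.float32
)
pipe = pipe.to("cpu")
pipe.enable_attention_slicing()

image = pipe(
    prompt="A cat holding a sign that says hello world",
    width=768,
    height=1152,
    num_inference_steps=8,
    # guidance_scale omitted per the FLUX guideline
).images[0]
image.save("output.png")
```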
pipeline_tabs/app_diffusion.py ADDED
@@ -0,0 +1,60 @@
+ import gradio as gr
+ import torch
+ from diffusers import DiffusionPipeline
+ import gc
+
+ # Shared state for model cache
+ model_cache = {}
+
+ def load_flux_model():
+     model_id = "LPX55/FLUX.1-merged_lightning_v2"
+     pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
+     pipe = pipe.to("cpu")
+     pipe.enable_attention_slicing()
+     return pipe
+
+ def unload_flux_model():
+     if "flux" in model_cache:
+         del model_cache["flux"]
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ def run_flux(prompt, width, height, steps):
+     if "flux" not in model_cache:
+         return None, "Model not loaded!"
+     pipe = model_cache["flux"]
+     image = pipe(
+         prompt=prompt,
+         width=width,
+         height=height,
+         num_inference_steps=steps,
+     ).images[0]
+     return image, "Success!"
+
+ with gr.Blocks() as demo:
+     with gr.Tab("FLUX Diffusion"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="A cat holding a sign that says hello world")
+         width = gr.Slider(256, 1536, value=768, step=64, label="Width")
+         height = gr.Slider(256, 1536, value=1152, step=64, label="Height")
+         steps = gr.Slider(1, 50, value=8, step=1, label="Inference Steps")
+         run_btn = gr.Button("Generate Image")
+         output_img = gr.Image(label="Output Image")
+         output_msg = gr.Textbox(label="Status", interactive=False)
+
+         def do_load():
+             model_cache["flux"] = load_flux_model()
+             return "Model loaded!"
+
+         def do_unload():
+             unload_flux_model()
+             return "Model unloaded!"
+
+         load_btn.click(do_load, None, status)
+         unload_btn.click(do_unload, None, status)
+         run_btn.click(run_flux, [prompt, width, height, steps], [output_img, output_msg])
+
+ demo.launch()
pipeline_tabs/app_task.py ADDED
@@ -0,0 +1,95 @@
+ import gradio as gr
+ import torch
+ from transformers import pipeline
+ import gc
+ import json
+
+ # Define available models/tasks
+ MODEL_CONFIGS = [
+     {
+         "name": "Text Generation (GPT-2)",
+         "task": "text-generation",
+         "model": "gpt2",
+         "input_type": "text",
+         "output_type": "text"
+     },
+     {
+         "name": "Image Classification (ViT)",
+         "task": "image-classification",
+         "model": "google/vit-base-patch16-224",
+         "input_type": "image",
+         "output_type": "label"
+     },
+     # Add more models/tasks as needed
+ ]
+
+ # Model cache for lazy loading
+ model_cache = {}
+
+ def load_model(task, model_name):
+     # Use device_map="auto" or device=0 for GPU if available
+     return pipeline(task, model=model_name, device=-1)
+
+ def unload_model(model_key):
+     if model_key in model_cache:
+         del model_cache[model_key]
+     gc.collect()
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+
+ with gr.Blocks() as demo:
+     gr.Markdown("# Multi-Model, Multi-Task Gradio Demo\n_Switch between models and tasks in one Space!_")
+     # Shared state for the demo (created inside Blocks so it is attached to this app)
+     shared_state = gr.State({"active_model": None, "last_result": None})
+     tab_names = [m["name"] for m in MODEL_CONFIGS]
+     with gr.Tabs() as tabs:
+         tab_blocks = []
+         for i, config in enumerate(MODEL_CONFIGS):
+             with gr.Tab(config["name"]):
+                 status = gr.Markdown(f"**Model:** {config['model']}<br>**Task:** {config['task']}")
+                 load_btn = gr.Button("Load Model")
+                 unload_btn = gr.Button("Unload Model")
+                 if config["input_type"] == "text":
+                     input_comp = gr.Textbox(label="Input Text")
+                 elif config["input_type"] == "image":
+                     input_comp = gr.Image(label="Input Image", type="pil")  # transformers pipelines accept PIL images
+                 else:
+                     input_comp = gr.Textbox(label="Input")
+                 run_btn = gr.Button("Run Model")
+                 output_comp = gr.Textbox(label="Output", lines=4)
+                 model_key = f"{config['task']}|{config['model']}"
+
+                 # Bind model_key/config as default args so each tab's callbacks keep their own values
+                 def do_load(state, model_key=model_key, config=config):
+                     if model_key not in model_cache:
+                         model_cache[model_key] = load_model(config["task"], config["model"])
+                     state = dict(state)
+                     state["active_model"] = model_key
+                     return f"Loaded: {model_key}", state
+
+                 def do_unload(state, model_key=model_key):
+                     unload_model(model_key)
+                     state = dict(state)
+                     state["active_model"] = None
+                     return f"Unloaded: {model_key}", state
+
+                 def do_run(inp, state, model_key=model_key):
+                     if model_key not in model_cache:
+                         return "Model not loaded!", state
+                     pipe = model_cache[model_key]
+                     result = pipe(inp)
+                     state = dict(state)
+                     state["last_result"] = result
+                     return str(result), state
+
+                 load_btn.click(do_load, shared_state, [status, shared_state])
+                 unload_btn.click(do_unload, shared_state, [status, shared_state])
+                 run_btn.click(do_run, [input_comp, shared_state], [output_comp, shared_state])
+
+     # Shared state display
+     def pretty_json(state):
+         return json.dumps(state, indent=2, ensure_ascii=False)
+     shared_state_box = gr.Textbox(label="Shared State", lines=8, interactive=False)
+     shared_state.change(pretty_json, shared_state, shared_state_box)
+
+ demo.launch()
pipeline_tabs/diffusion_tab.py ADDED
@@ -0,0 +1,49 @@
+ import gradio as gr
+ import torch
+ from diffusers import DiffusionPipeline
+ import gc
+
+ def diffusion_tab(model_cache, unload_all_models):
+     def load_diffusion_model():
+         unload_all_models()
+         model_id = "LPX55/FLUX.1-merged_lightning_v2"
+         pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
+         pipe = pipe.to("cpu")
+         pipe.enable_attention_slicing()
+         model_cache["diffusion"] = pipe
+         return "Diffusion model loaded!"
+
+     def unload_diffusion_model():
+         if "diffusion" in model_cache:
+             del model_cache["diffusion"]
+         gc.collect()
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         return "Diffusion model unloaded!"
+
+     def run_diffusion(prompt, width, height, steps):
+         if "diffusion" not in model_cache:
+             return None, "Diffusion model not loaded!"
+         pipe = model_cache["diffusion"]
+         image = pipe(
+             prompt=prompt,
+             width=width,
+             height=height,
+             num_inference_steps=steps,
+         ).images[0]
+         return image, "Success!"
+
+     with gr.Tab("Diffusion"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Diffusion Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="A cat holding a sign that says hello world")
+         width = gr.Slider(256, 1536, value=768, step=64, label="Width")
+         height = gr.Slider(256, 1536, value=1152, step=64, label="Height")
+         steps = gr.Slider(1, 50, value=8, step=1, label="Inference Steps")
+         run_btn = gr.Button("Generate Image")
+         output_img = gr.Image(label="Output Image")
+         output_msg = gr.Textbox(label="Status", interactive=False)
+         load_btn.click(load_diffusion_model, None, status)
+         unload_btn.click(unload_diffusion_model, None, status)
+         run_btn.click(run_diffusion, [prompt, width, height, steps], [output_img, output_msg])
pipeline_tabs/text_tab.py ADDED
@@ -0,0 +1,29 @@
+ import gradio as gr
+ from transformers import pipeline
+
+ def text_tab(model_cache, unload_all_models):
+     def load_text_model():
+         unload_all_models()
+         model_cache["text"] = pipeline("text-generation", model="gpt2", device=-1)
+         return "Text model loaded!"
+
+     def unload_text_model():
+         if "text" in model_cache:
+             del model_cache["text"]
+         return "Text model unloaded!"
+
+     def run_text(prompt):
+         if "text" not in model_cache:
+             return "Text model not loaded!"
+         return model_cache["text"](prompt)[0]["generated_text"]
+
+     with gr.Tab("Text Generation"):
+         status = gr.Markdown("Model not loaded.")
+         load_btn = gr.Button("Load Text Model")
+         unload_btn = gr.Button("Unload Model")
+         prompt = gr.Textbox(label="Prompt", value="Hello world")
+         run_btn = gr.Button("Generate")
+         output = gr.Textbox(label="Output")
+         load_btn.click(load_text_model, None, status)
+         unload_btn.click(unload_text_model, None, status)
+         run_btn.click(run_text, prompt, output)
requirements.txt CHANGED
@@ -1,3 +1,6 @@
  gradio[mcp]
  numpy
- pandas
+ pandas
+ torch
+ transformers
+ diffusers