---
title: Dynamic Tab Loading Examples
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: true
license: apache-2.0
short_description: Exploring different loading methods for a HF Space
---

# Dynamic Space Loading


## 1. Sending Data To/From IFrames

### A. Standard Web (HTML/JS) Context

- **IFrames are sandboxed:** by default, an iframe is isolated from the parent page for security reasons.
- **postMessage API:**
  - The standard way to communicate between a parent page and an iframe (and vice versa) is the `window.postMessage` API.
  - This requires both the parent and the iframe to have JavaScript code that sends and listens for messages.
  - Example:
    - Parent: `iframeEl.contentWindow.postMessage({data: "hello"}, "https://iframe-domain.com")`
    - IFrame: `window.parent.postMessage({data: "hi back"}, "https://parent-domain.com")`
- **Limitations in Gradio:**
  - Gradio does not expose a built-in way to inject custom `postMessage` JS into the iframe or the parent.
  - If you control both the parent and the iframe (i.e., both are your own apps), you can add custom JS to both and use `postMessage`.
  - If the iframe is a third-party app (like a Hugging Face Space you don't control), you cannot inject JS into it, so you cannot send/receive data programmatically.

### B. Gradio Context

- There is no built-in Gradio API for iframe communication.
- You can use `gr.HTML` to inject a script into the parent page, but you cannot inject anything into the iframe if you don't control its code (see the parent-side sketch below).
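
As a minimal sketch of the parent side only, assuming you control the parent app and the embedded Space is written to actually send `postMessage` events (the iframe URL and message payload here are placeholders), a `message` listener can be added through the `head` parameter of `gr.Blocks`:

```python
import gradio as gr

# Sketch of a parent app that embeds another Space and listens for messages.
# The iframe URL and the message payload are placeholders; the embedded Space
# must contain its own JS that calls window.parent.postMessage(...).
LISTENER_JS = """
<script>
window.addEventListener("message", (event) => {
    // In production, verify event.origin against the Space you embed.
    console.log("Message from embedded Space:", event.data);
});
</script>
"""

IFRAME_HTML = '<iframe src="https://your-username-your-space.hf.space" width="100%" height="600"></iframe>'

with gr.Blocks(head=LISTENER_JS) as demo:  # head= injects raw HTML into the page <head>
    gr.HTML(IFRAME_HTML)

demo.launch()
```

The listener only logs incoming data; getting that data back into Python would require an extra round trip (for example, a hidden component updated via JS), which is outside the scope of this sketch.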

## 2. Sending Data Between Tabs in Gradio

- **Tabs in Gradio are just layout elements:** all components in all tabs live in the same Python process and can share state.
- You can use `gr.State` or any shared variable:
  - For example, a `gr.State` object can be updated in one tab and read in another (see the sketch below).
  - You can also use hidden components or callbacks to pass data between tabs.
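
As a minimal sketch of sharing a value between tabs with `gr.State` (component names and labels here are illustrative):

```python
import gradio as gr

with gr.Blocks() as demo:
    shared = gr.State("")  # server-side value, accessible from every tab

    with gr.Tab("Write"):
        text_in = gr.Textbox(label="Enter some text")
        save_btn = gr.Button("Save to shared state")
        # Store the textbox value in the shared gr.State
        save_btn.click(lambda text: text, inputs=text_in, outputs=shared)

    with gr.Tab("Read"):
        show_btn = gr.Button("Show shared value")
        text_out = gr.Textbox(label="Value saved in the other tab")
        # Read the shared gr.State back out in a different tab
        show_btn.click(lambda value: value, inputs=shared, outputs=text_out)

demo.launch()
```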

## 3. Summary Table

| Method            | Parent ↔ IFrame           | Tab ↔ Tab (Gradio) |
|-------------------|---------------------------|--------------------|
| postMessage (JS)  | Yes (if you control both) | N/A                |
| gr.State          | No                        | Yes                |
| Hidden components | No                        | Yes                |
| Gradio API        | No                        | Yes                |

## 4. Practical Recommendations

- **For arbitrary Hugging Face Spaces in iframes:** you cannot send/receive data programmatically unless the Space itself is designed to listen for `postMessage`.
- **For your own Spaces:** you can add JS to both the parent and the iframe and use `postMessage`.
- **For Gradio tabs:** use `gr.State` or shared components for seamless data transfer.

The remaining sections cover what is possible, what is not, and what is practical for loading and unloading GPU models with Gradio, Hugging Face Spaces, and Python environments.


## 5. GPU Spaces (transformers/diffusers): Loading and Unloading

### A. In a Single Python Process (One Space, One App)

- **You can load multiple models/pipelines in one Gradio app.**
  - You can use a dropdown or tabs to select which model/task/pipeline to use.
  - You can load/unload models on demand (though loading large models is slow).
  - You can keep all models in memory (if you have enough GPU RAM), or load/unload as needed.
- **You cannot have truly separate environments** (e.g., different Python dependencies, CUDA versions, or isolated memory) in a single Space.
  - All code runs in the same Python process/environment.
  - All models share the same GPU/CPU memory pool.

Example:

```python
from transformers import pipeline
import gradio as gr

# Preload (or lazy-load) multiple pipelines
pipe1 = pipeline("text-generation", model="gpt2")
pipe2 = pipeline("image-classification", model="google/vit-base-patch16-224")

def run_model(user_input, model_choice):
    if model_choice == "Text Generation":
        return pipe1(user_input)
    elif model_choice == "Image Classification":
        # The image-classification pipeline accepts an image URL or file path
        return pipe2(user_input)
    # ... more models

gr.Interface(
    fn=run_model,
    inputs=[gr.Textbox(), gr.Dropdown(["Text Generation", "Image Classification"])],
    outputs="json",  # both pipelines return lists of dicts
).launch()
```

- You can use tabs or dropdowns to switch between models/tasks.

### B. Multiple Gradio Apps in One Space

- You can define multiple Gradio interfaces in one script and show/hide them with tabs or dropdowns (see the `gr.TabbedInterface` sketch below).
- **But:** they still share the same Python process and memory.
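
A minimal sketch of combining several interfaces defined in one script with `gr.TabbedInterface` (the two demo functions are placeholders):

```python
import gradio as gr

# Two independent interfaces defined in the same script (placeholder logic)
echo_demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")
reverse_demo = gr.Interface(fn=lambda text: text[::-1], inputs="text", outputs="text")

# Each interface gets its own tab, but everything still runs in the same
# Python process and shares the same memory pool.
gr.TabbedInterface(
    [echo_demo, reverse_demo],
    tab_names=["Echo", "Reverse"],
).launch()
```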

### C. True Isolation (Multiple Environments)

- **Not possible in a single Hugging Face Space.**
  - You cannot have multiple Python environments, different dependency sets, or isolated GPU memory pools in one Space.
  - Each Space is a single container/process.

### D. What About Docker or Subprocesses?

- Hosted Hugging Face Spaces do not support running multiple containers or true subprocess isolation with different environments.
- On your own infrastructure, you could use Docker or subprocesses, but this is not supported on Spaces.

## 6. Best Practices for Multi-Model/Multi-Task Apps

- **Lazy-load models:** only load a model when its tab is selected, and unload it when switching if memory is a concern (a sketch follows this list).
- **Use a single environment:** install all dependencies needed for all models in your `requirements.txt`.
- **Warn users about memory:** if users switch between large models, GPU memory may fill up and require manual cleanup (e.g., `torch.cuda.empty_cache()`).
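
A minimal sketch of lazy-loading on tab selection using the `Tab.select` event; the tasks and model IDs are placeholders, and any previously cached model is dropped before the new one is loaded:

```python
import gc
import gradio as gr
import torch
from transformers import pipeline

models = {}  # in-memory cache: task name -> loaded pipeline

def activate(task, model_id):
    """Load the selected task's model and drop any other cached models."""
    for other in [k for k in models if k != task]:
        del models[other]            # drop the reference
    gc.collect()                     # let Python reclaim the objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # release cached GPU blocks
    if task not in models:
        models[task] = pipeline(task, model=model_id)
    return f"Loaded: {task}"

with gr.Blocks() as demo:
    status = gr.Markdown("No model loaded yet.")
    with gr.Tab("Text Generation") as tab_a:
        gr.Markdown("Text generation UI goes here.")
    with gr.Tab("Image Classification") as tab_b:
        gr.Markdown("Image classification UI goes here.")

    # Load a tab's model only when that tab is actually opened
    tab_a.select(lambda: activate("text-generation", "gpt2"), None, status)
    tab_b.select(lambda: activate("image-classification", "google/vit-base-patch16-224"), None, status)

demo.launch()
```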

## 7. Summary Table

| Approach                         | Isolation | Multiple Models | Multiple Envs | GPU Sharing | Supported on Spaces |
|----------------------------------|-----------|-----------------|---------------|-------------|---------------------|
| Single Gradio app, many models   | No        | Yes             | No            | Yes         | Yes                 |
| Multiple Gradio apps in one file | No        | Yes             | No            | Yes         | Yes                 |
| Multiple Spaces (one per app)    | Yes       | Yes             | Yes           | Isolated    | Yes                 |
| Docker/subprocess isolation      | Yes       | Yes             | Yes           | Isolated    | No (on Spaces)      |

## 8. What's Practical?

- **For most use cases:**
  - Use a single app with tabs/dropdowns to select the model/task.
  - Lazy-load and unload models as needed to manage memory.
- **For true isolation:**
  - Use multiple Spaces (one per app/model) or host your own infrastructure with Docker.

## 9. Properly Unloading Models, Weights, and Freeing Memory in PyTorch/Diffusers

When working with large models (especially on GPU), it's important to:

- Delete references to the model and pipeline
- Call `gc.collect()` to trigger Python's garbage collector
- Call `torch.cuda.empty_cache()` (if using CUDA) to free GPU memory

### Best Practice Pattern

Here’s a robust pattern for loading and unloading models in a multi-model Gradio app:

```python
import torch
import gc
from diffusers import DiffusionPipeline

model_cache = {}

def load_diffusion_model(model_id, dtype=torch.float32, device="cpu"):
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)
    pipe.enable_attention_slicing()
    return pipe

def unload_model(model_key):
    # Remove from cache so nothing holds a reference to the pipeline
    if model_key in model_cache:
        del model_cache[model_key]
    # Run Python garbage collection
    gc.collect()
    # Free GPU memory if using CUDA
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

### How to Use in a Gradio Tab

```python
import gradio as gr

model_id = "LPX55/FLUX.1-merged_lightning_v2"
model_key = "flux"
device = "cpu"  # or "cuda" if available and desired

def do_load():
    if model_key not in model_cache:
        model_cache[model_key] = load_diffusion_model(model_id, torch.float32, device)
    return "Model loaded!"

def do_unload():
    unload_model(model_key)
    return "Model unloaded!"

def run_inference(prompt, width, height, steps):
    if model_key not in model_cache:
        return None, "Model not loaded!"
    pipe = model_cache[model_key]
    image = pipe(
        prompt=prompt,
        width=width,
        height=height,
        num_inference_steps=steps,
    ).images[0]
    return image, "Success!"

with gr.Blocks() as demo:
    status = gr.Markdown("Model not loaded.")
    load_btn = gr.Button("Load Model")
    unload_btn = gr.Button("Unload Model")
    prompt = gr.Textbox(label="Prompt", value="A cat holding a sign that says hello world")
    width = gr.Slider(256, 1536, value=768, step=64, label="Width")
    height = gr.Slider(256, 1536, value=1152, step=64, label="Height")
    steps = gr.Slider(1, 50, value=8, step=1, label="Inference Steps")
    run_btn = gr.Button("Generate Image")
    output_img = gr.Image(label="Output Image")
    output_msg = gr.Textbox(label="Status", interactive=False)

    load_btn.click(do_load, None, status)
    unload_btn.click(do_unload, None, status)
    run_btn.click(run_inference, [prompt, width, height, steps], [output_img, output_msg])

demo.launch()
```

### Key Points

- Always delete the model from your cache/dictionary.
- Call `gc.collect()` after deleting the model.
- Call `torch.cuda.empty_cache()` if using CUDA.
- Do this every time you switch models or want to free memory (see the sketch below).
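
As a minimal sketch of applying this on every switch, building on the `model_cache`, `load_diffusion_model`, and `unload_model` helpers defined above (the second model ID is just an illustrative placeholder):

```python
def switch_model(new_key, new_model_id, device="cpu"):
    """Unload whatever is cached, then load the newly requested model."""
    # Free every cached pipeline before loading a new one
    for key in list(model_cache):
        unload_model(key)
    # Load and cache the requested pipeline
    model_cache[new_key] = load_diffusion_model(new_model_id, torch.float32, device)
    return f"Switched to {new_key}"

# Example usage (placeholder model IDs):
# switch_model("flux", "LPX55/FLUX.1-merged_lightning_v2")
# switch_model("sdxl", "stabilityai/stable-diffusion-xl-base-1.0")
```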

### Advanced: Unloading All Models

If you want to ensure all models are unloaded (e.g., when switching tabs):

```python
def unload_all_models():
    model_cache.clear()
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

### Summary Table

| Step                       | CPU | GPU (CUDA) |
|----------------------------|-----|------------|
| Delete model object        | Yes | Yes        |
| `gc.collect()`             | Yes | Yes        |
| `torch.cuda.empty_cache()` | No  | Yes        |