Gui28F committed 9edb625 (verified) · Parent: e28b576

Update README.md

Files changed (1): README.md (+105 -102)
---
language:
- en
base_model:
- stabilityai/stable-diffusion-2-1
- black-forest-labs/FLUX.1-dev
tags:
- Beam-Search
- stable-diffusion
- Diffusers
- Latent-Space
- DiT
- Flux
pipeline_tag: text-to-image
---
# BeamDiffusion: Latent Beam Diffusion Models for Decoding Image Sequences

**BeamDiffusion** introduces a novel approach for generating coherent image sequences from text prompts by employing beam search in latent space. Unlike traditional methods that generate each image independently, BeamDiffusion iteratively explores latent representations, ensuring smooth transitions and visual continuity across frames. A cross-attention mechanism efficiently scores and prunes search paths, optimizing both textual alignment and visual coherence.

BeamDiffusion addresses the challenge of maintaining visual consistency in image sequences generated from text prompts: by leveraging a beam search strategy in the latent space, it refines the generation process to produce sequences with enhanced coherence and alignment with their textual descriptions, as outlined in the [paper](https://arxiv.org/abs/2503.20429).

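The latent beam search described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the repository's actual implementation: `denoise_step` and `score_fn` stand in for the model's denoiser and cross-attention scorer, which this README does not expose directly.

```python
# Illustrative sketch of beam search over latent denoising paths.
# `denoise_step` and `score_fn` are hypothetical stand-ins for the model's
# denoiser and cross-attention scorer; they are not part of this repo's API.

def beam_search_latents(prompts, init_latents, denoise_step, score_fn, beam_width=2):
    """Return the highest-scoring sequence of latents, one per prompt."""
    # Each beam is a (latent_path, cumulative_score) pair; start one beam per seed latent.
    beams = [([z], 0.0) for z in init_latents]
    for prompt in prompts:
        candidates = []
        for path, score in beams:
            # Expand each beam with every candidate continuation of its last latent.
            for z_next in denoise_step(path[-1], prompt):
                candidates.append((path + [z_next], score + score_fn(path, z_next, prompt)))
        # Prune: keep only the `beam_width` best-scoring paths.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beams, key=lambda c: c[1])[0]
```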
---
## 🛠️ Setup Instructions

Before using BeamDiffusion, follow these steps to set up your environment:

```bash
# 1. Create a virtual environment (recommended)
python3 -m venv beam_env

# 2. Activate the virtual environment
source beam_env/bin/activate  # On macOS/Linux
# beam_env\Scripts\activate   # On Windows

# 3. Fetch the pinned requirements file from the model repo
pip install huggingface_hub
huggingface-cli download Gui28F/BeamDiffusion2 --include requirements.txt --local-dir .

# 4. Install the required dependencies
pip install -r ./requirements.txt
```
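As a quick sanity check after installation, you can confirm that the core dependencies import cleanly. Note that `torch` and `diffusers` are an assumption here (implied by the Diffusers tag and the stable-diffusion-2-1 base model), not a documented guarantee of requirements.txt:

```python
# Sanity check. torch and diffusers are assumed (not verified) to be pinned
# in requirements.txt, given the Diffusers tag and the SD-2.1 base model.
import torch
import diffusers

print(f"torch {torch.__version__} (CUDA available: {torch.cuda.is_available()})")
print(f"diffusers {diffusers.__version__}")
```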
---
## 🚀 Quickstart Guide

Here's a basic example of how to use BeamDiffusion with the `huggingface_hub` library to generate an image sequence from a series of text prompts:

```python
from huggingface_hub import snapshot_download

# Download the model snapshot (the module must exist locally before it can be imported)
snapshot_download(repo_id="Gui28F/BeamDiffusion", local_dir="BeamDiffusionModel")
from BeamDiffusionModel.beam_diffusion import BeamDiffusionPipeline, BeamDiffusionConfig, BeamDiffusionModel

# Initialize the configuration, model, and pipeline
config = BeamDiffusionConfig(sd="SD-2.1", latents_idx=[0, 1, 2, 3], n_seeds=4, steps_back=2, beam_width=2, window_size=2, use_rand=True)
model = BeamDiffusionModel(config)
pipe = BeamDiffusionPipeline(model)

# Define the input prompts, one per step of the sequence
input_data = {
    "steps": ["A lively outdoor celebration with guests gathered around, everyone excited to support the event.",
              "A chef in a cooking uniform raises one hand dramatically, signaling it's time to serve the food.",
              "Guests chat and laugh in a vibrant setting, with people gathered around tables, enjoying the event."],
}

# Generate the sequence of images
sequence_imgs = pipe(input_data)
```

**Result:**

![Generated Image Sequence](./example.png)

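The pipeline's return type is not documented here. Assuming it yields one PIL image per step, as is conventional for Diffusers-style pipelines, the sequence could be persisted like this:

```python
# Assumes `sequence_imgs` is a list of PIL images, one per step; this is an
# unverified guess based on typical Diffusers-style pipeline outputs.
for i, img in enumerate(sequence_imgs):
    img.save(f"step_{i}.png")
```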
## 🔍 Input Parameters Explained

- **`sd`** (`str`): The base model to use for image generation. The available options are `SD-2.1` and `flux`.

- **`steps`** (`list of strings`): Descriptions for each step in the image generation process. The model generates one image per step, forming a sequence that aligns with these descriptions.

- **`latents_idx`** (`list of integers`): Indices referring to specific positions in the latent space to be used during image generation. This allows the model to leverage different latent representations for diverse outputs.

- **`n_seeds`** (`int`): Number of random seeds to use for the generation process. Each seed provides a different starting point for the randomness in the first step, influencing the diversity of generated sequences.

- **`seeds`** (`list of integers`): Specific seeds to use for the generation process. If provided, these seeds override the `n_seeds` parameter, allowing for controlled randomness (see the sketch after this list).

- **`steps_back`** (`int`): Number of previous steps to consider during the beam search process. This parameter helps refine the current generation by incorporating information from earlier steps.

- **`beam_width`** (`int`): Number of candidate sequences to maintain during inference. Beam search evaluates multiple potential outputs and keeps the most probable ones based on the defined criteria.

- **`window_size`** (`int`): Size of the "window" for beam search pruning. It determines after how many steps pruning starts, helping the model focus on more probable options as the generation progresses.

- **`use_rand`** (`bool`): Flag to introduce randomness in the inference process. If set to `True`, the model generates more varied and creative results; if `False`, it produces more deterministic outputs.

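For reproducible runs, the parameters above can be combined as in the sketch below, which assumes `BeamDiffusionConfig` accepts exactly the keyword arguments documented in this list:

```python
# Hypothetical reproducible setup: explicit `seeds` override `n_seeds`,
# and use_rand=False makes decoding deterministic.
config = BeamDiffusionConfig(
    sd="SD-2.1",
    latents_idx=[0, 1, 2, 3],
    seeds=[42, 123, 7, 2025],  # fixed seeds instead of n_seeds
    steps_back=2,
    beam_width=2,
    window_size=2,
    use_rand=False,            # deterministic decoding
)
```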
## 📚 Citation

If you use BeamDiffusion in your research or projects, please cite the following paper:

```bibtex
@misc{fernandes2025latentbeamdiffusionmodels,
      title={Latent Beam Diffusion Models for Generating Visual Sequences},
      author={Guilherme Fernandes and Vasco Ramos and Regev Cohen and Idan Szpektor and João Magalhães},
      year={2025},
      eprint={2503.20429},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.20429},
}
```