Gui28F committed 9edb625 (verified) · Parent: e28b576

Update README.md

Files changed (1): README.md (+105 -102)
---
language:
- en
base_model:
- stabilityai/stable-diffusion-2-1
- black-forest-labs/FLUX.1-dev
tags:
- Beam-Search
- stable-diffusion
- Diffusers
- Latent-Space
- DiT
- Flux
pipeline_tag: text-to-image
---
# BeamDiffusion: Latent Beam Diffusion Models for Decoding Image Sequences

**BeamDiffusion** introduces a novel approach for generating coherent image sequences from text prompts by employing beam search in latent space. Unlike traditional methods that generate each image independently, BeamDiffusion iteratively explores latent representations, ensuring smooth transitions and visual continuity across frames. A cross-attention mechanism efficiently scores and prunes search paths, optimizing both textual alignment and visual coherence.

BeamDiffusion addresses the challenge of maintaining visual consistency in image sequences generated from text prompts: by leveraging a beam search strategy in the latent space, it refines the generation process to produce sequences with enhanced coherence and alignment with their textual descriptions, as outlined in the [paper](https://arxiv.org/abs/2503.20429).

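The latent beam search described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the repository's actual implementation: `denoise_step` and `score_fn` stand in for the model's denoiser and cross-attention scorer, which this README does not expose directly.

```python
# Illustrative sketch of beam search over latent denoising paths.
# `denoise_step` and `score_fn` are hypothetical stand-ins for the model's
# denoiser and cross-attention scorer; they are not part of this repo's API.

def beam_search_latents(prompts, init_latents, denoise_step, score_fn, beam_width=2):
    """Return the highest-scoring sequence of latents, one per prompt."""
    # Each beam is a (latent_path, cumulative_score) pair; start one beam per seed latent.
    beams = [([z], 0.0) for z in init_latents]
    for prompt in prompts:
        candidates = []
        for path, score in beams:
            # Expand each beam with every candidate continuation of its last latent.
            for z_next in denoise_step(path[-1], prompt):
                candidates.append((path + [z_next], score + score_fn(path, z_next, prompt)))
        # Prune: keep only the `beam_width` best-scoring paths.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return max(beams, key=lambda c: c[1])[0]
```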
---
## 🛠️ Setup Instructions

Before using BeamDiffusion, follow these steps to set up your environment:

```bash
# 1. Create a virtual environment (recommended)
python3 -m venv beam_env

# 2. Activate the virtual environment
source beam_env/bin/activate  # On macOS/Linux
# beam_env\Scripts\activate   # On Windows

# 3. Fetch the pinned requirements file from the model repo
pip install huggingface_hub
huggingface-cli download Gui28F/BeamDiffusion2 --include requirements.txt --local-dir .

# 4. Install the required dependencies
pip install -r ./requirements.txt
```
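As a quick sanity check after installation, you can confirm that the core dependencies import cleanly. Note that `torch` and `diffusers` are an assumption here (implied by the Diffusers tag and the stable-diffusion-2-1 base model), not a documented guarantee of requirements.txt:

```python
# Sanity check. torch and diffusers are assumed (not verified) to be pinned
# in requirements.txt, given the Diffusers tag and the SD-2.1 base model.
import torch
import diffusers

print(f"torch {torch.__version__} (CUDA available: {torch.cuda.is_available()})")
print(f"diffusers {diffusers.__version__}")
```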
---
## 🚀 Quickstart Guide

Here's a basic example of how to use BeamDiffusion with the `huggingface_hub` library to generate an image sequence from a series of text prompts:

```python
from huggingface_hub import snapshot_download

# Download the model snapshot (the module must exist locally before it can be imported)
snapshot_download(repo_id="Gui28F/BeamDiffusion", local_dir="BeamDiffusionModel")
from BeamDiffusionModel.beam_diffusion import BeamDiffusionPipeline, BeamDiffusionConfig, BeamDiffusionModel

# Initialize the configuration, model, and pipeline
config = BeamDiffusionConfig(sd="SD-2.1", latents_idx=[0, 1, 2, 3], n_seeds=4, steps_back=2, beam_width=2, window_size=2, use_rand=True)
model = BeamDiffusionModel(config)
pipe = BeamDiffusionPipeline(model)

# Define the input prompts, one per step of the sequence
input_data = {
    "steps": ["A lively outdoor celebration with guests gathered around, everyone excited to support the event.",
              "A chef in a cooking uniform raises one hand dramatically, signaling it's time to serve the food.",
              "Guests chat and laugh in a vibrant setting, with people gathered around tables, enjoying the event."],
}

# Generate the sequence of images
sequence_imgs = pipe(input_data)
```

**Result:**

![Generated Image Sequence](./example.png)

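The pipeline's return type is not documented here. Assuming it yields one PIL image per step, as is conventional for Diffusers-style pipelines, the sequence could be persisted like this:

```python
# Assumes `sequence_imgs` is a list of PIL images, one per step; this is an
# unverified guess based on typical Diffusers-style pipeline outputs.
for i, img in enumerate(sequence_imgs):
    img.save(f"step_{i}.png")
```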
## 🔍 Input Parameters Explained

- **`sd`** (`str`): The base model to use for image generation. The available options are `SD-2.1` and `flux`.

- **`steps`** (`list of strings`): Descriptions for each step in the image generation process. The model generates one image per step, forming a sequence that aligns with these descriptions.

- **`latents_idx`** (`list of integers`): Indices referring to specific positions in the latent space to be used during image generation. This allows the model to leverage different latent representations for diverse outputs.

- **`n_seeds`** (`int`): Number of random seeds to use for the generation process. Each seed provides a different starting point for the randomness in the first step, influencing the diversity of generated sequences.

- **`seeds`** (`list of integers`): Specific seeds to use for the generation process. If provided, these seeds override the `n_seeds` parameter, allowing for controlled randomness (see the sketch after this list).

- **`steps_back`** (`int`): Number of previous steps to consider during the beam search process. This parameter helps refine the current generation by incorporating information from earlier steps.

- **`beam_width`** (`int`): Number of candidate sequences to maintain during inference. Beam search evaluates multiple potential outputs and keeps the most probable ones based on the defined criteria.

- **`window_size`** (`int`): Size of the "window" for beam search pruning. It determines after how many steps pruning starts, helping the model focus on more probable options as the generation progresses.

- **`use_rand`** (`bool`): Flag to introduce randomness in the inference process. If set to `True`, the model generates more varied and creative results; if `False`, it produces more deterministic outputs.

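For reproducible runs, the parameters above can be combined as in the sketch below, which assumes `BeamDiffusionConfig` accepts exactly the keyword arguments documented in this list:

```python
# Hypothetical reproducible setup: explicit `seeds` override `n_seeds`,
# and use_rand=False makes decoding deterministic.
config = BeamDiffusionConfig(
    sd="SD-2.1",
    latents_idx=[0, 1, 2, 3],
    seeds=[42, 123, 7, 2025],  # fixed seeds instead of n_seeds
    steps_back=2,
    beam_width=2,
    window_size=2,
    use_rand=False,            # deterministic decoding
)
```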
## 📚 Citation

If you use BeamDiffusion in your research or projects, please cite the following paper:

```bibtex
@misc{fernandes2025latentbeamdiffusionmodels,
      title={Latent Beam Diffusion Models for Generating Visual Sequences},
      author={Guilherme Fernandes and Vasco Ramos and Regev Cohen and Idan Szpektor and João Magalhães},
      year={2025},
      eprint={2503.20429},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.20429},
}
```