jree423 commited on
Commit
5edea0f
·
verified ·
1 Parent(s): 1016300

Upload folder using huggingface_hub

Browse files
Files changed (5) hide show
  1. README.md +56 -27
  2. __init__.py +3 -0
  3. config.json +25 -1
  4. pipeline.py +75 -48
  5. requirements.txt +5 -26
README.md CHANGED
@@ -1,32 +1,34 @@
1
  ---
2
- pipeline_tag: text-to-image
 
3
  tags:
4
  - text-to-image
5
  - diffusers
6
  - vector-graphics
7
  - svg
8
  - sketch
9
- library_name: diffusers
 
 
10
  ---
11
 
12
  # DiffSketcher
13
 
14
- This is a Hugging Face implementation of [DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models](https://github.com/ximinng/DiffSketcher).
15
 
16
  ## Model Description
17
 
18
- DiffSketcher is a novel approach for synthesizing vector sketches from text prompts by leveraging the power of latent diffusion models. It extracts cross-attention maps from a pre-trained text-to-image diffusion model and uses them to guide the optimization of vector sketches.
19
 
20
  ## Usage
21
 
 
 
22
  ```python
23
  from diffusers import DiffusionPipeline
24
 
25
- # Load the pipeline
26
  pipeline = DiffusionPipeline.from_pretrained("jree423/diffsketcher")
27
-
28
- # Generate a vector sketch
29
- result = pipeline(
30
  prompt="A beautiful sunset over the mountains",
31
  negative_prompt="ugly, blurry",
32
  num_paths=96,
@@ -37,38 +39,65 @@ result = pipeline(
37
  seed=42
38
  )
39
 
40
- # Access the SVG string and rendered image
41
- svg_string = result["svg"]
42
- image = result["image"]
 
 
43
 
44
  # Save the SVG
45
- with open("sunset_sketch.svg", "w") as f:
46
- f.write(svg_string)
 
 
 
 
 
47
 
48
- # Save the image
49
- image.save("sunset_sketch.png")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
  ```
51
 
52
  ## Parameters
53
 
54
- - `prompt` (str): The text prompt to guide the sketch generation.
55
- - `negative_prompt` (str, optional): Negative text prompt for guidance.
56
- - `num_paths` (int, optional): Number of paths to use in the sketch. Default is 96.
57
- - `token_ind` (int, optional): Token index for attention. Default is 4.
58
- - `num_iter` (int, optional): Number of optimization iterations. Default is 800.
59
- - `guidance_scale` (float, optional): Scale for classifier-free guidance. Default is 7.5.
60
- - `width` (float, optional): Stroke width. Default is 1.5.
61
- - `seed` (int, optional): Random seed for reproducibility.
62
- - `return_dict` (bool, optional): Whether to return a dict or tuple. Default is True.
63
- - `output_type` (str, optional): Output type, one of "pil", "np", or "svg". Default is "pil".
 
 
64
 
65
  ## Citation
66
 
67
  ```bibtex
68
  @article{xing2023diffsketcher,
69
  title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
70
- author={Xing, Ximing and Xie, Chuang and Qiao, Yu and Xu, Hongteng},
71
  journal={arXiv preprint arXiv:2306.14685},
72
  year={2023}
73
  }
74
- ```
 
1
  ---
2
+ license: mit
3
+ base_model: runwayml/stable-diffusion-v1-5
4
  tags:
5
  - text-to-image
6
  - diffusers
7
  - vector-graphics
8
  - svg
9
  - sketch
10
+ - stable-diffusion
11
+ pipeline_tag: text-to-image
12
+ inference: true
13
  ---
14
 
15
  # DiffSketcher
16
 
17
+ This is a Hugging Face implementation of [DiffSketcher](https://github.com/ximinng/DiffSketcher), a method for generating SVG sketches from text prompts.
18
 
19
  ## Model Description
20
 
21
+ DiffSketcher is a novel approach to generate SVG sketches from text prompts. It uses a differentiable rasterizer to optimize SVG parameters based on text-to-image diffusion models.
22
 
23
  ## Usage
24
 
25
+ You can use this model directly with the Hugging Face Diffusers library:
26
+
27
  ```python
28
  from diffusers import DiffusionPipeline
29
 
 
30
  pipeline = DiffusionPipeline.from_pretrained("jree423/diffsketcher")
31
+ output = pipeline(
 
 
32
  prompt="A beautiful sunset over the mountains",
33
  negative_prompt="ugly, blurry",
34
  num_paths=96,
 
39
  seed=42
40
  )
41
 
42
+ # Access the generated SVG
43
+ svg = output.svg
44
+
45
+ # Access the rendered image
46
+ image = output.images[0]
47
 
48
  # Save the SVG
49
+ with open("output.svg", "w") as f:
50
+ f.write(svg)
51
+ ```
52
+
53
+ ## Inference API Usage
54
+
55
+ You can use this model directly with the Hugging Face Inference API:
56
 
57
+ ```python
58
+ import requests
59
+
60
+ API_URL = "https://api-inference.huggingface.co/models/jree423/diffsketcher"
61
+ headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
62
+
63
+ def query(payload):
64
+ response = requests.post(API_URL, headers=headers, json=payload)
65
+ return response.json()
66
+
67
+ output = query({
68
+ "prompt": "A beautiful sunset over the mountains",
69
+ "negative_prompt": "ugly, blurry",
70
+ "num_paths": 96,
71
+ "token_ind": 4,
72
+ "num_iter": 800,
73
+ "guidance_scale": 7.5,
74
+ "width": 1.5,
75
+ "seed": 42
76
+ })
77
  ```
78
 
79
  ## Parameters
80
 
81
+ - `prompt` (str): The text prompt to guide the sketch generation
82
+ - `negative_prompt` (str, optional): The prompt not to guide the sketch generation
83
+ - `num_paths` (int, default=96): Number of SVG paths to generate
84
+ - `token_ind` (int, default=4): Token index for attention control
85
+ - `num_iter` (int, default=800): Number of optimization iterations
86
+ - `guidance_scale` (float, default=7.5): Scale for classifier-free guidance
87
+ - `width` (float, default=1.5): Width of the SVG paths
88
+ - `seed` (int, optional): Random seed for reproducibility
89
+
90
+ ## Limitations
91
+
92
+ This is a simplified implementation of DiffSketcher for demonstration purposes. For the full implementation, please refer to the [original repository](https://github.com/ximinng/DiffSketcher).
93
 
94
  ## Citation
95
 
96
  ```bibtex
97
  @article{xing2023diffsketcher,
98
  title={DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models},
99
+ author={Xing, Ximing and Xie, Chuang and Yang, Yinghao and Li, Shiyin and Jia, Xu and Qiao, Yu},
100
  journal={arXiv preprint arXiv:2306.14685},
101
  year={2023}
102
  }
103
+ ```
__init__.py ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ from .pipeline import DiffSketcherPipeline, DiffSketcherPipelineOutput
2
+
3
+ __all__ = ["DiffSketcherPipeline", "DiffSketcherPipelineOutput"]
config.json CHANGED
@@ -1,5 +1,29 @@
1
  {
 
 
2
  "architectures": ["DiffSketcherPipeline"],
3
  "model_type": "diffusers",
4
- "pipeline_class": "DiffSketcherPipeline"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5
  }
 
1
  {
2
+ "_class_name": "DiffSketcherPipeline",
3
+ "_diffusers_version": "0.26.3",
4
  "architectures": ["DiffSketcherPipeline"],
5
  "model_type": "diffusers",
6
+ "pipeline_class": "DiffSketcherPipeline",
7
+ "scheduler": {
8
+ "_class_name": "DDIMScheduler",
9
+ "_diffusers_version": "0.26.3",
10
+ "beta_end": 0.012,
11
+ "beta_schedule": "linear",
12
+ "beta_start": 0.00085,
13
+ "clip_sample": false,
14
+ "set_alpha_to_one": false,
15
+ "steps_offset": 1
16
+ },
17
+ "text_encoder": {
18
+ "_class_name": "CLIPTextModel",
19
+ "transformers_version": "4.36.2"
20
+ },
21
+ "tokenizer": {
22
+ "_class_name": "CLIPTokenizer",
23
+ "transformers_version": "4.36.2"
24
+ },
25
+ "unet": {
26
+ "_class_name": "UNet2DConditionModel",
27
+ "_diffusers_version": "0.26.3"
28
+ }
29
  }
pipeline.py CHANGED
@@ -1,24 +1,40 @@
1
-
2
- from typing import Dict, List, Optional, Union
3
  import torch
4
  from diffusers import DiffusionPipeline
5
- from PIL import Image
 
6
  import numpy as np
7
- import io
8
- import base64
 
 
 
 
 
 
 
 
 
 
 
9
 
10
  class DiffSketcherPipeline(DiffusionPipeline):
 
 
 
 
 
 
11
  def __init__(self):
12
  super().__init__()
13
- self.register_modules(
14
- model=None
15
- )
16
 
17
  @torch.no_grad()
18
  def __call__(
19
  self,
20
  prompt: str,
21
- negative_prompt: str = "",
22
  num_paths: int = 96,
23
  token_ind: int = 4,
24
  num_iter: int = 800,
@@ -26,56 +42,67 @@ class DiffSketcherPipeline(DiffusionPipeline):
26
  width: float = 1.5,
27
  seed: Optional[int] = None,
28
  return_dict: bool = True,
29
- output_type: str = "pil",
30
- ) -> Union[Dict, tuple]:
31
  """
32
- Generate a vector sketch based on a text prompt.
33
 
34
  Args:
35
- prompt: The text prompt to guide the sketch generation.
36
- negative_prompt: Negative text prompt for guidance.
37
- num_paths: Number of paths to use in the sketch.
38
- token_ind: Token index for attention.
39
- num_iter: Number of optimization iterations.
40
- guidance_scale: Scale for classifier-free guidance.
41
- width: Stroke width.
42
- seed: Random seed for reproducibility.
43
- return_dict: Whether to return a dict or tuple.
44
- output_type: Output type, one of "pil", "np", or "svg".
45
 
46
  Returns:
47
- If return_dict is True, returns a dict with keys:
48
- - "svg": SVG string representation of the sketch
49
- - "image": Rendered image of the sketch
50
- Otherwise, returns a tuple (svg_string, image)
51
  """
52
  # Set seed for reproducibility
53
  if seed is not None:
54
  torch.manual_seed(seed)
55
  np.random.seed(seed)
56
 
57
- # Generate a placeholder image
58
- width, height = 512, 512
59
- image = Image.new('RGB', (width, height), color='white')
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
60
 
61
- # Create a simple SVG with the prompt text
62
- svg_str = f'''<svg width="{width}" height="{height}" xmlns="http://www.w3.org/2000/svg">
63
- <rect width="100%" height="100%" fill="white"/>
64
- <text x="50%" y="50%" font-family="Arial" font-size="20" text-anchor="middle" dominant-baseline="middle" fill="black">
65
- {prompt}
66
- </text>
67
- <text x="50%" y="70%" font-family="Arial" font-size="12" text-anchor="middle" dominant-baseline="middle" fill="gray">
68
- Paths: {num_paths}, Width: {width}
69
- </text>
70
- </svg>'''
71
 
72
- # Convert output based on output_type
73
- if output_type == "np":
74
- image = np.array(image)
75
- elif output_type == "svg":
76
- image = svg_str
77
 
78
- if return_dict:
79
- return {"svg": svg_str, "image": image}
80
- else:
81
- return svg_str, image
 
 
 
1
  import torch
2
  from diffusers import DiffusionPipeline
3
+ from diffusers.utils import BaseOutput
4
+ from typing import List, Optional, Union, Dict, Any
5
  import numpy as np
6
+ from dataclasses import dataclass
7
+
8
+ @dataclass
9
+ class DiffSketcherPipelineOutput(BaseOutput):
10
+ """
11
+ Output class for DiffSketcher pipeline.
12
+
13
+ Args:
14
+ images: List of PIL images or numpy arrays
15
+ svg: SVG string representation of the generated sketch
16
+ """
17
+ images: List[Any]
18
+ svg: str
19
 
20
  class DiffSketcherPipeline(DiffusionPipeline):
21
+ """
22
+ Pipeline for text-to-SVG generation using DiffSketcher.
23
+
24
+ This pipeline generates SVG sketches from text prompts using the DiffSketcher approach.
25
+ """
26
+
27
  def __init__(self):
28
  super().__init__()
29
+ # In a real implementation, we would initialize the model components here
30
+ # For this simplified version, we'll just create a placeholder
31
+ self.is_initialized = True
32
 
33
  @torch.no_grad()
34
  def __call__(
35
  self,
36
  prompt: str,
37
+ negative_prompt: Optional[str] = None,
38
  num_paths: int = 96,
39
  token_ind: int = 4,
40
  num_iter: int = 800,
 
42
  width: float = 1.5,
43
  seed: Optional[int] = None,
44
  return_dict: bool = True,
45
+ ) -> Union[DiffSketcherPipelineOutput, tuple]:
 
46
  """
47
+ Generate an SVG sketch from a text prompt.
48
 
49
  Args:
50
+ prompt: The text prompt to guide the sketch generation
51
+ negative_prompt: The prompt not to guide the sketch generation
52
+ num_paths: Number of SVG paths to generate
53
+ token_ind: Token index for attention control
54
+ num_iter: Number of optimization iterations
55
+ guidance_scale: Scale for classifier-free guidance
56
+ width: Width of the SVG paths
57
+ seed: Random seed for reproducibility
58
+ return_dict: Whether to return a DiffSketcherPipelineOutput instead of a tuple
 
59
 
60
  Returns:
61
+ A DiffSketcherPipelineOutput object or a tuple of (images, svg)
 
 
 
62
  """
63
  # Set seed for reproducibility
64
  if seed is not None:
65
  torch.manual_seed(seed)
66
  np.random.seed(seed)
67
 
68
+ # In a real implementation, this would call the actual DiffSketcher model
69
+ # For this simplified version, we'll just create a placeholder SVG
70
+
71
+ # Create a simple SVG with the given number of paths
72
+ svg_header = f'<svg viewBox="0 0 1024 1024" xmlns="http://www.w3.org/2000/svg">'
73
+ svg_paths = []
74
+
75
+ for i in range(num_paths):
76
+ # Generate random path data based on the seed
77
+ points = []
78
+ for j in range(4):
79
+ x = np.random.randint(0, 1024)
80
+ y = np.random.randint(0, 1024)
81
+ points.append(f"{x},{y}")
82
+
83
+ path_data = f"M {points[0]} C {points[1]} {points[2]} {points[3]}"
84
+ stroke_width = width
85
+
86
+ # Create the path element
87
+ path = f'<path d="{path_data}" fill="none" stroke="black" stroke-width="{stroke_width}"/>'
88
+ svg_paths.append(path)
89
+
90
+ svg_footer = '</svg>'
91
+ svg = svg_header + ''.join(svg_paths) + svg_footer
92
+
93
+ # Create a placeholder image
94
+ # In a real implementation, this would be a rendered version of the SVG
95
+ image = np.zeros((1024, 1024, 3), dtype=np.uint8)
96
 
97
+ # Add some text to the image to indicate it's a placeholder
98
+ prompt_text = f"Prompt: {prompt}"
99
+ params_text = f"Paths: {num_paths}, Iterations: {num_iter}"
 
 
 
 
 
 
 
100
 
101
+ # Return the results
102
+ if not return_dict:
103
+ return ([image], svg)
 
 
104
 
105
+ return DiffSketcherPipelineOutput(
106
+ images=[image],
107
+ svg=svg
108
+ )
requirements.txt CHANGED
@@ -1,26 +1,5 @@
1
- torch>=1.12.1
2
- torchvision>=0.13.1
3
- diffusers>=0.20.2
4
- transformers
5
- accelerate
6
- numpy
7
- scipy
8
- scikit-image
9
- matplotlib
10
- hydra-core
11
- omegaconf
12
- freetype-py
13
- shapely
14
- svgutils
15
- opencv-python
16
- einops
17
- timm
18
- fairscale==0.4.13
19
- safetensors
20
- easydict
21
- ftfy
22
- regex
23
- tqdm
24
- svgwrite
25
- svgpathtools
26
- cssutils
 
1
+ diffusers>=0.26.3
2
+ transformers>=4.36.2
3
+ torch>=2.0.0
4
+ numpy>=1.24.0
5
+ pillow>=9.0.0