SPRIGHT-T2I/spright-t2i-sd2

@@ -4,10 +4,26 @@ library_name: diffusers
 # SPRIGHT-T2I Model Card
-The SPRIGHT-T2I model is a text-to-image diffusion model with high spatial coherency. It was first introduced in [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://), authored by Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo,
-Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, and Yezhou Yang.
-SPRIGHT-T2I model was finetuned from stable diffusion v2.1 on a subset of the [SPRIGHT dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright), which contains images and spatially focused captions. Leveraging SPRIGHT, along with efficient training techniques, we achieve state-of-the art performance in generating spatially accurate images from text.
 The training code and more details are available in the [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I).
@@ -29,7 +45,7 @@ Use SPRIGHT-T2I with 🧨 [`diffusers`](https://huggingface.co/SPRIGHT-T2I/sprig
 Use the code below to run SPRIGHT-T2I with the [🤗 Diffusers library](https://github.com/huggingface/diffusers).
 ```bash
-pip install diffusers transformers accelerate scipy safetensors
 ```
 Running the pipeline:
@@ -51,10 +67,15 @@ image = pipe(prompt).images[0]
 image.save("kitten_sitting_in_a_dish.png")
 ```
-<img src="kitten_sitting_in_a_dish.png" width="300" alt="img">
 Additional examples that emphasize spatial coherence:
-<img src="result_images/visor.png" width="1000" alt="img">
 ## Bias and Limitations
@@ -103,16 +124,14 @@ Our key findings are:
  - Improve on all aspects of the VISOR score while improving the ZS-FID and CMMD scores on COCO-30K images by 23.74% and 51.69%, respectively
  - Enhance the ability to generate 1 and 2 objects, along with generating the correct number of objects, as indicated by evaluation on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
-### Model Sources
 - **Repository:** [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I)
 - **Paper:** [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://)
 - **Demo:** [SPRIGHT-T2I on Spaces](https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I)
 ## Citation
 Coming soon

 # SPRIGHT-T2I Model Card
+The SPRIGHT-T2I model is a text-to-image diffusion model with high spatial coherency. It was first introduced in [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://),
+authored by Agneet Chatterjee<sup>\*</sup>, Gabriela Ben Melech Stan<sup>*</sup>, Estelle Aflalo, Sayak Paul, Dhruba Ghosh,
+Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, and Yezhou Yang.
+_(<sup>*</sup> denotes equal contributions)_
+The SPRIGHT-T2I model was fine-tuned from [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) on a subset
+of the [SPRIGHT dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright), which contains images and spatially focused
+captions. Leveraging SPRIGHT, along with efficient training techniques, we achieve state-of-the-art
+performance in generating spatially accurate images from text.
+## Table of contents
+* [Model details](#model-details)
+* [Usage](#usage)
+* [Bias and Limitations](#bias-and-limitations)
+* [Training](#training)
+* [Evaluation](#evaluation)
+* [Model Resources](#model-resources)
+* [Citation](#citation)
 The training code and more details are available in the [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I).
 Use the code below to run SPRIGHT-T2I with the [🤗 Diffusers library](https://github.com/huggingface/diffusers).
 ```bash
+pip install diffusers transformers accelerate -U
 ```
 Running the pipeline:
 image.save("kitten_sitting_in_a_dish.png")
 ```
+<div align="center">
+  <img src="kitten_sitting_in_a_dish.png" width="300" alt="img">
+</div><br>
 Additional examples that emphasize spatial coherence:
+<div align="center">
+  <img src="result_images/visor.png" width="1000" alt="img">
+</div><br>
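The pipeline code itself is elided in this diff view, so the following is only a minimal sketch of how the usage step might look, not the card's actual snippet. It assumes the model id `SPRIGHT-T2I/spright-t2i-sd2` (taken from the repository path), a CUDA device, and half-precision weights; `prompt_to_filename` and `generate` are hypothetical helper names introduced here for illustration.

```python
import re


def prompt_to_filename(prompt: str, suffix: str = ".png") -> str:
    """Turn a prompt into a filesystem-friendly file name (hypothetical helper)."""
    # Lowercase, collapse every run of non-alphanumeric characters into "_",
    # and drop leading/trailing underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return f"{slug}{suffix}"


def generate(
    prompt: str,
    model_id: str = "SPRIGHT-T2I/spright-t2i-sd2",  # assumed from the repo path
    device: str = "cuda",  # assumption: a CUDA GPU is available
) -> str:
    """Generate one image for `prompt`, save it, and return the file name."""
    # Imported lazily so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # Load the pipeline in half precision and move it to the target device.
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)

    # Same call shape as in the card: the first image of the pipeline output.
    image = pipe(prompt).images[0]
    path = prompt_to_filename(prompt)
    image.save(path)
    return path
```

Calling `generate("a kitten sitting in a dish")` with an assumed prompt like this one would download the weights on first use and write the image under a name derived from the prompt, which keeps the saved file and any `<img src=...>` reference easy to keep in sync.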
 ## Bias and Limitations
  - Improve on all aspects of the VISOR score while improving the ZS-FID and CMMD scores on COCO-30K images by 23.74% and 51.69%, respectively
  - Enhance the ability to generate 1 and 2 objects, along with generating the correct number of objects, as indicated by evaluation on the [GenEval](https://github.com/djghosh13/geneval) benchmark.
+### Model Resources
+- **Dataset:** [SPRIGHT Dataset](https://huggingface.co/datasets/SPRIGHT-T2I/spright)
 - **Repository:** [SPRIGHT-T2I GitHub Repository](https://github.com/orgs/SPRIGHT-T2I)
 - **Paper:** [Getting it Right: Improving Spatial Consistency in Text-to-Image Models](https://)
 - **Demo:** [SPRIGHT-T2I on Spaces](https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I)
+- **Project Website:** [SPRIGHT Website](https://spright.github.io/)
 ## Citation
 Coming soon