AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent Paper • 2404.03648 • Published Apr 4, 2024 • 29 • 7
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 87 • 9
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant Paper • 2410.15316 • Published Oct 20, 2024 • 12 • 5
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Paper • 2208.12242 • Published Aug 25, 2022 • 12 • 12
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 37 • 6
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding Paper • 2401.04575 • Published Jan 9, 2024 • 17 • 4
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 33 • 3
Guiding a Diffusion Model with a Bad Version of Itself Paper • 2406.02507 • Published Jun 4, 2024 • 17 • 1
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots Paper • 2406.02523 • Published Jun 4, 2024 • 12 • 1