Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory Paper • 2305.17144 • Published May 25, 2023
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision Paper • 2211.10439 • Published Nov 18, 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer Paper • 2207.09840 • Published Jul 20, 2022
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning Paper • 2406.07543 • Published Jun 11, 2024
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published May 29
ZeroGUI Collection ZeroGUI: Automating Online GUI Learning at Zero Human Cost • 3 items • Updated May 30
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models Paper • 2412.09613 • Published Dec 12, 2024