Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published Jun 24 • 44
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19 • 86
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 58
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23 • 56
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23 • 85
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI Paper • 2505.19443 • Published May 26 • 15
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published May 25 • 146
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 94
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Paper • 2504.18589 • Published Apr 24 • 13
Towards a Unified Copernicus Foundation Model for Earth Vision Paper • 2503.11849 • Published Mar 14 • 4
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published Apr 11 • 40
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation Paper • 2501.17433 • Published Jan 29 • 9
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation Paper • 2502.01068 • Published Feb 3 • 17
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models Paper • 2502.00698 • Published Feb 2 • 24
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published Jan 14 • 17