Daily Papers

by AK and the research community

Mar 24

Submitted by

Elizaveta

When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

·
3 authors

2

Submitted by

VentureZJ

MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

·
9 authors

2

Submitted by

dwzhu

A Comprehensive Survey on Long Context Language Modeling

·
37 authors

Submitted by

VentureZJ

MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization

·
6 authors

Submitted by

IranQin

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

·
8 authors

2

Submitted by

akhaliq

Modifying Large Language Model Post-Training for Diverse Creative Writing

·
5 authors

Submitted by

Epiphqny

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

·
7 authors

Submitted by

akhaliq

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

·
7 authors

3

Submitted by

ydeng9

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

·
6 authors

Submitted by

akhaliq

Enabling Versatile Controls for Video Diffusion Models

·
8 authors

Submitted by

JacobYuan

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

·
8 authors

Submitted by

yairshp

Single Image Iterative Subject-driven Generation and Editing

·
3 authors

Submitted by

akhaliq

FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models

·
7 authors

Submitted by

Guan123

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

·
8 authors

2

Submitted by

shuoxing

Can Large Vision Language Models Read Maps Like a Human?

·
9 authors

2

Submitted by

hitsmy

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

·
4 authors

2

Submitted by

leemessi

Implicit Bias-Like Patterns in Reasoning Models

·
2 authors

Submitted by

Master-Shi

PVChat: Personalized Video Chat with One-Shot Learning

·
9 authors

2

Submitted by

ChengmingX

When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO

·
8 authors

Submitted by

aritradutta

GAEA: A Geolocation Aware Conversational Model

·
6 authors

2

Submitted by

kwanY

FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields

·
4 authors

Submitted by

ZhaochongAn

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

·
7 authors