Article TimeScope: How Long Can Your Video Large Multimodal Model Go? By orrzohar and 3 others • 8 days ago • 30
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis Paper • 2503.23145 • Published Mar 29 • 36
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 22
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published Jan 23 • 23
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 56
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published Jan 6 • 7
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration Paper • 2412.13180 • Published Dec 17, 2024 • 13
Action Sensitivity Learning for Temporal Action Localization Paper • 2305.15701 • Published May 25, 2023
Whitening-based Contrastive Learning of Sentence Embeddings Paper • 2305.17746 • Published May 28, 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models Paper • 2305.18010 • Published May 29, 2023
Describing Differences in Image Sets with Natural Language Paper • 2312.02974 • Published Dec 5, 2023 • 16
Clustering based Point Cloud Representation Learning for 3D Analysis Paper • 2307.14605 • Published Jul 27, 2023
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery Paper • 2307.16377 • Published Jul 31, 2023