Junfei Wu

Hyperwjf

Recent Activity

upvoted a paper 18 days ago

Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models

Paper • 2511.09809 • Published 23 days ago • 4

upvoted a paper about 1 month ago

Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

Paper • 2510.24514 • Published Oct 28 • 21

upvoted a paper about 2 months ago

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Paper • 2510.10395 • Published Oct 12 • 29

upvoted a paper 2 months ago

PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

Paper • 2509.19894 • Published Sep 24 • 33

liked a dataset 3 months ago

AntResearchNLP/ViLaSR-data

Updated Jun 24 • 432 • 5

New activity in prithivMLmods/Multimodal-VLM-v1.0 3 months ago

Friendly note: ViLaSR relies on thinking with images, but the demo uses text-only reasoning

#4 opened 3 months ago by Hyperwjf

liked a Space 3 months ago

Multimodal VLM v1.0

OCR, VQA, Thinking and Object Detection.

upvoted 2 papers 3 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 208

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19 • 118

liked a dataset 4 months ago

ShareGPTVideo/train_video_and_instruction

Updated Dec 14, 2024 • 2.83k • 29

updated a model 4 months ago

inclusionAI/ViLaSR

Image-Text-to-Text • 8B • Updated Aug 11 • 2.48k • 18

upvoted a paper 5 months ago

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Paper • 2506.07961 • Published Jun 9 • 11

liked a dataset 5 months ago

AI4Math/MathVista

Viewer • Updated Feb 11, 2024 • 6.14k • 14.9k • 194

updated a dataset 6 months ago

AntResearchNLP/ViLaSR-data

Updated Jun 24 • 432 • 5

New activity in lmms-lab/LLaVA-OneVision-Data 6 months ago

Are there only single images?

👍 3

#6 opened over 1 year ago by duang

updated a dataset 6 months ago

AntResearchNLP/ViLaSR-eval

Preview • Updated Jun 23 • 57

updated a collection 6 months ago

ViLaSR

Collection

4 items • Updated Jun 22 • 1

published a dataset 6 months ago

AntResearchNLP/ViLaSR-eval

Preview • Updated Jun 23 • 57

liked a model 6 months ago

inclusionAI/ViLaSR

Image-Text-to-Text • 8B • Updated Aug 11 • 2.48k • 18

updated a model 6 months ago

AntResearchNLP/ViLaSR-cold-start

8B • Updated Jun 22 • 10