liu PRO
che111
AI & ML interests
None yet
Recent Activity
upvoted a paper about 2 months ago
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
upvoted a paper about 2 months ago
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
upvoted a paper 2 months ago
Reverse-Engineered Reasoning for Open-Ended Generation
Organizations
Work for 3D Medical Vision
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 31
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
  Paper • 2405.15738 • Published • 46
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 100
Localize Visual Understanding
- GLaMM: Pixel Grounding Large Multimodal Model
  Paper • 2311.03356 • Published • 37
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
  Paper • 2311.07575 • Published • 15
- CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
  Paper • 2311.03354 • Published • 8
- Language-Informed Visual Concept Learning
  Paper • 2312.03587 • Published • 8
Synthetic Data Learning
General Multimodal Learning
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
  Paper • 2406.18521 • Published • 29
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
  Paper • 2408.12590 • Published • 36
- Law of Vision Representation in MLLMs
  Paper • 2408.16357 • Published • 95
VideoForMed
- Distilling Vision-Language Models on Millions of Videos
  Paper • 2401.06129 • Published • 17
- Koala: Key frame-conditioned long video-LLM
  Paper • 2404.04346 • Published • 7
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
  Paper • 2404.05726 • Published • 23
- OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
  Paper • 2406.07471 • Published • 1
Med Multimodal Learning
Generative Model
- SelfEval: Leveraging the discriminative nature of generative models for evaluation
  Paper • 2311.10708 • Published • 17
- OmniGen: Unified Image Generation
  Paper • 2409.11340 • Published • 115
- NVLM: Open Frontier-Class Multimodal LLMs
  Paper • 2409.11402 • Published • 74
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
  Paper • 2409.11355 • Published • 31
Explainable, Fairness Work
AlphaMed