Inui
Norm
AI & ML interests
Video Diffusion; Large Language Model; Object Detection; OCR
Recent Activity
upvoted a paper 2 days ago: Less is More: Recursive Reasoning with Tiny Networks
liked a model 22 days ago: rednote-hilab/dots.ocr
liked a model about 1 month ago: meituan-longcat/LongCat-Flash-Chat
Organizations
TI2V Research
- CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
  Paper • 2408.06072 • Published • 39
- AtomoVideo: High Fidelity Image-to-Video Generation
  Paper • 2403.01800 • Published • 23
- DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
  Paper • 2411.04928 • Published • 57
- AnimateAnything: Consistent and Controllable Animation for Video Generation
  Paper • 2411.10836 • Published • 24
Multimodal Language Model
What matters besides the data recipe when training a multimodal language model?
Language Model
- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 9
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 9
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 418
Open Datasets
Thank you for sharing your datasets. I've fed them to my model, and they have benefited it.
Video2Video
Image / Video Gen
Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion
- Understanding Diffusion Models: A Unified Perspective
  Paper • 2208.11970 • Published
- Tutorial on Diffusion Models for Imaging and Vision
  Paper • 2403.18103 • Published • 2
- Denoising Diffusion Probabilistic Models
  Paper • 2006.11239 • Published • 6
- Denoising Diffusion Implicit Models
  Paper • 2010.02502 • Published • 4
Fundamental Research
- Scaling Law with Learning Rate Annealing
  Paper • 2408.11029 • Published • 4
- Token Turing Machines
  Paper • 2211.09119 • Published • 1
- VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
  Paper • 2203.12602 • Published
- Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
  Paper • 2305.13035 • Published
Computer Vision
Do we still need a dedicated network for each specific computer vision task today?
VAE
- WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
  Paper • 2411.17459 • Published • 12
- MAGVIT: Masked Generative Video Transformer
  Paper • 2212.05199 • Published
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
  Paper • 2310.05737 • Published • 6
- Finite Scalar Quantization: VQ-VAE Made Simple
  Paper • 2309.15505 • Published • 23