Inui

Norm

https://normxu.github.io/

AI & ML interests

Video Diffusion; Large Language Model; Object Detection; OCR

Recent Activity

liked a dataset 14 days ago

NoobEngineere/NSFW_Manga

upvoted a paper about 1 month ago

LongCat-Flash-Thinking-2601 Technical Report

liked a model about 2 months ago

meituan-longcat/LongCat-Flash-Thinking-2601

Norm's collections (9)

VAE

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

Paper • 2411.17459 • Published Nov 26, 2024 • 12
MAGVIT: Masked Generative Video Transformer

Paper • 2212.05199 • Published Dec 10, 2022
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Paper • 2310.05737 • Published Oct 9, 2023 • 6
Finite Scalar Quantization: VQ-VAE Made Simple

Paper • 2309.15505 • Published Sep 27, 2023 • 24

TI2V Research

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12, 2024 • 38
AtomoVideo: High Fidelity Image-to-Video Generation

Paper • 2403.01800 • Published Mar 4, 2024 • 23
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7, 2024 • 56
AnimateAnything: Consistent and Controllable Animation for Video Generation

Paper • 2411.10836 • Published Nov 16, 2024 • 24

Multimodal Language Model

What matters besides the data recipe when training a multimodal language model?

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6, 2024 • 61
VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 41
PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 72
openbmb/MiniCPM-V-2_6

Image-Text-to-Text • Updated Jun 13, 2025 • 106k • 1.03k

Language Model

STaR: Bootstrapping Reasoning With Reasoning

Paper • 2203.14465 • Published Mar 28, 2022 • 9
Scaling Laws for Neural Language Models

Paper • 2001.08361 • Published Jan 23, 2020 • 10
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 441

Open Datasets

Thank you for sharing your datasets. I’ve fed them to my model, and they have benefited it.

vikhyatk/lnqa

Viewer • Updated Aug 18, 2024 • 303k • 1.13k • 89
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 14.4k • 231
naver-clova-ix/synthdog-en

Viewer • Updated Jan 31, 2024 • 66k • 567 • 25
Mutonix/Vript_Chinese

Viewer • Updated Oct 16, 2024 • 294k • 703 • 16

Video2Video

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 31

Image / Video Gen

Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion

Understanding Diffusion Models: A Unified Perspective

Paper • 2208.11970 • Published Aug 25, 2022
Tutorial on Diffusion Models for Imaging and Vision

Paper • 2403.18103 • Published Mar 26, 2024 • 2
Denoising Diffusion Probabilistic Models

Paper • 2006.11239 • Published Jun 19, 2020 • 9
Denoising Diffusion Implicit Models

Paper • 2010.02502 • Published Oct 6, 2020 • 4

Fundamental Research

Scaling Law with Learning Rate Annealing

Paper • 2408.11029 • Published Aug 20, 2024 • 4
Token Turing Machines

Paper • 2211.09119 • Published Nov 16, 2022 • 1
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

Paper • 2203.12602 • Published Mar 23, 2022 • 1
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Paper • 2305.13035 • Published May 22, 2023

Computer Vision

Do we still need task-specific networks for computer vision today?

SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 120
facebook/sam2.1-hiera-large

Mask Generation • 0.2B • Updated Aug 15, 2025 • 95.5k • 129
