

Yutao Zeng

Taoer


AI & ML interests

None yet

Recent Activity

authored a paper about 2 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 37

upvoted a paper about 2 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 37

authored a paper 4 months ago

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36

updated a collection 4 months ago

Full Paper List

11 items • Updated Aug 27, 2025 • 1

upvoted a paper 4 months ago

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36

updated a collection 7 months ago

Full Paper List

11 items • Updated Aug 27, 2025 • 1

authored a paper 7 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30, 2025 • 5

upvoted a paper 7 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30, 2025 • 5

commented on a paper 7 months ago

Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Paper • 2505.24452 • Published May 30, 2025 • 5

authored 2 papers 8 months ago

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10, 2025 • 4

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20, 2025 • 76

updated a collection 8 months ago

Full Paper List

11 items • Updated Aug 27, 2025 • 1

upvoted a paper 8 months ago

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20, 2025 • 76

authored a paper 9 months ago

Efficient Pretraining Length Scaling

Paper • 2504.14992 • Published Apr 21, 2025 • 20

upvoted a paper 9 months ago

Efficient Pretraining Length Scaling

Paper • 2504.14992 • Published Apr 21, 2025 • 20

updated a collection 9 months ago

Full Paper List

11 items • Updated Aug 27, 2025 • 1

updated 2 models 9 months ago

Open-Foundation-Models/PolyNorm_1B

Text Generation • Updated Apr 8, 2025 • 22

Open-Foundation-Models/PolyReLU_1B

Text Generation • Updated Apr 8, 2025 • 13

upvoted a paper 10 months ago

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Paper • 2503.16057 • Published Mar 20, 2025 • 14

authored a paper 10 months ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18, 2025 • 22