Yosuke Yamaguchi (yamayou)
AI & ML interests: None yet
Recent Activity
- updated collection "Idea" 11 days ago
- updated collection "Idea" 3 months ago
- updated collection "Idea" 4 months ago
Organizations: None yet
LLM
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 111
- Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
  Paper • 2404.00399 • Published • 42
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 66
Idea
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 48
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 72
- Humanoid Locomotion as Next Token Prediction
  Paper • 2402.19469 • Published • 28
Multimodal
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 132
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34
- Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
  Paper • 2406.02430 • Published • 38
- An Image is Worth 32 Tokens for Reconstruction and Generation
  Paper • 2406.07550 • Published • 60
time series