Transformers
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers Paper • 2410.13184 • Published Oct 17, 2024 • 3
Multiplexer
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 22