FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published May 28, 2025
PaTH Attention: Position Encoding via Accumulating Householder Transformations Paper • 2505.16381 • Published May 22, 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published Feb 14, 2025
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11, 2025
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 25
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 16
The infrastructure powering IBM's Gen AI model development Paper • 2407.05467 • Published Jul 7, 2024 • 2
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024 • 34
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization Paper • 2404.03605 • Published Apr 4, 2024 • 1
Granite Code Models: A Family of Open Foundation Models for Code Intelligence Paper • 2405.04324 • Published May 7, 2024 • 23
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022 • 32
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models Paper • 2404.05567 • Published Apr 8, 2024 • 10
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 43
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback Paper • 2402.02479 • Published Feb 4, 2024 • 2
Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog Paper • 2210.07295 • Published Oct 13, 2022 • 1
Variational Learning for Unsupervised Knowledge Grounded Dialogs Paper • 2112.00653 • Published Nov 23, 2021 • 1
Variational Inference with Latent Space Quantization for Adversarial Resilience Paper • 1903.09940 • Published Mar 24, 2019 • 1