
Mayank Mishra

mayank-mishra


https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference


authored 2 papers 5 months ago

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

Paper • 2505.22758 • Published May 28, 2025

PaTH Attention: Position Encoding via Accumulating Householder Transformations

Paper • 2505.16381 • Published May 22, 2025

authored 2 papers 8 months ago

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

Paper • 2502.09927 • Published Feb 14, 2025

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Paper • 2501.06589 • Published Jan 11, 2025

authored a paper 11 months ago

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Paper • 2409.04787 • Published Sep 7, 2024 • 1

authored a paper about 1 year ago

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Paper • 2408.13359 • Published Aug 23, 2024 • 24

authored 11 papers over 1 year ago

Enhancing Training Efficiency Using Packing with Flash Attention

Paper • 2407.09105 • Published Jul 12, 2024 • 17

Scaling Granite Code Models to 128K Context

Paper • 2407.13739 • Published Jul 18, 2024 • 20

The infrastructure powering IBM's Gen AI model development

Paper • 2407.05467 • Published Jul 7, 2024 • 2

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 33

Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization

Paper • 2404.03605 • Published Apr 4, 2024 • 1

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Paper • 2405.04324 • Published May 7, 2024 • 25

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 34

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Paper • 2404.05567 • Published Apr 8, 2024 • 10

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30, 2024 • 42

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

Paper • 2402.02479 • Published Feb 4, 2024 • 2

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 149

authored 3 papers almost 2 years ago

Joint Reasoning on Hybrid-knowledge sources for Task-Oriented Dialog

Paper • 2210.07295 • Published Oct 13, 2022 • 1

Variational Learning for Unsupervised Knowledge Grounded Dialogs

Paper • 2112.00653 • Published Nov 23, 2021 • 1

Variational Inference with Latent Space Quantization for Adversarial Resilience

Paper • 1903.09940 • Published Mar 24, 2019 • 1