Igor Molybog (igormolybog)
AI & ML interests: Optimization, Machine Learning
Organizations: None yet
Inference speed
- FlashDecoding++: Faster Large Language Model Inference on GPUs (Paper • arXiv:2311.01282 • Published • 37 upvotes)
- Co-training and Co-distillation for Quality Improvement and Compression of Language Models (Paper • arXiv:2311.02849 • Published • 8 upvotes)
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference (Paper • arXiv:2311.04934 • Published • 34 upvotes)
- Exponentially Faster Language Modelling (Paper • arXiv:2311.10770 • Published • 119 upvotes)
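Of these, Prompt Cache (arXiv:2311.04934) speeds up inference by reusing precomputed attention key/value states across prompts. A minimal sketch of the simplest special case, prefix KV-cache reuse with the Hugging Face transformers API, is shown below; the model name "gpt2" and the prompt strings are placeholder choices, and Prompt Cache itself generalizes this idea to modular, non-prefix prompt segments.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any decoder-only causal LM works the same way.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Encode the shared prefix once and keep its key/value cache.
prefix = "You are a concise assistant.\n"
prefix_ids = tok(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    cache = model(prefix_ids, use_cache=True).past_key_values

# Serve a new request: feed only the suffix tokens, reusing the cached
# prefix states instead of re-running attention over the prefix.
suffix_ids = tok("Q: What is attention reuse?\nA:", return_tensors="pt").input_ids
attn = torch.ones(1, prefix_ids.shape[1] + suffix_ids.shape[1], dtype=torch.long)
with torch.no_grad():
    out = model(suffix_ids, past_key_values=cache,
                attention_mask=attn, use_cache=True)

# Greedy pick of the next token, just to show the reused cache is usable.
next_token = out.logits[0, -1].argmax()
print(tok.decode(next_token))
```

The saving comes from never recomputing attention over the shared prefix: each request pays only for its own suffix tokens, which is where the low-latency gains in the paper originate.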
Domain spec fine-tuning
Models: 0 (none public yet)
Datasets: 0 (none public yet)