Article: Fast LoRA inference for Flux with Diffusers and PEFT • By sayakpaul and 1 other • 5 days ago
Article: Mixture of Experts Explained • By osanseviero and 5 others • Dec 11, 2023
Space: The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
Article: Introducing AutoRound: Intel's Advanced Quantization for LLMs and VLMs • By wenhuach and 8 others • Apr 29
Article: Accelerating LLM Inference with TGI on Intel Gaudi • By baptistecolle and 4 others • Mar 28
Article: Benchmarking Language Model Performance on 5th Gen Xeon at GCP • By MatrixYao and 2 others • Dec 17, 2024
Article: Accelerating Protein Language Model ProtST on Intel Gaudi 2 • By juliensimon and 7 others • Jul 3, 2024
Article: Building Cost-Efficient Enterprise RAG Applications with Intel Gaudi 2 and Intel Xeon • By juliensimon and 8 others • May 9, 2024
Paper: Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length • arXiv:2111.09645 • Published Nov 18, 2021
Model: Intel/distilbert-base-uncased-sparse-90-unstructured-pruneofa • Fill-Mask • Updated Apr 11, 2023