Collections including paper arxiv:2401.00448

- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 64
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Scaling Laws for Precision
  Paper • 2411.04330 • Published • 8
- Transcending Scaling Laws with 0.1% Extra Compute
  Paper • 2210.11399 • Published

- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
  Paper • 2206.10789 • Published • 4
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 31
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 7

- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 31
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 82
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 27
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 81

- CodeFusion: A Pre-trained Diffusion Model for Code Generation
  Paper • 2310.17680 • Published • 73
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
  Paper • 2312.15685 • Published • 16
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 56
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 31

- S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
  Paper • 2306.06000 • Published • 1
- Fast Distributed Inference Serving for Large Language Models
  Paper • 2305.05920 • Published • 1
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
  Paper • 2305.13144 • Published • 1
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 19
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 1

- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 61
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 59
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 32

- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 76
- ToolTalk: Evaluating Tool-Usage in a Conversational Setting
  Paper • 2311.10775 • Published • 10
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 28
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
  Paper • 2311.11501 • Published • 37

- Attention Is All You Need
  Paper • 1706.03762 • Published • 73
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 159
- Mistral 7B
  Paper • 2310.06825 • Published • 50

- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
  Paper • 2309.04663 • Published • 6
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
  Paper • 2310.08541 • Published • 18
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  Paper • 2310.13671 • Published • 19