daily-papers
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval (arXiv:2409.10516, 43 upvotes)
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (arXiv:2409.11242, 7 upvotes)
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models (arXiv:2409.11136, 22 upvotes)
On the Diagram of Thought (arXiv:2409.10038, 13 upvotes)
Video Instruction Tuning With Synthetic Data (arXiv:2410.02713, 41 upvotes)
Large Language Models as Markov Chains (arXiv:2410.02724, 33 upvotes)
Contrastive Localized Language-Image Pre-Training (arXiv:2410.02746, 37 upvotes)
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis (arXiv:2410.02749, 13 upvotes)
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? (arXiv:2410.02115, 10 upvotes)
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations (arXiv:2410.02762, 9 upvotes)
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models (arXiv:2410.01335, 5 upvotes)
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning (arXiv:2410.01044, 35 upvotes)
Not All LLM Reasoners Are Created Equal (arXiv:2410.01748, 29 upvotes)
Quantifying Generalization Complexity for Large Language Models (arXiv:2410.01769, 13 upvotes)
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs (arXiv:2410.01518, 3 upvotes)
Law of the Weakest Link: Cross Capabilities of Large Language Models (arXiv:2409.19951, 54 upvotes)
arXiv:2409.19606 (26 upvotes)
Instruction Following without Instruction Tuning (arXiv:2409.14254, 29 upvotes)
LongGenBench: Long-context Generation Benchmark (arXiv:2410.04199, 22 upvotes)
Erasing Conceptual Knowledge from Language Models (arXiv:2410.02760, 14 upvotes)
arXiv:2410.05258 (180 upvotes)
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations (arXiv:2410.02707, 47 upvotes)
Addition is All You Need for Energy-efficient Language Models (arXiv:2410.00907, 151 upvotes)
Selective Attention Improves Transformer (arXiv:2410.02703, 25 upvotes)
Mentor-KD: Making Small Language Models Better Multi-step Reasoners (arXiv:2410.09037, 4 upvotes)
Rethinking Data Selection at Scale: Random Selection is Almost All You Need (arXiv:2410.09335, 16 upvotes)
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization (arXiv:2410.08815, 47 upvotes)
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights (arXiv:2410.09008, 17 upvotes)
Mechanistic Permutability: Match Features Across Layers (arXiv:2410.07656, 20 upvotes)
SimpleStrat: Diversifying Language Model Generation with Stratification (arXiv:2410.09038, 4 upvotes)
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness (arXiv:2410.07035, 17 upvotes)
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs (arXiv:2410.12405, 13 upvotes)
Exploring Model Kinship for Merging Large Language Models (arXiv:2410.12613, 21 upvotes)
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free (arXiv:2410.10814, 51 upvotes)
What Matters in Transformers? Not All Attention is Needed (arXiv:2406.15786, 31 upvotes)
Vector-ICL: In-context Learning with Continuous Vector Representations (arXiv:2410.05629, 4 upvotes)
Intriguing Properties of Large Language and Vision Models (arXiv:2410.04751, 16 upvotes)
AutoTrain: No-code training for state-of-the-art models (arXiv:2410.15735, 59 upvotes)
Pre-training Distillation for Large Language Models: A Design Space Exploration (arXiv:2410.16215, 17 upvotes)
In-context learning and Occam's razor (arXiv:2410.14086, 2 upvotes)
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs (arXiv:2410.13276, 29 upvotes)
How Do Training Methods Influence the Utilization of Vision Models? (arXiv:2410.14470, 5 upvotes)
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media (arXiv:2410.12791, 5 upvotes)
Counting Ability of Large Language Models and Impact of Tokenization (arXiv:2410.19730, 11 upvotes)
Analysing the Residual Stream of Language Models Under Knowledge Conflicts (arXiv:2410.16090, 8 upvotes)
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning (arXiv:2410.19290, 10 upvotes)
On Memorization of Large Language Models in Logical Reasoning (arXiv:2410.23123, 18 upvotes)
Toxicity of the Commons: Curating Open-Source Pre-Training Data (arXiv:2410.22587, 10 upvotes)
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback (arXiv:2410.21242, 7 upvotes)
Task Vectors are Cross-Modal (arXiv:2410.22330, 11 upvotes)
RARe: Retrieval Augmented Retrieval with In-Context Examples (arXiv:2410.20088, 4 upvotes)
LongReward: Improving Long-context Large Language Models with AI Feedback (arXiv:2410.21252, 19 upvotes)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv:2410.23168, 24 upvotes)
Constraint Back-translation Improves Complex Instruction Following of Large Language Models (arXiv:2410.24175, 18 upvotes)
Language Models can Self-Lengthen to Generate Long Texts (arXiv:2410.23933, 18 upvotes)
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective (arXiv:2410.23743, 64 upvotes)
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding (arXiv:2411.01106, 4 upvotes)
Physics in Next-token Prediction (arXiv:2411.00660, 14 upvotes)
GPT or BERT: why not both? (arXiv:2410.24159, 13 upvotes)
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models (arXiv:2411.00743, 7 upvotes)
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? (arXiv:2411.05000, 22 upvotes)
Analyzing The Language of Visual Tokens (arXiv:2411.05001, 24 upvotes)
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination (arXiv:2411.03823, 49 upvotes)
DELIFT: Data Efficient Language model Instruction Fine Tuning (arXiv:2411.04425, 11 upvotes)
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities (arXiv:2411.04986, 5 upvotes)
Counterfactual Generation from Language Models (arXiv:2411.07180, 5 upvotes)
Cut Your Losses in Large-Vocabulary Language Models (arXiv:2411.09009, 49 upvotes)
Large Language Models Can Self-Improve in Long-context Reasoning (arXiv:2411.08147, 65 upvotes)
Can sparse autoencoders be used to decompose and interpret steering vectors? (arXiv:2411.08790, 8 upvotes)
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework (arXiv:2411.06176, 45 upvotes)
Top-nσ: Not All Logits Are You Need (arXiv:2411.07641, 24 upvotes)
Drowning in Documents: Consequences of Scaling Reranker Inference (arXiv:2411.11767, 19 upvotes)
Multimodal Autoregressive Pre-training of Large Vision Encoders (arXiv:2411.14402, 47 upvotes)
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs (arXiv:2411.14199, 34 upvotes)
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models (arXiv:2411.14257, 14 upvotes)
Patience Is The Key to Large Language Model Reasoning (arXiv:2411.13082, 7 upvotes)
Loss-to-Loss Prediction: Scaling Laws for All Datasets (arXiv:2411.12925, 5 upvotes)
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS (arXiv:2411.18478, 37 upvotes)
Training Noise Token Pruning (arXiv:2411.18092, 1 upvote)
Star Attention: Efficient LLM Inference over Long Sequences (arXiv:2411.17116, 53 upvotes)
Predicting Emergent Capabilities by Finetuning (arXiv:2411.16035, 7 upvotes)
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-OASIS (arXiv:2411.19655, 20 upvotes)
Free Process Rewards without Process Labels (arXiv:2412.01981, 34 upvotes)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability (arXiv:2411.19943, 62 upvotes)
Establishing Task Scaling Laws via Compute-Efficient Model Ladders (arXiv:2412.04403, 2 upvotes)
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement (arXiv:2412.04003, 10 upvotes)
arXiv:2412.04315 (19 upvotes)
Evaluating Language Models as Synthetic Data Generators (arXiv:2412.03679, 47 upvotes)
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs (arXiv:2412.04144, 6 upvotes)
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models (arXiv:2412.06071, 9 upvotes)
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token (arXiv:2412.06676, 9 upvotes)
Learned Compression for Compressed Learning (arXiv:2412.09405, 13 upvotes)
Multimodal Latent Language Modeling with Next-Token Diffusion (arXiv:2412.08635, 49 upvotes)
Smaller Language Models Are Better Instruction Evolvers (arXiv:2412.11231, 28 upvotes)
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation (arXiv:2412.11919, 36 upvotes)
No More Adam: Learning Rate Scaling at Initialization is All You Need (arXiv:2412.11768, 43 upvotes)
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers (arXiv:2412.12276, 15 upvotes)
Are Your LLMs Capable of Stable Reasoning? (arXiv:2412.13147, 93 upvotes)
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge (arXiv:2412.13670, 6 upvotes)
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference (arXiv:2412.13663, 160 upvotes)
Token-Budget-Aware LLM Reasoning (arXiv:2412.18547, 46 upvotes)
GeAR: Generation Augmented Retrieval (arXiv:2501.02772, 21 upvotes)
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate (arXiv:2501.17703, 59 upvotes)
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (arXiv:2501.18585, 61 upvotes)
Reward-Guided Speculative Decoding for Efficient LLM Reasoning (arXiv:2501.19324, 39 upvotes)
Transformer^2: Self-adaptive LLMs (arXiv:2501.06252, 55 upvotes)
Tensor Product Attention Is All You Need (arXiv:2501.06425, 90 upvotes)
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong (arXiv:2501.09775, 32 upvotes)
Evolving Deeper LLM Thinking (arXiv:2501.09891, 115 upvotes)
The Geometry of Tokens in Internal Representations of Large Language Models (arXiv:2501.10573, 9 upvotes)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948, 441 upvotes)
Debate Helps Weak-to-Strong Generalization (arXiv:2501.13124, 7 upvotes)
LongRoPE2: Near-Lossless LLM Context Window Scaling (arXiv:2502.20082, 36 upvotes)
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (arXiv:2502.13063, 74 upvotes)