大模型idea (Large Model Ideas) - an anbinx Collection

大模型idea (Large Model Ideas)

updated 14 days ago

Instruction Following without Instruction Tuning

Paper • 2409.14254 • Published Sep 21, 2024 • 31
Baichuan Alignment Technical Report

Paper • 2410.14940 • Published Oct 19, 2024 • 52
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published Oct 21, 2024 • 61
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Paper • 2410.18558 • Published Oct 24, 2024 • 19
Self-Consistency Preference Optimization

Paper • 2411.04109 • Published Nov 6, 2024 • 19
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 414
Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published Feb 5 • 59
Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 199
Chain of Draft: Thinking Faster by Writing Less

Paper • 2502.18600 • Published Feb 25 • 50
URECA: Unique Region Caption Anything

Paper • 2504.05305 • Published Apr 7 • 36
An Empirical Study of Qwen3 Quantization

Paper • 2505.02214 • Published May 4 • 25
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 96
WorldPM: Scaling Human Preference Modeling

Paper • 2505.10527 • Published May 15 • 34
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published about 1 month ago • 72
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

Paper • 2507.07101 • Published 22 days ago • 3
Scaling Laws for Optimal Data Mixtures

Paper • 2507.09404 • Published 19 days ago • 33