cpatonn commited on
Commit
c7ef248
·
verified ·
1 Parent(s): 16519d6

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +188 -0
README.md ADDED
@@ -0,0 +1,188 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - multilingual
4
+ license: apache-2.0
5
+ license_name: kwaipilot-license
6
+ license_link: LICENSE
7
+ library_name: transformers
8
+ base_model:
9
+ - Kwaipilot/KAT-V1-40B
10
+ ---
11
+ <div align="center">
12
+ <img src="https://raw.githubusercontent.com/Anditty/OASIS/refs/heads/main/Group.svg" width="60%" alt="Kwaipilot" />
13
+ </div>
14
+
15
+ <hr>
16
+
17
+ <div align="center" style="line-height: 1;">
18
+ <a href="https://huggingface.co/Kwaipilot/KAT-V1-40B" target="_blank">
19
+ <img alt="Hugging Face" src="https://img.shields.io/badge/HuggingFace-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor"/>
20
+ </a>
21
+
22
+ <a href="https://arxiv.org/pdf/2507.08297" target="_blank">
23
+ <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.08297-b31b1b.svg?style=for-the-badge"/>
24
+ </a>
25
+ </div>
26
+
27
+ # News
28
+
29
+ - Kwaipilot-AutoThink ranks first among all open-source models on [LiveCodeBench Pro](https://livecodebenchpro.com/), a challenging benchmark explicitly designed to prevent data leakage, and even surpasses strong proprietary systems such as Seed and o3-mini.
30
+
31
+ ***
32
+
33
+ # Introduction
34
+
35
+ **KAT (Kwaipilot-AutoThink)** is an open-source large-language model that mitigates *over-thinking* by learning **when** to produce explicit chain-of-thought and **when** to answer directly.
36
+
37
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/zdnsvBmv6hWIC2Qxxy1fD.png)
38
+
39
+ Its development follows a concise two-stage training pipeline:
40
+
41
+ <table>
42
+ <thead>
43
+ <tr>
44
+ <th style="text-align:left; width:18%;">Stage</th>
45
+ <th style="text-align:left;">Core Idea</th>
46
+ <th style="text-align:left;">Key Techniques</th>
47
+ <th style="text-align:left;">Outcome</th>
48
+ </tr>
49
+ </thead>
50
+ <tbody>
51
+ <tr>
52
+ <td><strong>1. Pre-training</strong></td>
53
+ <td>Inject knowledge while separating “reasoning” from “direct answering”.</td>
54
+ <td>
55
+ <em>Dual-regime data</em><br>
56
+ • <strong>Think-off</strong> queries labeled via a custom tagging system.<br>
57
+ • <strong>Think-on</strong> queries generated by a multi-agent solver.<br><br>
58
+ <em>Knowledge Distillation&nbsp;+&nbsp;Multi-Token Prediction</em> for fine-grained utility.
59
+ </td>
60
+ <td>Base model attains strong factual and reasoning skills without full-scale pre-training costs.</td>
61
+ </tr>
62
+ <tr>
63
+ <td><strong>2. Post-training</strong></td>
64
+ <td>Make reasoning optional and efficient.</td>
65
+ <td>
66
+ <em>Cold-start AutoThink</em> — majority vote sets the initial thinking mode.<br>
67
+ <em>Step-SRPO</em> — intermediate supervision rewards correct <strong>mode selection</strong> and <strong>answer accuracy</strong> under that mode.
68
+ </td>
69
+ <td>Model triggers CoT only when beneficial, reducing token use and speeding inference.</td>
70
+ </tr>
71
+ </tbody>
72
+ </table>
73
+
74
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/cwFAEh7Rl3f4FU46z8gBZ.png)
75
+
76
+
77
+ ***
78
+
79
+ # Data Format
80
+
81
+
82
+ KAT produces responses in a **structured template** that makes the reasoning path explicit and machine-parsable.
83
+ Two modes are supported:
84
+
85
+
86
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61ee40a269351366e29972ad/H8iAvQMMT02nyvlYnI5q1.jpeg)
87
+
88
+
89
+ ## Special Tokens
90
+
91
+ | Token | Description |
92
+ |-------|-------------|
93
+ | `<judge>` | Analyzes the input to decide whether explicit reasoning is needed. |
94
+ | `<think_on>` / `<think_off>` | Indicates whether reasoning is **activated** (“on”) or **skipped** (“off”). |
95
+ | `<think>` | Marks the start of the chain-of-thought segment when `think_on` is chosen. |
96
+ | `<answer>` | Marks the start of the final user-facing answer. |
97
+
98
+
99
+ ***
100
+
101
+ # 🔧 Quick Start
102
+
103
+ ```python
104
+ from transformers import AutoTokenizer, AutoModelForCausalLM
105
+
106
+ model_name = "Kwaipilot/KAT-V1-40B"
107
+
108
+ # load the tokenizer and the model
109
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
110
+ model = AutoModelForCausalLM.from_pretrained(
111
+ model_name,
112
+ torch_dtype="auto",
113
+ device_map="auto"
114
+ )
115
+
116
+ # prepare the model input
117
+ prompt = "Give me a short introduction to large language model."
118
+ messages = [
119
+ {"role": "user", "content": prompt}
120
+ ]
121
+ text = tokenizer.apply_chat_template(
122
+ messages,
123
+ tokenize=False,
124
+ add_generation_prompt=True
125
+ )
126
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
127
+
128
+ # conduct text completion
129
+ generated_ids = model.generate(
130
+ **model_inputs,
131
+ max_new_tokens=65536,
132
+ temperature=0.6,
133
+ top_p=0.95,
134
+ )
135
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
136
+ content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
137
+ print("prompt:\n", prompt)
138
+ print("content:\n", content)
139
+ """
140
+ prompt:
141
+ Give me a short introduction to large language model.
142
+ content:
143
+ <judge>
144
+ The user's request is to provide a concise factual introduction to large language models, which involves retrieving and summarizing basic information. This task is straightforward as it only requires recalling and presenting well-known details without deeper analysis. No complex reasoning is needed here—just a simple explanation will suffice.
145
+ </judge>
146
+
147
+ <think_off>
148
+ <answer>
149
+ A **Large Language Model (LLM)** is an advanced AI system trained on vast amounts of text data to understand, generate, and process human-like language. Here’s a concise introduction:
150
+
151
+ ### Key Points:
152
+ 1. **Training**: Trained on diverse text sources (books, websites, etc.) using deep learning.
153
+ 2. **Capabilities**:
154
+ - Answer questions, generate text, summarize content, translate languages.
155
+ - Understand context, sentiment, and nuances in language.
156
+ 3. **Architecture**: Often based on **transformer models** (e.g., BERT, GPT, LLaMA).
157
+ 4. **Scale**: Billions of parameters, requiring massive computational resources.
158
+ 5. **Applications**: Chatbots, content creation, coding assistance, research, and more.
159
+
160
+ ### Examples:
161
+ - **OpenAI’s GPT-4**: Powers ChatGPT.
162
+ - **Google’s Gemini**: Used in Bard.
163
+ - **Meta’s LLaMA**: Open-source alternative.
164
+
165
+ ### Challenges:
166
+ - **Bias**: Can reflect biases in training data.
167
+ - **Accuracy**: May hallucinate "facts" not grounded in reality.
168
+ - **Ethics**: Raises concerns about misinformation and job displacement.
169
+
170
+ LLMs represent a leap forward in natural language processing, enabling machines to interact with humans in increasingly sophisticated ways. 🌐🤖
171
+ </answer>
172
+ """
173
+ ```
174
+
175
+ ***
176
+
177
+ # Future Releases
178
+
179
+ Looking ahead, we will publish a companion paper that fully documents the **AutoThink training framework**, covering:
180
+
181
+ * Cold-start initialization procedures
182
+ * Reinforcement-learning (Step-SRPO) strategies
183
+ * Data curation and reward design details
184
+
185
+ At the same time, we will open-source:
186
+
187
+ * **Training resources** – the curated dual-regime datasets and RL codebase
188
+ * **Model suite** – checkpoints at 1.5B, 7B, and 13B parameters, all trained with AutoThink gating