---
library_name: transformers
datasets:
- exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
---

# HyperCLOVAX-1.5B-Reasoning-RFT

- Base Model: [naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B)
- Base Dataset: [exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean](https://huggingface.co/datasets/exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean)

- Reasoning Format: **no separate reasoning format**; the final answer is written inside \boxed{}

- Example

```
from vllm import LLM, SamplingParams
model_name = "werty1248/HyperCLOVAX-1.5B-Reasoning-RFT"

sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=8192)
llm = LLM(model=model_name, dtype='auto', max_model_len=16384, tensor_parallel_size=1)

question = """Janet's ducks lay 16 eggs per day.
She eats three for breakfast every morning and uses four when she bakes muffins for her friends.
She sells the remaining eggs at the farmers' market every day for $2 per fresh duck egg.
How much does she make at the farmers' market every day?"""

conversation = {"role": "user", "content": question}

outputs = llm.chat([conversation], sampling_params)

print(outputs[0].outputs[0].text)

### Example output
'''
### Problem definition and conditions

- Janet's ducks lay 16 eggs per day.
- She eats 3 eggs for breakfast every morning.
- She uses 4 eggs when baking muffins.
- The remaining eggs are sold at the farmers' market for 2 dollars each.
- We want the amount she earns at the farmers' market each day.

### Logic and formula development

1. **Eggs laid per day**: 16
2. **Eggs eaten for breakfast**: 3
3. **Eggs used for muffins**: 4

First, compute the number of remaining eggs.

\[ \text{remaining eggs} = \text{eggs laid per day} - \text{eggs eaten for breakfast} - \text{eggs used for muffins} \]
\[ \text{remaining eggs} = 16 - 3 - 4 = 9 \]

4. **Price per remaining egg**: 2 dollars

Now compute the revenue from selling the remaining 9 eggs.

\[ \text{earnings at the farmers' market} = \text{number of remaining eggs} \times \text{price per egg} \]
\[ \text{earnings at the farmers' market} = 9 \times 2 = 18 \]

### Final answer

\[ \boxed{18} \]

Janet earns 18 dollars at the farmers' market every day. This is the revenue from selling the remaining 9 eggs at 2 dollars each. So the final amount Janet earns at the farmers' market each day is 18 dollars.
'''

```
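
The vLLM snippet above is the card's own usage example. If only plain Hugging Face transformers is available, a minimal sketch along the following lines should behave similarly (this snippet is not from the original card; the toy question is arbitrary and `trust_remote_code=True` is an assumption that may be unnecessary):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "werty1248/HyperCLOVAX-1.5B-Reasoning-RFT"

# trust_remote_code is an assumption; drop it if the checkpoint loads as a stock architecture
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "A duck lays 16 eggs a day and 7 are used. How many are left?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# same sampling settings as the vLLM example above
output_ids = model.generate(
    input_ids, max_new_tokens=8192, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```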

### Rejection sampling Fine-Tuning (RFT) with least similar samples

- Goal: have the model learn as **diverse a set of solution methods** as possible
1. From the question set of exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean, exclude MCQA and proof-style problems (54,832 of 56,878 questions kept)
2. After adding a CoT prompt to HyperCLOVAX-1.5B (prepended to the user input), keep only the questions it solves at **Pass@8** (10,433 of 54,832)
3. Run **32 additional inferences** with HyperCLOVAX-1.5B (+CoT) and keep only the responses that reach the correct answer, plus **32 additional inferences** with [Kanana-nano-2.1b](https://huggingface.co/kakaocorp/kanana-nano-2.1b-instruct) (+CoT), again keeping only the correct responses
   - HyperCLOVAX-1.5B CoT Prompt: ```"Restate what value the user is ultimately asking for, and rewrite the conditions given in the problem in a clean, organized form. Then solve the problem by laying out the logic and formulas in detail, and write the final answer inside \\boxed{}.\n\n"```
   - Kanana-2.1b CoT Prompt (same as above, with "Write only the content, without headings." appended at the end): ```"Restate what value the user is ultimately asking for, and rewrite the conditions given in the problem in a clean, organized form. Then solve the problem by laying out the logic and formulas in detail, and write the final answer inside \\boxed{}. Write only the content, without headings.\n\n"```

#### Distribution of the number of correct responses per question

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/SyYp73P_mw8Wi3rx013qb.png)

4. For questions with more than 4 correct responses, embed the responses with [nlpai-lab/KURE-v1](https://huggingface.co/nlpai-lab/KURE-v1) and keep only the **4 responses with the lowest mutual cosine similarity** (a sketch of this selection step follows)
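
The card does not spell this selection step out in code; the following is a minimal sketch of one way it could be done, assuming nlpai-lab/KURE-v1 loads through `sentence-transformers` (the exhaustive subset search is an illustrative choice, not necessarily the author's implementation):

```
from itertools import combinations

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("nlpai-lab/KURE-v1")  # assumed to be loadable via sentence-transformers

def least_similar_answers(answers, k=4):
    """Pick the k answers whose pairwise cosine similarities sum to the minimum."""
    if len(answers) <= k:
        return answers
    # Unit-normalized embeddings, so dot products are cosine similarities.
    emb = encoder.encode(answers, normalize_embeddings=True)
    sim = emb @ emb.T
    best_subset, best_score = None, float("inf")
    # At most 64 correct responses per question (32 + 32), so brute force stays cheap.
    for subset in combinations(range(len(answers)), k):
        score = sum(sim[i][j] for i, j in combinations(subset, 2))
        if score < best_score:
            best_subset, best_score = subset, score
    return [answers[i] for i in best_subset]
```

With at most 64 candidates this enumerates fewer than ~640k subsets; a greedy farthest-point (max-min) selection would be a cheaper alternative if the candidate pool were larger.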

### Final dataset

- 38,037 responses to 10,433 questions (the layout of one record is sketched below the config)
- Trained with Axolotl (2 hours on 2x A40 = 4 GPU hours)
- Despite the added CoT prompt, responses are short on average (fewer than 300 tokens)

<details><summary>axolotl config</summary>

```
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
model_type: AutoModelForCausalLM
tokenizer_config: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/Open-Reasoner-Zero-RFT-full
    field_messages: conversations
    type: chat_template
    chat_template: chatml

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 8
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 3.0e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
eval_table_size:

deepspeed: ./deepspeed_configs/zero3_bf16.json
```
</details>
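
For orientation, one training record in the `conversations` layout that the config above expects would look roughly like this (a hypothetical example, not an actual row from werty1248/Open-Reasoner-Zero-RFT-full; the `role`/`content` keys follow the common convention for axolotl's chat_template loader and may differ in the real dataset):

```
# Hypothetical record shape for a chat_template/chatml dataset with field_messages: conversations.
# With train_on_inputs: false, only the assistant turn contributes to the loss.
record = {
    "conversations": [
        {
            "role": "user",
            "content": "<CoT prompt>\n\n<math question>",
        },
        {
            "role": "assistant",
            "content": "<accepted solution ending in \\boxed{answer}>",
        },
    ]
}
```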

# Evaluation

- 4 of the HRM8K subsets
- temperature=0.7, top_p=0.95, max_tokens=8192
- If the model fails to produce \boxed{} within max_tokens, the string ```"\n\nFinal answer: \\boxed{"``` is appended to the end of the output and inference is run once more (a sketch of this retry follows the results table)

| Name | GSM8K | MATH | KSM | OMNI_MATH |
| --- | --- | --- | --- | --- |
| werty1248/HyperCLOVAX-1.5B-Reasoning-RFT | 56.56 | 33.80 | 4.62 | 12.26 |
| werty1248/HyperCLOVAX-1.5B-Reasoning-RFT (CoT) | 54.74 | 33.00 | 3.85 | 12.52 |
| naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B (CoT) | 46.63 | 29.91 | 5.18 | 11.47 |
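
The \boxed{} retry described above is simple but easy to get wrong; the sketch below shows one way to wire it up, assuming a `generate(prompt_text) -> str` helper that wraps the chat-templated prompt (the helper and the exact wording of the retry suffix are illustrative, not the card's own code):

```
import re

RETRY_SUFFIX = "\n\nFinal answer: \\boxed{"  # retry suffix described in the Evaluation bullets (wording illustrative)

def extract_boxed(text):
    """Return the content of the last \\boxed{...} in `text`, or None if there is none."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def answer_with_retry(generate, prompt_text):
    """`generate` is assumed to map a raw prompt string to the model's text continuation."""
    first = generate(prompt_text)
    answer = extract_boxed(first)
    if answer is not None:
        return answer
    # No \boxed{} within max_tokens: append the truncated output plus the suffix and continue.
    second = generate(prompt_text + first + RETRY_SUFFIX)
    return second.split("}", 1)[0].strip() if "}" in second else second.strip()
```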

# Notes

### Training data generation

#### Pass@k cost
- Generating the training data (at least 40 hours on a single 3090) cost far more than the training itself (2 hours on 2x A40); the original goal was Pass@16 filtering, but it was abandoned

#### Least Similarity Sampling
- When a response **reaches the correct answer through a flawed solution**, it has low similarity to the other, correct solutions and is therefore almost always selected; no separate filtering was applied for this case

### Output
- The **average response length is about 753 tokens** (measured on GSM8K), shorter than typical reasoning models (reasoning-specific training has not been done yet)
- Attempts to teach the model a **reasoning format** such as ```</think>``` (describing it in the system prompt, force-inserting a </think> token after the answer and then generating a new completion, and so on) **failed**

- Even for general questions, the model tends to think in a CoT-like way without any special instruction.

Q: Why is the sky blue?

A:
```

**Problem definition and conditions**

**Problem**: Why is the sky blue?

**Definitions**:
- **Sky**: the air-filled space that extends from the ground up over our heads.
- **Blue**: the phenomenon in which the air in the sky scatters ultraviolet light and therefore appears blue.

**Conditions**:
- The composition of air (mainly nitrogen, oxygen, and small amounts of other gases)
- The thickness and structure of the atmosphere\n- The nature of sunlight (it contains ultraviolet light)

**Problem-solving process**
1. **Components of air and their roles**:
- **Nitrogen (N₂)**: makes up about 78% of the atmosphere and does not play a large role in scattering ultraviolet light.
- **Oxygen (O₂)**: makes up about 21% of the atmosphere and plays an important role in scattering ultraviolet light.
- **Small amounts of other gases**: argon, carbon dioxide, and so on; they contribute very little to ultraviolet scattering.

2. **Thickness and structure of the atmosphere**:
- The atmosphere consists of several layers, each with its own characteristics.
- **Troposphere**: extends from the surface up to about 10-15 km and plays an important role in weather phenomena.
- **Stratosphere**: extends up to about 50 km above the troposphere; the ozone layer sits here and absorbs ultraviolet light.
- **Mesosphere**: extends up to about 80 km above the stratosphere; temperatures are high but it absorbs little ultraviolet light.
- **Thermosphere**: extends up to about 500-1000 km above the mesosphere; temperatures are very high but it absorbs very little ultraviolet light.

3. **Nature of sunlight**:
- Sunlight is made up of ultraviolet, visible, and infrared light.
- Ultraviolet light has short wavelengths and high energy, and mainly contributes to heating objects.
- Visible light has longer wavelengths, carries color, and plays an important role in scattering ultraviolet light.

4. **Ultraviolet scattering**:
- Ultraviolet light is scattered by small particles in the atmosphere.
- This scattering makes the sky look blue when light strikes at certain angles.
- The scattering depends on the thickness and composition of the atmosphere and on the angle of the sunlight.

5. **Conclusion**:
- The sky is blue because the ultraviolet part of sunlight is observed as blue after being scattered by the atmosphere.
- Ultraviolet scattering depends on the thickness and composition of the atmosphere and on the angle of the sunlight.

**Final answer**:
The sky is blue because the ultraviolet light in sunlight is observed as blue after being scattered by the atmosphere. The composition and thickness of the atmosphere and the angle of the sunlight play important roles in this scattering, and that is why we see the sky as blue.

```