---
library_name: transformers
datasets:
- exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean
language:
- ko
base_model:
- naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
---

# HyperCLOVAX-1.5B-Reasoning-RFT

- Base Model: [naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B](https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B)
- Base Dataset: [exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean](https://huggingface.co/datasets/exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean)

- Reasoning Format: **no separate reasoning format**; the final answer is written inside \boxed{}

- Example

```
from vllm import LLM, SamplingParams
model_name = "werty1248/HyperCLOVAX-1.5B-Reasoning-RFT"

sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=8192)
llm = LLM(model=model_name, dtype='auto', max_model_len=16384, tensor_parallel_size=1)

question = """Janet's ducks lay 16 eggs per day.
She eats three for breakfast every morning and uses four when she bakes muffins for her friends.
She sells the remaining eggs at the farmers' market every day for $2 per fresh duck egg.
How much does she make at the farmers' market every day?"""

conversation = {"role": "user", "content": question}

outputs = llm.chat([conversation], sampling_params)

print(outputs[0].outputs[0].text)

### Example output
'''
### Problem definition and conditions

- Janet's ducks lay 16 eggs per day.
- She eats 3 eggs for breakfast every morning.
- She uses 4 eggs when baking muffins.
- The remaining eggs are sold at the farmers' market for 2 dollars each.
- We want the amount she earns at the farmers' market each day.

### Logic and formula development

1. **Eggs laid per day**: 16
2. **Eggs eaten for breakfast**: 3
3. **Eggs used for muffins**: 4

First, compute the number of remaining eggs.

\[ \text{remaining eggs} = \text{eggs laid per day} - \text{eggs eaten for breakfast} - \text{eggs used for muffins} \]
\[ \text{remaining eggs} = 16 - 3 - 4 = 9 \]

4. **Price per remaining egg**: 2 dollars

Now compute the revenue from selling the remaining 9 eggs.

\[ \text{earnings at the farmers' market} = \text{number of remaining eggs} \times \text{price per egg} \]
\[ \text{earnings at the farmers' market} = 9 \times 2 = 18 \]

### Final answer

\[ \boxed{18} \]

Janet earns 18 dollars at the farmers' market every day. This is the revenue from selling the remaining 9 eggs at 2 dollars each. So the final amount Janet earns at the farmers' market each day is 18 dollars.
'''

```
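
The vLLM snippet above is the card's own usage example. If only plain Hugging Face transformers is available, a minimal sketch along the following lines should behave similarly (this snippet is not from the original card; the toy question is arbitrary and `trust_remote_code=True` is an assumption that may be unnecessary):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "werty1248/HyperCLOVAX-1.5B-Reasoning-RFT"

# trust_remote_code is an assumption; drop it if the checkpoint loads as a stock architecture
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "A duck lays 16 eggs a day and 7 are used. How many are left?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# same sampling settings as the vLLM example above
output_ids = model.generate(
    input_ids, max_new_tokens=8192, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```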

### Rejection sampling Fine-Tuning (RFT) with least similar samples

- Goal: have the model learn as **diverse a set of solution methods** as possible
1. From the question set of exp-models/Open-Reasoner-Zero-orz-math-57k-collected-Korean, exclude MCQA and proof-style problems (54,832 of 56,878 questions kept)
2. After adding a CoT prompt to HyperCLOVAX-1.5B (prepended to the user input), keep only the questions it solves at **Pass@8** (10,433 of 54,832)
3. Run **32 additional inferences** with HyperCLOVAX-1.5B (+CoT) and keep only the responses that reach the correct answer, plus **32 additional inferences** with [Kanana-nano-2.1b](https://huggingface.co/kakaocorp/kanana-nano-2.1b-instruct) (+CoT), again keeping only the correct responses
   - HyperCLOVAX-1.5B CoT Prompt: ```"Restate what value the user is ultimately asking for, and rewrite the conditions given in the problem in a clean, organized form. Then solve the problem by laying out the logic and formulas in detail, and write the final answer inside \\boxed{}.\n\n"```
   - Kanana-2.1b CoT Prompt (same as above, with "Write only the content, without headings." appended at the end): ```"Restate what value the user is ultimately asking for, and rewrite the conditions given in the problem in a clean, organized form. Then solve the problem by laying out the logic and formulas in detail, and write the final answer inside \\boxed{}. Write only the content, without headings.\n\n"```

#### Distribution of the number of correct responses per question

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/SyYp73P_mw8Wi3rx013qb.png)

4. For questions with more than 4 correct responses, embed the responses with [nlpai-lab/KURE-v1](https://huggingface.co/nlpai-lab/KURE-v1) and keep only the **4 responses with the lowest mutual cosine similarity** (a sketch of this selection step follows)
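
The card does not spell this selection step out in code; the following is a minimal sketch of one way it could be done, assuming nlpai-lab/KURE-v1 loads through `sentence-transformers` (the exhaustive subset search is an illustrative choice, not necessarily the author's implementation):

```
from itertools import combinations

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("nlpai-lab/KURE-v1")  # assumed to be loadable via sentence-transformers

def least_similar_answers(answers, k=4):
    """Pick the k answers whose pairwise cosine similarities sum to the minimum."""
    if len(answers) <= k:
        return answers
    # Unit-normalized embeddings, so dot products are cosine similarities.
    emb = encoder.encode(answers, normalize_embeddings=True)
    sim = emb @ emb.T
    best_subset, best_score = None, float("inf")
    # At most 64 correct responses per question (32 + 32), so brute force stays cheap.
    for subset in combinations(range(len(answers)), k):
        score = sum(sim[i][j] for i, j in combinations(subset, 2))
        if score < best_score:
            best_subset, best_score = subset, score
    return [answers[i] for i in best_subset]
```

With at most 64 candidates this enumerates fewer than ~640k subsets; a greedy farthest-point (max-min) selection would be a cheaper alternative if the candidate pool were larger.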

### Final dataset

- 38,037 responses to 10,433 questions (the layout of one record is sketched below the config)
- Trained with Axolotl (2 hours on 2x A40 = 4 GPU hours)
- Despite the added CoT prompt, responses are short on average (fewer than 300 tokens)

<details><summary>axolotl config</summary>

```
base_model: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
model_type: AutoModelForCausalLM
tokenizer_config: naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/Open-Reasoner-Zero-RFT-full
    field_messages: conversations
    type: chat_template
    chat_template: chatml

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 8
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 3.0e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
eval_table_size:

deepspeed: ./deepspeed_configs/zero3_bf16.json
```
</details>
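
For orientation, one training record in the `conversations` layout that the config above expects would look roughly like this (a hypothetical example, not an actual row from werty1248/Open-Reasoner-Zero-RFT-full; the `role`/`content` keys follow the common convention for axolotl's chat_template loader and may differ in the real dataset):

```
# Hypothetical record shape for a chat_template/chatml dataset with field_messages: conversations.
# With train_on_inputs: false, only the assistant turn contributes to the loss.
record = {
    "conversations": [
        {
            "role": "user",
            "content": "<CoT prompt>\n\n<math question>",
        },
        {
            "role": "assistant",
            "content": "<accepted solution ending in \\boxed{answer}>",
        },
    ]
}
```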

# Evaluation

- 4 of the HRM8K subsets
- temperature=0.7, top_p=0.95, max_tokens=8192
- If the model fails to produce \boxed{} within max_tokens, the string ```"\n\nFinal answer: \\boxed{"``` is appended to the end of the output and inference is run once more (a sketch of this retry follows the results table)

| Name | GSM8K | MATH | KSM | OMNI_MATH |
| --- | --- | --- | --- | --- |
| werty1248/HyperCLOVAX-1.5B-Reasoning-RFT | 56.56 | 33.80 | 4.62 | 12.26 |
| werty1248/HyperCLOVAX-1.5B-Reasoning-RFT (CoT) | 54.74 | 33.00 | 3.85 | 12.52 |
| naver-hyperclovax/HyperCLOVAX-SEED-Text-Instruct-1.5B (CoT) | 46.63 | 29.91 | 5.18 | 11.47 |
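
The \boxed{} retry described above is simple but easy to get wrong; the sketch below shows one way to wire it up, assuming a `generate(prompt_text) -> str` helper that wraps the chat-templated prompt (the helper and the exact wording of the retry suffix are illustrative, not the card's own code):

```
import re

RETRY_SUFFIX = "\n\nFinal answer: \\boxed{"  # retry suffix described in the Evaluation bullets (wording illustrative)

def extract_boxed(text):
    """Return the content of the last \\boxed{...} in `text`, or None if there is none."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def answer_with_retry(generate, prompt_text):
    """`generate` is assumed to map a raw prompt string to the model's text continuation."""
    first = generate(prompt_text)
    answer = extract_boxed(first)
    if answer is not None:
        return answer
    # No \boxed{} within max_tokens: append the truncated output plus the suffix and continue.
    second = generate(prompt_text + first + RETRY_SUFFIX)
    return second.split("}", 1)[0].strip() if "}" in second else second.strip()
```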

# Notes

### Training data generation

#### Pass@k cost
- Generating the training data (at least 40 hours on a single 3090) cost far more than the training itself (2 hours on 2x A40); the original goal was Pass@16 filtering, but it was abandoned

#### Least Similarity Sampling
- When a response **reaches the correct answer through a flawed solution**, it has low similarity to the other, correct solutions and is therefore almost always selected; no separate filtering was applied for this case

### Output
- The **average response length is about 753 tokens** (measured on GSM8K), shorter than typical reasoning models (reasoning-specific training has not been done yet)
- Attempts to teach the model a **reasoning format** such as ```</think>``` (describing it in the system prompt, force-inserting a </think> token after the answer and then generating a new completion, and so on) **failed**

- Even for general questions, the model tends to think in a CoT-like way without any special instruction.

Q: Why is the sky blue?

A:
```

**Problem definition and conditions**

**Problem**: Why is the sky blue?

**Definitions**:
- **Sky**: the air-filled space that extends from the ground up over our heads.
- **Blue**: the phenomenon in which the air in the sky scatters ultraviolet light and therefore appears blue.

**Conditions**:
- The composition of air (mainly nitrogen, oxygen, and small amounts of other gases)
- The thickness and structure of the atmosphere\n- The nature of sunlight (it contains ultraviolet light)

**Problem-solving process**
1. **Components of air and their roles**:
- **Nitrogen (N₂)**: makes up about 78% of the atmosphere and does not play a large role in scattering ultraviolet light.
- **Oxygen (O₂)**: makes up about 21% of the atmosphere and plays an important role in scattering ultraviolet light.
- **Small amounts of other gases**: argon, carbon dioxide, and so on; they contribute very little to ultraviolet scattering.

2. **Thickness and structure of the atmosphere**:
- The atmosphere consists of several layers, each with its own characteristics.
- **Troposphere**: extends from the surface up to about 10-15 km and plays an important role in weather phenomena.
- **Stratosphere**: extends up to about 50 km above the troposphere; the ozone layer sits here and absorbs ultraviolet light.
- **Mesosphere**: extends up to about 80 km above the stratosphere; temperatures are high but it absorbs little ultraviolet light.
- **Thermosphere**: extends up to about 500-1000 km above the mesosphere; temperatures are very high but it absorbs very little ultraviolet light.

3. **Nature of sunlight**:
- Sunlight is made up of ultraviolet, visible, and infrared light.
- Ultraviolet light has short wavelengths and high energy, and mainly contributes to heating objects.
- Visible light has longer wavelengths, carries color, and plays an important role in scattering ultraviolet light.

4. **Ultraviolet scattering**:
- Ultraviolet light is scattered by small particles in the atmosphere.
- This scattering makes the sky look blue when light strikes at certain angles.
- The scattering depends on the thickness and composition of the atmosphere and on the angle of the sunlight.

5. **Conclusion**:
- The sky is blue because the ultraviolet part of sunlight is observed as blue after being scattered by the atmosphere.
- Ultraviolet scattering depends on the thickness and composition of the atmosphere and on the angle of the sunlight.

**Final answer**:
The sky is blue because the ultraviolet light in sunlight is observed as blue after being scattered by the atmosphere. The composition and thickness of the atmosphere and the angle of the sunlight play important roles in this scattering, and that is why we see the sky as blue.

```