dongwookkwon committed · verified
Commit f73b33b · Parent: e64a4d3

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +24 -16
  2. config.json +16 -1
  3. model.safetensors +2 -2
README.md CHANGED
@@ -15,22 +15,22 @@ license: apache-2.0
  # qwen0.5b-tech-interview-test
  
  This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on mathematical reasoning tasks.
- It has been trained using [TRL](https://github.com/huggingface/trl) with LoRA.
+ It has been trained using [TRL](https://github.com/huggingface/trl) with QLoRA (Quantized LoRA).
  
  ## Model Details
  
  - **Base Model**: [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)
- - **Fine-tuning Method**: LoRA (Low-Rank Adaptation) followed by weight merging
+ - **Fine-tuning Method**: QLoRA (Quantized LoRA) followed by weight merging
  - **Task**: Mathematical reasoning (GSM8K benchmark)
  - **Training Framework**: TRL (Transformer Reinforcement Learning)
  
  ## Training Data
  
  The model was fine-tuned on a mixture of datasets:
- - **GSM8K** (40%): 400 samples from the GSM8K training set
- - **NuminaMath-CoT** (60%): 600 samples from the NuminaMath-CoT dataset
+ - **GSM8K** (40%): 2,000 samples from the GSM8K training set
+ - **NuminaMath-CoT** (60%): 3,000 samples from the NuminaMath-CoT dataset
  
- Total training samples: 1,000
+ Total training samples: 5,000
  
  ## Evaluation Results
  
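The 40/60 GSM8K / NuminaMath-CoT mixture added in this hunk can be assembled roughly as below. This is a sketch: the Hub dataset ids, column names, prompt template, and shuffle seed are assumptions, since the data-preparation script is not part of this commit.

```python
# Rough sketch of the 5,000-sample mixture (2,000 GSM8K + 3,000 NuminaMath-CoT).
# Dataset ids, column names, and formatting are assumed, not taken from the commit.
from datasets import load_dataset, concatenate_datasets

gsm8k = load_dataset("openai/gsm8k", "main", split="train").shuffle(seed=42).select(range(2000))
numina = load_dataset("AI-MO/NuminaMath-CoT", split="train").shuffle(seed=42).select(range(3000))

def as_text(question: str, answer: str) -> dict:
    # Single-string SFT format; the real template used for training may differ.
    return {"text": f"Question: {question}\nAnswer: {answer}"}

gsm8k = gsm8k.map(lambda ex: as_text(ex["question"], ex["answer"]),
                  remove_columns=gsm8k.column_names)
numina = numina.map(lambda ex: as_text(ex["problem"], ex["solution"]),
                    remove_columns=numina.column_names)

train_dataset = concatenate_datasets([gsm8k, numina]).shuffle(seed=42)
print(len(train_dataset))  # 5000 rows, one "text" column
```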
@@ -38,39 +38,45 @@ Total training samples: 1,000
  
  | Metric | Method | Few-shot | Score | Std Error |
  |--------|--------|----------|-------|-----------|
- | exact_match | flexible-extract | 5 | **35.33%** | ±0.0132 |
- | exact_match | strict-match | 5 | **34.42%** | ±0.0131 |
+ | exact_match | flexible-extract | 5 | **33.89%** | ±0.0130 |
+ | exact_match | strict-match | 5 | **33.06%** | ±0.0130 |
  
- - **Baseline** (Qwen2.5-0.5B-Instruct): 34.42% (strict-match)
- - **Improvement**: Comparable performance achieved through fine-tuning
+ - **Baseline** (Qwen2.5-0.5B-Instruct): ~34.42% (strict-match)
+ - **Note**: Results achieved through fine-tuning on a curated dataset mixture
  
  ### Evaluation Details
  - Evaluation tool: [EleutherAI's lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
  - Model inference: vLLM
  - Test samples: 1,319
  - Generation settings: `temperature=0.0`, `do_sample=False`
+ - Evaluation method: Few-shot with 5 examples
  
  ## Training Procedure
  
  ### Training Hyperparameters
  
  - **Learning Rate**: 2e-5
- - **Training Steps**: 1,172 steps
- - **Training Epochs**: ~1 epoch
+ - **Training Steps**: Variable (with early stopping)
+ - **Training Epochs**: Up to 2 epochs (early stopping enabled)
  - **Batch Size**: 1 (per device)
  - **Gradient Accumulation Steps**: 8
- - **LoRA Rank (r)**: 8
- - **LoRA Alpha**: 16
- - **Target Modules**: q_proj, k_proj, v_proj, o_proj
- - **Max Sequence Length**: 32768
+ - **LoRA Rank (r)**: 32
+ - **LoRA Alpha**: 64
+ - **LoRA Dropout**: 0.1
+ - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+ - **Max Sequence Length**: 2048
+ - **Quantization**: 8-bit (QLoRA)
  
  ### Training Process
  
  The model was trained using:
+ - **Training Framework**: TRL SFTTrainer with QLoRA
  - **Evaluation Strategy**: Steps (every 250 steps)
  - **Early Stopping**: Enabled with patience=3
  - **Best Model Selection**: Based on eval_loss
- - **Final Model**: Checkpoint at step 1,172
+ - **Optimizer**: paged_adamw_8bit
+ - **Learning Rate Schedule**: Cosine
+ - **Warmup Ratio**: 0.15
  
  ## Model Usage
  
@@ -144,6 +150,8 @@ outputs = model.generate([prompt], sampling_params)
  - Transformers: 4.57.1
  - PyTorch: 2.8.0
  - Datasets: 4.3.0
+ - PEFT: (for LoRA/QLoRA support)
+ - BitsAndBytes: (for 8-bit quantization)
  
  ## Citation
  
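The GSM8K scores in the table above come from EleutherAI's lm-evaluation-harness running the model through vLLM. A roughly equivalent invocation is sketched below; the repo id is inferred from the commit author and model name, and the exact harness version and flags behind the reported numbers are not recorded in this commit.

```python
# Sketch of re-running the 5-shot GSM8K evaluation with lm-evaluation-harness.
# CLI form (assumed flags):
#   lm_eval --model vllm \
#     --model_args pretrained=dongwookkwon/qwen0.5b-tech-interview-test,dtype=auto \
#     --tasks gsm8k --num_fewshot 5 --gen_kwargs temperature=0.0
# The repo id is an assumption based on the commit author and model name.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # if vLLM rejects the pre-quantized 8-bit weights, model="hf" is a fallback
    model_args="pretrained=dongwookkwon/qwen0.5b-tech-interview-test,dtype=auto",
    tasks=["gsm8k"],
    num_fewshot=5,
    gen_kwargs="temperature=0.0",  # greedy decoding, matching the card's settings
)
print(results["results"]["gsm8k"])  # exact_match under strict-match / flexible-extract
```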
config.json CHANGED
@@ -3,7 +3,7 @@
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
- "dtype": "float16",
+ "dtype": "float32",
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 896,
@@ -43,6 +43,21 @@
  "num_key_value_heads": 2,
  "pad_token_id": 151643,
  "pretraining_tp": 1,
+ "quantization_config": {
+   "_load_in_4bit": false,
+   "_load_in_8bit": true,
+   "bnb_4bit_compute_dtype": "float32",
+   "bnb_4bit_quant_storage": "uint8",
+   "bnb_4bit_quant_type": "fp4",
+   "bnb_4bit_use_double_quant": false,
+   "llm_int8_enable_fp32_cpu_offload": false,
+   "llm_int8_has_fp16_weight": false,
+   "llm_int8_skip_modules": null,
+   "llm_int8_threshold": 6.0,
+   "load_in_4bit": false,
+   "load_in_8bit": true,
+   "quant_method": "bitsandbytes"
+ },
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000.0,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:92c864a88a899006eb5623a5a5aac6073cf47bd3491273dd27e14dc947e319f3
- size 988097536
+ oid sha256:e76871c347d60b25939a8ec64d991016205fbcd7b0fea2c8b29236a19316c658
+ size 903935912