End of training
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 datasets:
 - minpeter/apigen-mt-5k-friendli
 model-index:
-- name: LoRA-Qwen3-4b-v1-iteration-
+- name: LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02
   results: []
 ---
 
@@ -18,13 +18,13 @@ should probably proofread and complete it, then remove this comment. -->
 [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
 <details><summary>See axolotl config</summary>
 
-axolotl version: `0.
+axolotl version: `0.10.0.dev0`
 ```yaml
 base_model: Qwen/Qwen3-4B
-hub_model_id: minpeter/LoRA-Qwen3-4b-v1-iteration-
+hub_model_id: minpeter/LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02
 
-
-
+plugins:
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 strict: false
 
 datasets:
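
Note on the new `plugins` block: `CutCrossEntropyPlugin` enables Apple's Cut Cross-Entropy (CCE), which computes the language-model loss without ever materializing the full `[sequence_length, vocab_size]` logits tensor, a large memory saving for big-vocabulary models. The sketch below illustrates the idea with naive position chunking in plain PyTorch; it is not the plugin's API, which uses a fused kernel rather than a Python loop.

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden, lm_head_weight, labels, chunk_size=1024):
    # Illustrative sketch only: compute the loss a chunk of positions at a
    # time, so the full [seq_len, vocab_size] logits tensor never exists at
    # once. The real CutCrossEntropyPlugin does this with a fused kernel.
    total = hidden.new_zeros(())
    for start in range(0, hidden.size(0), chunk_size):
        logits = hidden[start:start + chunk_size] @ lm_head_weight.T  # [chunk, vocab]
        total = total + F.cross_entropy(
            logits, labels[start:start + chunk_size], reduction="sum"
        )
    return total / labels.numel()
```
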
@@ -38,36 +38,32 @@ datasets:
     message_property_mappings:
       role: role
       content: content
-    shards: 3
 chat_template: chatml
 
 dataset_prepared_path: last_run_prepared
 
 output_dir: ./output
+val_set_size: 0.0
 
-
-lora_model_dir:
-
-sequence_len: 8192
-pad_to_sequence_len: true
+sequence_len: 20000
 sample_packing: true
-
-val_set_size: 0.05
 eval_sample_packing: true
-
+pad_to_sequence_len: true
 
-
-
-
-
+load_in_4bit: true
+adapter: qlora
+lora_r: 16
+lora_alpha: 32
 lora_target_modules:
-  - gate_proj
-  - down_proj
-  - up_proj
   - q_proj
-  - v_proj
   - k_proj
+  - v_proj
   - o_proj
+  - down_proj
+  - up_proj
+lora_mlp_kernel: true
+lora_qkv_kernel: true
+lora_o_kernel: true
 
 wandb_project: "axolotl"
 wandb_entity: "kasfiekfs-e"
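
Editor's note on the adapter change: the new config loads the base model in 4-bit and attaches a QLoRA adapter with rank 16 and alpha 32, and `gate_proj` is dropped from the target modules. For orientation, a rough peft equivalent of the adapter block is sketched below; this is an illustration only, since axolotl builds the adapter internally from the YAML, and the `lora_mlp_kernel`/`lora_qkv_kernel`/`lora_o_kernel` flags are axolotl-specific optimizations with no direct peft counterpart.

```python
from peft import LoraConfig

# Rough peft equivalent of the adapter settings above (sketch only;
# axolotl constructs this configuration itself from the YAML).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)
```
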
@@ -76,45 +72,35 @@ wandb_name:
 wandb_log_model:
 
 gradient_accumulation_steps: 2
-micro_batch_size:
-num_epochs:
-optimizer:
+micro_batch_size: 1
+num_epochs: 1
+optimizer: adamw_torch_4bit
 lr_scheduler: cosine
 learning_rate: 0.0002
 
-train_on_inputs: false
-group_by_length: false
 bf16: auto
 tf32: true
 
-gradient_checkpointing:
-
+gradient_checkpointing: offload
+gradient_checkpointing_kwargs:
+  use_reentrant: false
 resume_from_checkpoint:
-local_rank:
 logging_steps: 1
-xformers_attention:
 flash_attention: true
 
-loss_watchdog_threshold: 5.0
-loss_watchdog_patience: 3
-
 warmup_steps: 10
+evals_per_epoch: 4
 saves_per_epoch: 1
-debug:
-deepspeed:
 weight_decay: 0.0
-
-fsdp_config:
+special_tokens:
 
 ```
 
 </details><br>
 
-# LoRA-Qwen3-4b-v1-iteration-
+# LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02
 
 This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on the minpeter/apigen-mt-5k-friendli dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.2285
 
 ## Model description
 
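
Since the card's usage sections are still empty, here is a hedged sketch of what inference with this adapter might look like. It assumes the repo at the `hub_model_id` above contains peft adapter weights and that the tokenizer's chat template matches the ChatML format used for training; neither is confirmed by the diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 4B base model, then attach the published LoRA adapter
# (assumes peft-format adapter files exist in the adapter repo).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
model = PeftModel.from_pretrained(base, "minpeter/LoRA-Qwen3-4b-v1-iteration-02-sf-apigen-02")

messages = [{"role": "user", "content": "List the tools you can call."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
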
@@ -134,27 +120,18 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size:
-- eval_batch_size:
+- train_batch_size: 1
+- eval_batch_size: 1
 - seed: 42
 - gradient_accumulation_steps: 2
-- total_train_batch_size:
-- optimizer: Use OptimizerNames.
+- total_train_batch_size: 2
+- optimizer: Use OptimizerNames.ADAMW_TORCH_4BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 1.0
 
 ### Training results
 
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.4432        | 0.0069 | 1    | 1.0528          |
-| 0.3253        | 0.3322 | 48   | 0.2922          |
-| 0.4198        | 0.6644 | 96   | 0.2638          |
-| 0.4426        | 0.9965 | 144  | 0.2449          |
-| 0.2287        | 1.3253 | 192  | 0.2340          |
-| 0.1526        | 1.6574 | 240  | 0.2299          |
-| 0.268         | 1.9896 | 288  | 0.2285          |
 
 
 ### Framework versions
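
A note on the arithmetic: `total_train_batch_size` equals `micro_batch_size` (1) × `gradient_accumulation_steps` (2) × the number of devices, so the reported value of 2 suggests a single-GPU run. The deleted training-results table is consistent with the new config's `val_set_size: 0.0`: with no validation split, there is no evaluation loss to report.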