Built with Axolotl (version 0.12.0.dev0).

Axolotl config used for this run:

adapter: lora
base_model: Qwen/Qwen3-0.6B-Base
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - test-tour-07-24-06_train_data.json
  ds_type: json
  format: custom
  path: /workspace/axolotl/data
  type:
    field_input: input
    field_instruction: instruct
    field_output: output
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
device_map: auto
do_eval: true
early_stopping_patience: 4
eval_batch_size: 1
eval_max_new_tokens: 128
eval_steps: 5
eval_table_size: null
evals_per_epoch: null
flash_attention: true
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: false
group_by_length: true
learning_rate: 1.0e-05
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 50
lora_alpha: 256
lora_dropout: 0.1
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr_rate: 0.01
max_grad_norm: 1.0
max_steps: 100
micro_batch_size: 1
mlflow_experiment_name: /workspace/axolotl/data/test-tour-07-24-06_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 10
optim_args:
  adam_beta1: 0.9
  adam_beta2: 0.99
  adam_epsilon: 1.0e-08
optimizer: adamw_bnb_8bit
output_dir: /workspace/axolotl/outputs/test-tour-07-24-06/test-inst-07-24-06
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 5
saves_per_epoch: null
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.0001
warmup_steps: 50
weight_decay: 0.0
xformers_attention: null
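
For illustration, here is a minimal sketch (not Axolotl's actual implementation) of how the datasets.type block above maps one JSON record onto a prompt. Only the field names (instruct, input, output) and the format strings come from the config; the example record contents are hypothetical.

```python
# Minimal sketch of the prompt mapping implied by the `datasets.type` block.
# Not Axolotl's code; field names come from the config above.

def build_prompt(record: dict) -> tuple[str, str]:
    """Return (prompt, completion) for one training record."""
    instruction = record["instruct"]          # field_instruction: instruct
    user_input = record.get("input", "")      # field_input: input
    if user_input:
        prompt = f"{instruction} {user_input}"   # format: '{instruction} {input}'
    else:
        prompt = instruction                      # no_input_format: '{instruction}'
    # With train_on_inputs: false, loss is computed only on the output tokens.
    return prompt, record["output"]               # field_output: output

# Hypothetical record, shaped like an entry in test-tour-07-24-06_train_data.json:
example = {
    "instruct": "Summarize the text.",
    "input": "LoRA adds small trainable low-rank matrices to a frozen model.",
    "output": "LoRA trains small low-rank adapters instead of the full model.",
}
print(build_prompt(example))
```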

workspace/axolotl/outputs/test-tour-07-24-06/test-inst-07-24-06

This model is a LoRA adapter fine-tuned from Qwen/Qwen3-0.6B-Base on a custom JSON dataset (test-tour-07-24-06_train_data.json). It achieves the following results on the evaluation set:

  • Loss: 1.9218
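
As a rough aside (not part of the original results): if the reported eval loss is the mean per-token cross-entropy in nats, the usual convention for causal LM evaluation, it corresponds to a perplexity of about 6.8:

```python
# Hedged back-of-the-envelope conversion: perplexity = exp(cross-entropy loss),
# valid only if the eval loss is the mean per-token loss in nats.
import math

eval_loss = 1.9218
print(f"perplexity ≈ {math.exp(eval_loss):.2f}")  # ≈ 6.83
```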

Model description

A LoRA adapter (r=128, alpha=256, dropout 0.1, targeting all linear layers) for Qwen/Qwen3-0.6B-Base, trained with Axolotl using the config above. No further description has been provided.

Intended uses & limitations

More information needed

Training and evaluation data

Per the config above, training used /workspace/axolotl/data/test-tour-07-24-06_train_data.json (fields: instruct, input, output), with a 0.01% split (val_set_size: 0.0001) held out for evaluation. No further details have been provided.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with epsilon=1e-08 and default betas=(0.9, 0.999), overridden via optimizer_args to adam_beta1=0.9, adam_beta2=0.99, adam_epsilon=1e-08
  • lr_scheduler_type: cosine_with_min_lr (min_lr_rate: 0.01; see the schedule sketch after this list)
  • lr_scheduler_warmup_steps: 50
  • training_steps: 100
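
Below is a minimal sketch (assumed from the hyperparameters above, not the trainer's exact code) of the cosine_with_min_lr schedule: linear warmup to the base learning rate over 50 steps, then cosine decay toward min_lr_rate × learning_rate (0.01 × 1e-05 = 1e-07) by step 100.

```python
# Approximate learning-rate schedule implied by the hyperparameters above.
import math

BASE_LR = 1.0e-05
WARMUP_STEPS = 50
TRAINING_STEPS = 100
MIN_LR_RATE = 0.01

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # linear warmup from 0 to BASE_LR
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # cosine decay from BASE_LR down to MIN_LR_RATE * BASE_LR
    progress = (step - WARMUP_STEPS) / max(1, TRAINING_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return BASE_LR * (MIN_LR_RATE + (1.0 - MIN_LR_RATE) * cosine)

for s in (0, 25, 50, 75, 100):
    print(s, f"{lr_at(s):.2e}")
```

Note that with max_steps: 100 and warmup_steps: 50, half of the run is spent in warmup.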

Training results

Training Loss   Epoch    Step   Validation Loss
No log          0        0      2.1013
No log          0.0002   5      2.1040
No log          0.0004   10     2.1005
No log          0.0006   15     2.1008
No log          0.0008   20     2.0962
No log          0.0010   25     2.0814
No log          0.0012   30     2.0584
No log          0.0014   35     2.0321
No log          0.0016   40     2.0064
No log          0.0018   45     1.9833
1.8411          0.0020   50     1.9622
1.8411          0.0022   55     1.9426
1.8411          0.0024   60     1.9384
1.8411          0.0025   65     1.9374
1.8411          0.0027   70     1.9304
1.8411          0.0029   75     1.9265
1.8411          0.0031   80     1.9260
1.8411          0.0033   85     1.9223
1.8411          0.0035   90     1.9228
1.8411          0.0037   95     1.9214
1.7706          0.0039   100    1.9218

Framework versions

  • PEFT 0.16.0
  • Transformers 4.53.2
  • Pytorch 2.6.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.21.2
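
With these versions, an adapter trained from this config can typically be loaded on top of the base model via PEFT. The sketch below is a hedged usage example, not part of the original card; the adapter id trenden/test-inst-07-24-06 is taken from this repository's name and can be replaced with a local path (e.g. the output_dir above).

```python
# Minimal inference sketch (assumed usage, not from the original card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-0.6B-Base"
adapter_id = "trenden/test-inst-07-24-06"  # or a local path to the saved adapter

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # matches bf16: true in the config
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA adapter
model.eval()

prompt = "Summarize: LoRA adds small trainable low-rank matrices to a frozen base model."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```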