wza committed · Commit 6e99d89 · 1 Parent(s): 80254d4

Create README.md

Files changed (1): README.md ADDED (+85 -0)
# Description

Trained on top of llava-13b-v1: https://huggingface.co/wza/llava-13b-v1 (GitHub: https://github.com/haotian-liu/LLaVA).
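
The training commands below assume a local checkout of the LLaVA repository and the base checkpoint on disk. A minimal setup sketch; the install steps and target directory names follow the upstream README and are assumptions that may differ by LLaVA version:

```
# Hedged setup sketch: fetch the LLaVA code referenced by the training commands
# and the base checkpoint; directory names are assumptions, adjust to your environment.
git clone https://github.com/haotian-liu/LLaVA
pip install -e LLaVA
git lfs install
git clone https://huggingface.co/wza/llava-13b-v1 llava-13b-v1
```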

# Dataset

A dataset constructed from stock K-line (candlestick) charts, covering both the pre-training and instruction-tuning stages.
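
Both stages use LLaVA's JSON conversation format, which is what `--data_path` and `--image_folder` in the scripts below expect. A minimal sketch of a single pre-train record; the field names follow the upstream LLaVA data format, and the K-line example values are hypothetical:

```
# Hedged sketch of the LLaVA-style data.json layout; the example record is made up.
mkdir -p JsonFormatDataset/PretrainData/images
cat > JsonFormatDataset/PretrainData/data.json <<'EOF'
[
  {
    "id": "kline_000001",
    "image": "kline_000001.png",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe the price action in this K-line chart."},
      {"from": "gpt", "value": "The chart shows a steady uptrend followed by a short consolidation."}
    ]
  }
]
EOF
```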

# Training scripts

Pre-train:
```
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    LLaVA/llava/train/train_mem.py \
    --model_name_or_path llava-13b-v1 \
    --data_path JsonFormatDataset/PretrainData/data.json \
    --image_folder JsonFormatDataset/PretrainData/images \
    --vision_tower openai/clip-vit-large-patch14 \
    --tune_mm_mlp_adapter True \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end \
    --bf16 True \
    --output_dir ./checkpoints/llava-13b-pretrain \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2400 \
    --save_total_limit 1 \
    --learning_rate 2e-3 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
```
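
With these settings the effective pre-train batch size is nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps = 8 × 8 × 2 = 128. A hedged sketch for keeping that effective batch size on a different GPU count; the 4-GPU figure is just an example:

```
# Hedged sketch: recompute --gradient_accumulation_steps so that
# NUM_GPUS * PER_DEVICE_BS * ACC_STEPS stays at the 128 used above.
NUM_GPUS=4            # example value; set to your actual GPU count
PER_DEVICE_BS=8
ACC_STEPS=$(( 128 / (NUM_GPUS * PER_DEVICE_BS) ))
echo "pass --nproc_per_node=$NUM_GPUS and --gradient_accumulation_steps $ACC_STEPS"
```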

Instruction-tune:
```
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
    LLaVA/llava/train/train_mem.py \
    --model_name_or_path ./checkpoints/llava-13b-pretrain \
    --data_path JsonFormatDataset/InstructionTuneData/data.json \
    --image_folder JsonFormatDataset/InstructionTuneData/images/ \
    --vision_tower openai/clip-vit-large-patch14 \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end \
    --bf16 True \
    --output_dir ./checkpoints/llava-13b-instruction \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 5000 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --report_to wandb
```
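
After instruction tuning, a quick way to sanity-check the resulting checkpoint is the upstream LLaVA CLI. The entrypoint and flag names below follow the upstream LLaVA README and are assumptions that may differ across LLaVA versions; the chart path is a placeholder:

```
# Hedged smoke test using the upstream LLaVA CLI; entrypoint and flags are assumptions
# that depend on the LLaVA version, and the image path is a placeholder.
python -m llava.serve.cli \
    --model-path ./checkpoints/llava-13b-instruction \
    --image-file ./example_kline_chart.png
```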

# Training settings

Hardware: 8x A100-80GB (SXM4)

Pre-train: https://wandb.ai/wzaa/huggingface/runs/cd5ou876/overview?workspace=user-wangziao1993

Fine-tune: https://wandb.ai/wzaa/huggingface/runs/y5bsz8dw/overview?workspace=user-wangziao1993