masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_4_global_step_4 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_1024_Llama-3.2-1B_epoch_2_global_step_8 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_512_Llama-3.2-1B_epoch_3_global_step_6 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_5_global_step_5 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_4096_Llama-3.2-1B_epoch_1_global_step_16 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_1024_Llama-3.2-1B_epoch_1_global_step_4 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_2048_Llama-3.2-1B_epoch_1_global_step_8 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_512_Llama-3.2-1B_epoch_1_global_step_2 Text Generation • 1B • Updated May 13
masani/SFT_gsm8k_train_size_256_Llama-3.2-1B_epoch_1_global_step_1 Text Generation • 1B • Updated May 13
masani/SFT_cumulative_parity_length_16_bitwidth_1_1024_512_Llama-3.2-1B_epoch_3_global_step_12 Text Generation • 1B • Updated May 10 • 2
masani/SFT_cumulative_parity_length_32_bitwidth_1_1024_512_Qwen2-1.5B_epoch_100_global_step_400 Text Generation • 2B • Updated May 2
masani/SFT_cumulative_parity_length_32_bitwidth_1_4096_512_Qwen2-1.5B_epoch_100_global_step_1600 Text Generation • 2B • Updated May 2
masani/SFT_cumulative_parity_length_32_bitwidth_1_2048_512_Qwen2-1.5B_epoch_100_global_step_800 Text Generation • 2B • Updated May 2