Versions of Qwen 0.6B that differ only in the post-training method used. The post-training dataset should be the hh-rlhf dataset.
Models (10)

AIPlans/qwen3-0.6b-base-PPO-PM • 1
AIPlans/qwen3-0.6b-base-hl-RM • Text Classification • 0.6B • 15
AIPlans/dpo_qwen0_6b_fft • 0.6B • 3
AIPlans/qwen3-0.6b-dpo-lora • Text Generation • 0.6B • 13 • 1
AIPlans/qwen3-0.6B-reward-hh-rlhf • Text Generation • 0.6B • 3
AIPlans/qwen3-8b-ipo-hh-rlhf • Text Generation • 34
AIPlans/qwen3-8b-dpo-hh-rlhf
AIPlans/Qwen3-HHH-Cipher-Eng • Text Generation • 0.6B • 397
AIPlans/Qwen-HHH-Cipher-Eng • Text Generation • 0.5B • 2
AIPlans/Qwen-HHH-Sans-Eng • Text Generation • 0.5B

Datasets (15)

AIPlans/trackio-experiments • 6
AIPlans/ultrafeedback_binarized_chinese • 14k • 11
AIPlans/ultrafeedback_binarized • 14k • 4
AIPlans/FilteredPKU-SafeRLHF_chinese • 12k • 15
AIPlans/FilteredPKU-SafeRLHF • 12k • 5
AIPlans/SafetyBench_WithLabels_Better_chinese • 546 • 12
AIPlans/SafetyBench_WithLabels • 546 • 11
AIPlans/ToxiGen_chinese • 1k • 9
AIPlans/ToxiGen • 1k • 14
AIPlans/MoralBenchGenerated • 8 • 6