Shengyi Costa Huang
vwxyzjn
models: 393
vwxyzjn/ppo_async • Updated • 1
vwxyzjn/ppo_sync • Updated
vwxyzjn/online_dpo_sync • Updated
vwxyzjn/online_dpo_async • Updated
vwxyzjn/rm_zephyr_new • Text Classification • 7B • Updated
vwxyzjn/online_dpo_vllm_thread_beta_0.03__allenai_open_instruct_dev • Updated
vwxyzjn/reward_modeling__EleutherAI_pythia-14m • Updated
vwxyzjn/online_dpo_vllm__vwxyzjn_btulu • Updated
vwxyzjn/online_dpo_vllm__allenai_llama-3-tulu-2-8b • Updated • 3
vwxyzjn/btulu • Text Generation • 8B • Updated
datasets: 295
vwxyzjn/the-algorithm-python • Viewer • Updated • 608 • 11
vwxyzjn/rlvr_acecoder • Viewer • Updated • 87.1k • 172
vwxyzjn/rlvr_orz_math_72k_collection_extended • Viewer • Updated • 56.9k • 5
vwxyzjn/rlvr_orz_math_13k_collection_hard • Viewer • Updated • 56.9k • 3
vwxyzjn/rlvr_orz_math_57k_collected • Viewer • Updated • 56.9k • 9
vwxyzjn/acecoder_sft_gpt4o_test_cases_then_impl1 • Viewer • Updated • 79.1k • 16
vwxyzjn/acecoder_sft_gpt4o_test_cases_then_impl_no_system_message • Viewer • Updated • 41.6k • 12 • 1
vwxyzjn/acecoder_sft_gpt4o_test_cases_then_impl • Viewer • Updated • 41.6k • 13
vwxyzjn/the-algorithm-python-debug • Viewer • Updated • 11 • 13
vwxyzjn/multiplication_train_1000_2x2-gsm8k-verifier • Viewer • Updated • 1k • 15