RayTsai/Kaggle_3_GRPO_Neutrality

Generated from Trainer


RayTsai committed on Jun 22

Commit f2d083a · verified · 1 Parent(s): 062e131

Update README.md

Files changed (1)

README.md +16 -3


@@ -1,3 +1,16 @@
+---
+language: zh
+license: apache-2.0
+base_model: Qwen/Qwen2.5-7B-Instruct
+tags:
+- generated_from_trainer
+- lora
+- peft
+library_name: peft
+---
+# Chinese LLM MCQ Model with Neutrality Optimization - KAGGLE #3
 This is the model for KAGGLE #3 of the NYCU Deep Learning course. It was trained from Qwen2.5-7B-Instruct with GRPO (Group Relative Policy Optimization) reinforcement learning, with a focus on improving the neutrality and reasoning quality of the model's answers.
 ## Model Information
@@ -39,7 +52,7 @@ model = PeftModel.from_pretrained(
 )
 # Load the tokenizer
-tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
+tokenizer = AutoTokenizer.from_pretrained("RayTsai/Kaggle_3_GRPO_Neutrality")
 # Use a neutrality prompt (the prompt text asks the model to analyze the question from multiple perspectives)
 prompt = """請從多元視角分析以下問題：
@@ -141,6 +154,7 @@ final_answer = extract_answer_from_reasoning(reasoning)
 * Ray Tsai (110651053)
 * NYCU Deep Learning Course 2025
 ## License
 This model follows the original license terms of Qwen2.5.
@@ -150,5 +164,4 @@ final_answer = extract_answer_from_reasoning(reasoning)
 * [KAGGLE #1 - SFT model](https://huggingface.co/RayTsai/chinese-llm-mcq-qwen2-5-14b)
 * [KAGGLE #2 - Reasoning-chain model](https://huggingface.co/RayTsai/Kaggle_2)
 * [Technical report](https://github.com/RayTsai/chinese-llm-neutrality)
-* [NYCU Deep Learning course](https://www.nycu.edu.tw)`
-}
+* [NYCU Deep Learning course](https://www.nycu.edu.tw)
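
The second hunk switches the tokenizer source from the base model to this repository. As a rough illustration of how the pieces named in the diff could fit together, here is a minimal sketch. The helper `load_model_and_tokenizer` and the two constants are hypothetical names that do not appear in the README, and the sketch assumes the adapter repository ships tokenizer files and that `transformers` and `peft` are installed; the README's actual loading arguments are elided in the diff and are not reproduced here.

```python
# Repository IDs taken from the diff: `base_model` in the front matter and
# the tokenizer source introduced by this commit.
BASE_MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER_ID = "RayTsai/Kaggle_3_GRPO_Neutrality"


def load_model_and_tokenizer(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model, attach the LoRA adapter, and load the tokenizer.

    Hypothetical helper, not part of the README. Imports are lazy so that
    defining this function does not require transformers or peft.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the base weights; torch_dtype="auto" keeps the checkpoint's dtype.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the LoRA adapter weights from the adapter repo.
    model = PeftModel.from_pretrained(base, adapter_id)
    # Assumption: the adapter repo contains tokenizer files, as the commit implies.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    return model, tokenizer
```

Calling `load_model_and_tokenizer()` downloads the 7B base weights, so it is best run on a machine with enough memory. If the adapter repository turns out not to contain tokenizer files, passing `BASE_MODEL_ID` to `AutoTokenizer.from_pretrained` instead should behave like the pre-commit README.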