---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Qwen3-14B-abliterated-TIES
datasets:
- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- cot
---

![image/png](https://huggingface.co/nbeerbower/Xiaolong-Qwen3-0.6B/resolve/main/cover.png?download=true)

# Xiaolong-Qwen3-14B

**Xiaolong** is a small, uncensored, reasoning-focused model finetuned using [ORPO and QLoRA](https://huggingface.co/blog/mlabonne/orpo-llama-3) on top of [Qwen3-14B-abliterated-TIES](https://huggingface.co/nbeerbower/Qwen3-14B-abliterated-TIES).

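The model can be loaded with the Transformers library declared in the card metadata. A minimal inference sketch; the repo id below is assumed from the model name, and the prompt is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nbeerbower/Xiaolong-Qwen3-14B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt using the model's own chat template.
messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```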
## Finetuning Details

- **Method:** ORPO
- **Epochs:** 2
- **Learning Rate:** 5e-6, cosine decay w/ 5% warmup
- **Batch Size:** 1 x 32 (32 effective)
- **Max Grad Norm:** 0.3
- **LoRA Rank:** 64
- **Hardware:** 1x NVIDIA RTX A6000

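The hyperparameters above map directly onto TRL's `ORPOConfig`. A hedged sketch of the training setup, assuming TRL's `ORPOTrainer` with a PEFT LoRA adapter; only the listed hyperparameters come from the card, while `lora_alpha`, `output_dir`, and the commented trainer wiring are illustrative assumptions:

```python
from peft import LoraConfig
from trl import ORPOConfig, ORPOTrainer

# LoRA rank from the card; alpha is an assumed value.
peft_config = LoraConfig(r=64, lora_alpha=64, task_type="CAUSAL_LM")

training_args = ORPOConfig(
    num_train_epochs=2,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # 1 x 32 = 32 effective batch size
    max_grad_norm=0.3,
    output_dir="xiaolong-orpo",      # assumed
)

# trainer = ORPOTrainer(model=model, args=training_args,
#                       train_dataset=dataset, peft_config=peft_config)
# trainer.train()
```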
## Dataset Composition

~9,100 samples in total, of which 3,000 used Chain of Thought reasoning.

* [nbeerbower/GreatFirewall-DPO](https://huggingface.co/datasets/nbeerbower/GreatFirewall-DPO)
* [nbeerbower/Schule-DPO](https://huggingface.co/datasets/nbeerbower/Schule-DPO)
* [nbeerbower/Purpura-DPO](https://huggingface.co/datasets/nbeerbower/Purpura-DPO)
* [nbeerbower/Arkhaios-DPO](https://huggingface.co/datasets/nbeerbower/Arkhaios-DPO)
* [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
* [antiven0m/physical-reasoning-dpo](https://huggingface.co/datasets/antiven0m/physical-reasoning-dpo)
* [flammenai/Date-DPO-NoAsterisks](https://huggingface.co/datasets/flammenai/Date-DPO-NoAsterisks)
* [flammenai/Prude-Phi3-DPO](https://huggingface.co/datasets/flammenai/Prude-Phi3-DPO)
* [Atsunori/HelpSteer2-DPO](https://huggingface.co/datasets/Atsunori/HelpSteer2-DPO) (1,000 samples)
* [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
* [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo)
* [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo)

### Chain of Thought

* [GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K) (1,000 samples)
* [nvidia/OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (1,000 samples)
* [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) (1,000 samples)