---
library_name: transformers
license: apache-2.0
base_model:
- nbeerbower/Qwen3-4B-abliterated-TIES
datasets:
- nbeerbower/GreatFirewall-DPO
- nbeerbower/Schule-DPO
- nbeerbower/Purpura-DPO
- nbeerbower/Arkhaios-DPO
- jondurbin/truthy-dpo-v0.1
- antiven0m/physical-reasoning-dpo
- flammenai/Date-DPO-NoAsterisks
- flammenai/Prude-Phi3-DPO
- Atsunori/HelpSteer2-DPO
- jondurbin/gutenberg-dpo-v0.1
- nbeerbower/gutenberg2-dpo
- nbeerbower/gutenberg-moderne-dpo
- GeneralReasoning/GeneralThought-430K
- nvidia/OpenMathReasoning
- nvidia/OpenCodeReasoning
tags:
- orpo
- uncensored
- reasoning
- cot
---

# Xiaolong-Qwen3-1.7B

**Xiaolong** is a small, uncensored, reasoning-focused model finetuned using [ORPO and QLoRA](https://huggingface.co/blog/mlabonne/orpo-llama-3) on top of [Qwen3-4B-abliterated-TIES](https://huggingface.co/nbeerbower/Qwen3-4B-abliterated-TIES).
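
ORPO augments the ordinary NLL loss on the chosen response with an odds-ratio penalty that pushes the chosen completion's odds above the rejected one's. A minimal scalar sketch of that penalty, working from per-token-average log-probabilities (the `lam` value here is illustrative, not the one used for this model):

```python
import math

def log_odds(logp: float) -> float:
    # log(p / (1 - p)) computed directly from log p
    return logp - math.log1p(-math.exp(logp))

def orpo_penalty(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    # L_OR = -lam * log sigmoid(log_odds(chosen) - log_odds(rejected));
    # the full ORPO loss adds this penalty to the NLL of the chosen response.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # log sigmoid(x) = x - log(1 + e^x)
    return -lam * (ratio - math.log1p(math.exp(ratio)))
```

The penalty is largest when the model prefers the rejected response and shrinks toward zero as the chosen response becomes much more likely, which is what lets a single ORPO pass replace the separate SFT + preference-tuning stages.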

## Finetuning Details

- **Method:** ORPO
- **Epochs:** 2
- **Learning Rate:** 5e-6, cosine decay w/ 5% warmup
- **Batch Size:** 2 x 16 (32 effective)
- **Max Grad Norm:** 0.3
- **LoRA Rank:** 64
- **Hardware:** 1x NVIDIA RTX A6000
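
The schedule above (5e-6 peak, cosine decay, 5% warmup) can be sketched as a plain function. The linear-warmup shape and the decay-to-zero floor are assumptions, since the card only names the schedule family:

```python
import math

def cosine_lr(step: int, total_steps: int,
              peak: float = 5e-6, warmup_frac: float = 0.05) -> float:
    """Linear warmup over the first 5% of steps, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With 2 epochs over ~9,100 samples at an effective batch of 32, `total_steps` works out to roughly 570 optimizer steps.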

## Dataset Composition

~9,100 samples, 3,000 of which use Chain of Thought reasoning.

* [nbeerbower/GreatFirewall-DPO](https://huggingface.co/datasets/nbeerbower/GreatFirewall-DPO)
* [nbeerbower/Schule-DPO](https://huggingface.co/datasets/nbeerbower/Schule-DPO)
* [nbeerbower/Purpura-DPO](https://huggingface.co/datasets/nbeerbower/Purpura-DPO)
* [nbeerbower/Arkhaios-DPO](https://huggingface.co/datasets/nbeerbower/Arkhaios-DPO)
* [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1)
* [antiven0m/physical-reasoning-dpo](https://huggingface.co/datasets/antiven0m/physical-reasoning-dpo)
* [flammenai/Date-DPO-NoAsterisks](https://huggingface.co/datasets/flammenai/Date-DPO-NoAsterisks)
* [flammenai/Prude-Phi3-DPO](https://huggingface.co/datasets/flammenai/Prude-Phi3-DPO)
* [Atsunori/HelpSteer2-DPO](https://huggingface.co/datasets/Atsunori/HelpSteer2-DPO) (1000 samples)
* [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1)
* [nbeerbower/gutenberg2-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg2-dpo)
* [nbeerbower/gutenberg-moderne-dpo](https://huggingface.co/datasets/nbeerbower/gutenberg-moderne-dpo)

### Chain of Thought

* [GeneralReasoning/GeneralThought-430K](https://huggingface.co/datasets/GeneralReasoning/GeneralThought-430K) (1000 samples)
* [nvidia/OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) (1000 samples)
* [nvidia/OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) (1000 samples)