Text Generation
Transformers
Safetensors
English
qwen2
axolotl
dpo
trl
conversational
Eval Results
text-generation-inference
Files changed (1)
  1. README.md +317 -311
README.md CHANGED
@@ -1,312 +1,318 @@
1
- ---
2
- license: apache-2.0
3
- tags:
4
- - axolotl
5
- - dpo
6
- - trl
7
- base_model: Qwen/Qwen2.5-7B-Instruct
8
- pipeline_tag: text-generation
9
- library_name: transformers
10
- model-index:
11
- - name: Humanish-Qwen2.5-7B-Instruct
12
- results:
13
- - task:
14
- type: text-generation
15
- name: Text Generation
16
- dataset:
17
- name: IFEval (0-Shot)
18
- type: HuggingFaceH4/ifeval
19
- args:
20
- num_few_shot: 0
21
- metrics:
22
- - type: inst_level_strict_acc and prompt_level_strict_acc
23
- value: 72.84
24
- name: strict accuracy
25
- source:
26
- url: >-
27
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
28
- name: Open LLM Leaderboard
29
- - task:
30
- type: text-generation
31
- name: Text Generation
32
- dataset:
33
- name: BBH (3-Shot)
34
- type: BBH
35
- args:
36
- num_few_shot: 3
37
- metrics:
38
- - type: acc_norm
39
- value: 34.48
40
- name: normalized accuracy
41
- source:
42
- url: >-
43
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
44
- name: Open LLM Leaderboard
45
- - task:
46
- type: text-generation
47
- name: Text Generation
48
- dataset:
49
- name: MATH Lvl 5 (4-Shot)
50
- type: hendrycks/competition_math
51
- args:
52
- num_few_shot: 4
53
- metrics:
54
- - type: exact_match
55
- value: 0
56
- name: exact match
57
- source:
58
- url: >-
59
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
60
- name: Open LLM Leaderboard
61
- - task:
62
- type: text-generation
63
- name: Text Generation
64
- dataset:
65
- name: GPQA (0-shot)
66
- type: Idavidrein/gpqa
67
- args:
68
- num_few_shot: 0
69
- metrics:
70
- - type: acc_norm
71
- value: 6.49
72
- name: acc_norm
73
- source:
74
- url: >-
75
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
76
- name: Open LLM Leaderboard
77
- - task:
78
- type: text-generation
79
- name: Text Generation
80
- dataset:
81
- name: MuSR (0-shot)
82
- type: TAUR-Lab/MuSR
83
- args:
84
- num_few_shot: 0
85
- metrics:
86
- - type: acc_norm
87
- value: 8.42
88
- name: acc_norm
89
- source:
90
- url: >-
91
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
92
- name: Open LLM Leaderboard
93
- - task:
94
- type: text-generation
95
- name: Text Generation
96
- dataset:
97
- name: MMLU-PRO (5-shot)
98
- type: TIGER-Lab/MMLU-Pro
99
- config: main
100
- split: test
101
- args:
102
- num_few_shot: 5
103
- metrics:
104
- - type: acc
105
- value: 37.76
106
- name: accuracy
107
- source:
108
- url: >-
109
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
110
- name: Open LLM Leaderboard
111
- datasets:
112
- - HumanLLMs/Human-Like-DPO-Dataset
113
- language:
114
- - en
115
- ---
116
- <div align="center">
117
- <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
118
- <h1>Enhancing Human-Like Responses in Large Language Models</h1>
119
- </div>
120
-
121
- <p align="center">
122
- &nbsp&nbsp | 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>&nbsp&nbsp |
123
- &nbsp&nbsp 📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>&nbsp&nbsp |
124
- &nbsp&nbsp 📄<a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp&nbsp |
125
- </p>
126
-
127
- # 🚀 Human-Like-Qwen2.5-7B-Instruct
128
-
129
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), specifically optimized to generate more human-like and conversational responses.
130
-
131
- The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
132
-
133
- The process of creating this model is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).
134
-
135
- # 🛠️ Training Configuration
136
-
137
- - **Base Model:** Qwen2.5-7B-Instruct
138
- - **Framework:** Axolotl v0.4.1
139
- - **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
140
- - **Training Time:** ~2 hours 15 minutes
141
- - **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics
142
-
143
- <details><summary>See axolotl config</summary>
144
-
145
- axolotl version: `0.4.1`
146
- ```yaml
147
- base_model: Qwen/Qwen2.5-7B-Instruct
148
- model_type: AutoModalForCausalLM
149
- tokenizer_type: AutoTokenizer
150
-
151
- trust_remote_code: true
152
-
153
- load_in_8bit: true
154
- load_in_4bit: false
155
- strict: false
156
-
157
- chat_template: chatml
158
- rl: dpo
159
- datasets:
160
- - path: HumanLLMs/humanish-dpo-project
161
- type: chatml.prompt_pairs
162
- chat_template: chatml
163
-
164
- dataset_prepared_path:
165
- val_set_size: 0.05
166
- output_dir: ./humanish-qwen2.5-7b-instruct
167
-
168
- sequence_len: 8192
169
- sample_packing: false
170
- pad_to_sequence_len: true
171
-
172
- adapter: lora
173
- lora_model_dir:
174
- lora_r: 8
175
- lora_alpha: 4
176
- lora_dropout: 0.05
177
- lora_target_linear: true
178
- lora_fan_in_fan_out:
179
-
180
- wandb_project: Humanish-DPO
181
- wandb_entity:
182
- wandb_watch:
183
- wandb_name:
184
- wandb_log_model:
185
-
186
- hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct
187
-
188
- gradient_accumulation_steps: 8
189
- micro_batch_size: 2
190
- num_epochs: 1
191
- optimizer: adamw_bnb_8bit
192
- lr_scheduler: cosine
193
- learning_rate: 0.0002
194
-
195
- train_on_inputs: false
196
- group_by_length: false
197
- bf16: auto
198
- fp16:
199
- tf32: false
200
-
201
- gradient_checkpointing: true
202
- early_stopping_patience:
203
- resume_from_checkpoint:
204
- local_rank:
205
- logging_steps: 1
206
- xformers_attention:
207
- flash_attention: true
208
- s2_attention:
209
-
210
- warmup_steps: 10
211
- evals_per_epoch: 2
212
- eval_table_size:
213
- eval_max_new_tokens: 128
214
- saves_per_epoch: 1
215
- debug:
216
- deepspeed:
217
- weight_decay: 0.0
218
- fsdp:
219
- fsdp_config:
220
-
221
- save_safetensors: true
222
- ```
223
-
224
- </details><br>
225
-
226
- # 💬 Prompt Template
227
-
228
- You can use the ChatML prompt template with this model:
229
-
230
- ### ChatML
231
-
232
- ```
233
- <|im_start|>system
234
- {system}<|im_end|>
235
- <|im_start|>user
236
- {user}<|im_end|>
237
- <|im_start|>assistant
238
- {assistant}<|im_end|>
239
- ```
240
-
241
- This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
242
- `tokenizer.apply_chat_template()` method:
243
-
244
- ```python
245
- messages = [
246
- {"role": "system", "content": "You are a helpful AI assistant."},
247
- {"role": "user", "content": "Hello!"}
248
- ]
249
- gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
250
- model.generate(**gen_input)
251
- ```
252
-
253
- # 🤖 Models
254
-
255
- | Model | Download |
256
- |:---------------------:|:-----------------------------------------------------------------------:|
257
- | Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) |
258
- | Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
259
- | Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |
260
-
261
- # 🔄 Quantized versions
262
-
263
- ## GGUF [@bartowski](https://huggingface.co/bartowski)
264
-
265
- - https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF
266
-
267
- - https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF
268
-
269
- - https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF
270
-
271
-
272
- # 🎯 Benchmark Results
273
-
274
- | **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
275
- |--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
276
- | **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
277
- | | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
278
- | | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
279
- | **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
280
- | | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
281
- | | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
282
- | **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
283
- | | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
284
- | | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |
285
-
286
-
287
- # 📊 Dataset
288
-
289
- The dataset used for fine-tuning was generated using LLaMA 3 models. The dataset includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and arts. Each sample consists of:
290
-
291
- - **Human-like responses:** Natural, conversational answers mimicking human dialogue.
292
- - **Formal responses:** Structured and precise answers with a more formal tone.
293
-
294
- The dataset has been open-sourced and is available at:
295
-
296
- - 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
297
-
298
- More details on the dataset creation process can be found in the accompanying research paper.
299
-
300
- # 📝 Citation
301
-
302
- ```
303
- @misc{çalık2025enhancinghumanlikeresponseslarge,
304
- title={Enhancing Human-Like Responses in Large Language Models},
305
- author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
306
- year={2025},
307
- eprint={2501.05032},
308
- archivePrefix={arXiv},
309
- primaryClass={cs.CL},
310
- url={https://arxiv.org/abs/2501.05032},
311
- }
312
  ```
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - axolotl
5
+ - dpo
6
+ - trl
7
+ base_model: Qwen/Qwen2.5-7B-Instruct
8
+ pipeline_tag: text-generation
9
+ library_name: transformers
10
+ datasets:
11
+ - HumanLLMs/Human-Like-DPO-Dataset
12
+ language:
13
+ - zho
14
+ - eng
15
+ - fra
16
+ - spa
17
+ - por
18
+ - deu
19
+ - ita
20
+ - rus
21
+ - jpn
22
+ - kor
23
+ - vie
24
+ - tha
25
+ - ara
26
+ model-index:
27
+ - name: Humanish-Qwen2.5-7B-Instruct
28
+ results:
29
+ - task:
30
+ type: text-generation
31
+ name: Text Generation
32
+ dataset:
33
+ name: IFEval (0-Shot)
34
+ type: HuggingFaceH4/ifeval
35
+ args:
36
+ num_few_shot: 0
37
+ metrics:
38
+ - type: inst_level_strict_acc and prompt_level_strict_acc
39
+ value: 72.84
40
+ name: strict accuracy
41
+ source:
42
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
43
+ name: Open LLM Leaderboard
44
+ - task:
45
+ type: text-generation
46
+ name: Text Generation
47
+ dataset:
48
+ name: BBH (3-Shot)
49
+ type: BBH
50
+ args:
51
+ num_few_shot: 3
52
+ metrics:
53
+ - type: acc_norm
54
+ value: 34.48
55
+ name: normalized accuracy
56
+ source:
57
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
58
+ name: Open LLM Leaderboard
59
+ - task:
60
+ type: text-generation
61
+ name: Text Generation
62
+ dataset:
63
+ name: MATH Lvl 5 (4-Shot)
64
+ type: hendrycks/competition_math
65
+ args:
66
+ num_few_shot: 4
67
+ metrics:
68
+ - type: exact_match
69
+ value: 0
70
+ name: exact match
71
+ source:
72
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
73
+ name: Open LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: GPQA (0-shot)
79
+ type: Idavidrein/gpqa
80
+ args:
81
+ num_few_shot: 0
82
+ metrics:
83
+ - type: acc_norm
84
+ value: 6.49
85
+ name: acc_norm
86
+ source:
87
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
88
+ name: Open LLM Leaderboard
89
+ - task:
90
+ type: text-generation
91
+ name: Text Generation
92
+ dataset:
93
+ name: MuSR (0-shot)
94
+ type: TAUR-Lab/MuSR
95
+ args:
96
+ num_few_shot: 0
97
+ metrics:
98
+ - type: acc_norm
99
+ value: 8.42
100
+ name: acc_norm
101
+ source:
102
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
103
+ name: Open LLM Leaderboard
104
+ - task:
105
+ type: text-generation
106
+ name: Text Generation
107
+ dataset:
108
+ name: MMLU-PRO (5-shot)
109
+ type: TIGER-Lab/MMLU-Pro
110
+ config: main
111
+ split: test
112
+ args:
113
+ num_few_shot: 5
114
+ metrics:
115
+ - type: acc
116
+ value: 37.76
117
+ name: accuracy
118
+ source:
119
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-Qwen2.5-7B-Instruct
120
+ name: Open LLM Leaderboard
121
+ ---
122
+ <div align="center">
123
+ <img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63da3d7ae697e5898cb86854/H-vpXOX6KZu01HnV87Jk5.jpeg" width="320" height="320" />
124
+ <h1>Enhancing Human-Like Responses in Large Language Models</h1>
125
+ </div>
126
+
127
+ <p align="center">
128
+ &nbsp&nbsp | 🤗 <a href="https://huggingface.co/collections/HumanLLMs/human-like-humanish-llms-6759fa68f22e11eb1a10967e">Models</a>&nbsp&nbsp |
129
+ &nbsp&nbsp 📊 <a href="https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset">Dataset</a>&nbsp&nbsp |
130
+ &nbsp&nbsp 📄<a href="https://arxiv.org/abs/2501.05032">Paper</a>&nbsp&nbsp |
131
+ </p>
132
+
133
+ # 🚀 Human-Like-Qwen2.5-7B-Instruct
134
+
135
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), specifically optimized to generate more human-like and conversational responses.
136
+
137
+ The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions.
138
+
139
+ The process of creating this model is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032).
140
+
141
+ # 🛠️ Training Configuration
142
+
143
+ - **Base Model:** Qwen2.5-7B-Instruct
144
+ - **Framework:** Axolotl v0.4.1
145
+ - **Hardware:** 2x NVIDIA A100 (80 GB) GPUs
146
+ - **Training Time:** ~2 hours 15 minutes
147
+ - **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics
148
+
149
+ <details><summary>See axolotl config</summary>
150
+
151
+ axolotl version: `0.4.1`
152
+ ```yaml
153
+ base_model: Qwen/Qwen2.5-7B-Instruct
154
+ model_type: AutoModalForCausalLM
155
+ tokenizer_type: AutoTokenizer
156
+
157
+ trust_remote_code: true
158
+
159
+ load_in_8bit: true
160
+ load_in_4bit: false
161
+ strict: false
162
+
163
+ chat_template: chatml
164
+ rl: dpo
165
+ datasets:
166
+ - path: HumanLLMs/humanish-dpo-project
167
+ type: chatml.prompt_pairs
168
+ chat_template: chatml
169
+
170
+ dataset_prepared_path:
171
+ val_set_size: 0.05
172
+ output_dir: ./humanish-qwen2.5-7b-instruct
173
+
174
+ sequence_len: 8192
175
+ sample_packing: false
176
+ pad_to_sequence_len: true
177
+
178
+ adapter: lora
179
+ lora_model_dir:
180
+ lora_r: 8
181
+ lora_alpha: 4
182
+ lora_dropout: 0.05
183
+ lora_target_linear: true
184
+ lora_fan_in_fan_out:
185
+
186
+ wandb_project: Humanish-DPO
187
+ wandb_entity:
188
+ wandb_watch:
189
+ wandb_name:
190
+ wandb_log_model:
191
+
192
+ hub_model_id: HumanLLMs/Humanish-Qwen2.5-7B-Instruct
193
+
194
+ gradient_accumulation_steps: 8
195
+ micro_batch_size: 2
196
+ num_epochs: 1
197
+ optimizer: adamw_bnb_8bit
198
+ lr_scheduler: cosine
199
+ learning_rate: 0.0002
200
+
201
+ train_on_inputs: false
202
+ group_by_length: false
203
+ bf16: auto
204
+ fp16:
205
+ tf32: false
206
+
207
+ gradient_checkpointing: true
208
+ early_stopping_patience:
209
+ resume_from_checkpoint:
210
+ local_rank:
211
+ logging_steps: 1
212
+ xformers_attention:
213
+ flash_attention: true
214
+ s2_attention:
215
+
216
+ warmup_steps: 10
217
+ evals_per_epoch: 2
218
+ eval_table_size:
219
+ eval_max_new_tokens: 128
220
+ saves_per_epoch: 1
221
+ debug:
222
+ deepspeed:
223
+ weight_decay: 0.0
224
+ fsdp:
225
+ fsdp_config:
226
+
227
+ save_safetensors: true
228
+ ```
229
+
230
+ </details><br>
231
+
232
+ # 💬 Prompt Template
233
+
234
+ You can use the ChatML prompt template with this model:
235
+
236
+ ### ChatML
237
+
238
+ ```
239
+ <|im_start|>system
240
+ {system}<|im_end|>
241
+ <|im_start|>user
242
+ {user}<|im_end|>
243
+ <|im_start|>assistant
244
+ {assistant}<|im_end|>
245
+ ```
246
+
247
+ This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the
248
+ `tokenizer.apply_chat_template()` method:
249
+
250
+ ```python
251
+ messages = [
252
+ {"role": "system", "content": "You are a helpful AI assistant."},
253
+ {"role": "user", "content": "Hello!"}
254
+ ]
255
+ gen_input = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt")
256
+ model.generate(**gen_input)
257
+ ```
258
+
259
+ # 🤖 Models
260
+
261
+ | Model | Download |
262
+ |:---------------------:|:-----------------------------------------------------------------------:|
263
+ | Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) |
264
+ | Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) |
265
+ | Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) |
266
+
267
+ # 🔄 Quantized versions
268
+
269
+ ## GGUF [@bartowski](https://huggingface.co/bartowski)
270
+
271
+ - https://huggingface.co/bartowski/Human-Like-LLama3-8B-Instruct-GGUF
272
+
273
+ - https://huggingface.co/bartowski/Human-Like-Qwen2.5-7B-Instruct-GGUF
274
+
275
+ - https://huggingface.co/bartowski/Human-Like-Mistral-Nemo-Instruct-2407-GGUF
276
+
277
+
278
+ # 🎯 Benchmark Results
279
+
280
+ | **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** |
281
+ |--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------|
282
+ | **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 |
283
+ | | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 |
284
+ | | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 |
285
+ | **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 |
286
+ | | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 |
287
+ | | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** |
288
+ | **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 |
289
+ | | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 |
290
+ | | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 |
291
+
292
+
293
+ # 📊 Dataset
294
+
295
+ The dataset used for fine-tuning was generated using LLaMA 3 models. The dataset includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and arts. Each sample consists of:
296
+
297
+ - **Human-like responses:** Natural, conversational answers mimicking human dialogue.
298
+ - **Formal responses:** Structured and precise answers with a more formal tone.
299
+
300
+ The dataset has been open-sourced and is available at:
301
+
302
+ - 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
303
+
304
+ More details on the dataset creation process can be found in the accompanying research paper.
305
+
306
+ # 📝 Citation
307
+
308
+ ```
309
+ @misc{çalık2025enhancinghumanlikeresponseslarge,
310
+ title={Enhancing Human-Like Responses in Large Language Models},
311
+ author={Ethem Yağız Çalık and Talha Rüzgar Akkuş},
312
+ year={2025},
313
+ eprint={2501.05032},
314
+ archivePrefix={arXiv},
315
+ primaryClass={cs.CL},
316
+ url={https://arxiv.org/abs/2501.05032},
317
+ }
318
  ```
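
As an illustration of the ChatML layout shown in the README's Prompt Template section, the following minimal Python sketch renders a message list into that format by hand. The helper name `format_chatml` and its `add_generation_prompt` flag are hypothetical and are not part of the README or of Transformers. The tokenizer's bundled chat template remains the authoritative formatter and may differ in details, for example by inserting a default system prompt.

```python
from typing import Dict, List

# Special tokens taken from the ChatML template in the README.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the ChatML layout (hypothetical helper, for illustration only)."""
    parts = []
    for message in messages:
        # Each turn has the shape: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an assistant turn open so the model continues from this point.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = format_chatml(messages)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check whether the model's own template adds anything beyond the plain layout.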