language:
- en
---

# Uploaded model

- **Developed by:** forestav
- **License:** apache-2.0
- **Finetuned from model:** [unsloth/llama-3.2-1b-instruct-bnb-4bit](https://huggingface.co/unsloth/llama-3.2-1b-instruct-bnb-4bit)

## Model description

This model refines an earlier LoRA adapter that was trained on **unsloth/Llama-3.2-3B-Instruct** with the **FineTome-100k** dataset. This version moves to a 1B-parameter base (vs. 3B), which speeds up training and makes the model easier to adapt to specific tasks such as medical applications.

### Key adjustments

1. **Reduced parameter count:** The model was downsized to 1B parameters to improve training efficiency and ease customization.
2. **Adjusted learning rate:** A smaller learning rate was used to prevent overfitting and mitigate catastrophic forgetting. This ensures the model retains its general pretraining knowledge while learning new tasks effectively.

The finetuning dataset, **ruslanmv/ai-medical-chatbot**, contains only 257k rows, which necessitated careful hyperparameter tuning to avoid over-specialization.
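
For reference, a minimal sketch of this kind of setup with Unsloth and a LoRA adapter is shown below. The base model and dataset names come from this card; the sequence length, LoRA rank/alpha, and target modules are illustrative assumptions rather than the exact values used.

```python
# Minimal sketch of the finetuning setup described above (illustrative, not the exact script).
from unsloth import FastLanguageModel
from datasets import load_dataset

max_seq_length = 2048  # assumption; choose to fit GPU memory

# Base model named on this card: the 4-bit 1B instruct checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.2-1b-instruct-bnb-4bit",
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach a LoRA adapter; rank, alpha, and target modules are assumptions, not card values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Finetuning dataset named on this card. Its rows still need to be mapped into a single
# text/chat column before training; that mapping is dataset-specific and omitted here.
dataset = load_dataset("ruslanmv/ai-medical-chatbot", split="train")
```

Keeping the 4-bit base frozen and training only the low-rank adapter is what keeps the 1B run cheap in memory and fast to iterate on.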

---

## Hyperparameters and explanations

- **Learning rate:** `2e-5`
  A smaller learning rate reduces the risk of overfitting and catastrophic forgetting: large weight updates can overwrite pretrained weights and erode the model's general knowledge, which matters particularly for models with fewer parameters.

- **Warm-up steps:** `5`
  Warm-up lets the optimizer gather gradient statistics before training at the full learning rate, improving stability.

- **Per-device train batch size:** `2`
  Each GPU processes 2 training samples per step, which suits resource-constrained environments.

- **Gradient accumulation steps:** `4`
  Gradients are accumulated over 4 steps to simulate a larger batch size (effective batch size: 8) without exceeding memory limits.

- **Optimizer:** `AdamW with 8-bit quantization`
  - **AdamW:** Adds weight decay to prevent overfitting.
  - **8-bit quantization:** Reduces memory usage by compressing optimizer states, enabling faster training.

- **Weight decay:** `0.01`
  A standard value that works well across a wide range of training scenarios.

- **Learning rate scheduler type:** `linear`
  Gradually decreases the learning rate from its initial value to zero over the course of training.
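
As a rough sketch, the values above map onto TRL's `SFTTrainer` as follows (continuing from the setup sketch earlier). Anything not listed in this section, such as the epoch count, logging cadence, and output directory, is an assumption.

```python
# Sketch of how the listed hyperparameters map onto TRL's SFTTrainer (illustrative).
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,                    # LoRA-wrapped model from the earlier sketch
    tokenizer=tokenizer,
    train_dataset=dataset,          # assumes a formatted "text" column
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        learning_rate=2e-5,              # smaller LR to limit catastrophic forgetting
        warmup_steps=5,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,   # effective batch size: 2 * 4 = 8
        optim="adamw_8bit",              # AdamW with 8-bit optimizer states
        weight_decay=0.01,
        lr_scheduler_type="linear",
        num_train_epochs=1,              # assumption; not stated on this card
        logging_steps=10,                # assumption
        output_dir="outputs",            # assumption
    ),
)
trainer.train()
```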

---

## Quantization details

The model is saved in **16-bit GGUF format**, which:

- Retains **100% of the finetuned model's accuracy** (no loss from further quantization).
- Trades speed and memory for higher precision.
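
Assuming Unsloth's GGUF export helpers, the 16-bit save can be sketched as below; the output directory name is hypothetical, and `quantization_method="f16"` is the option corresponding to the 16-bit format described above.

```python
# Sketch of exporting the finetuned model to 16-bit GGUF with Unsloth (illustrative).
# "f16" keeps full 16-bit precision; smaller options such as "q4_k_m" would trade
# accuracy for speed and memory.
model.save_pretrained_gguf(
    "llama-3.2-1b-medical-gguf",   # output directory (hypothetical name)
    tokenizer,
    quantization_method="f16",
)
```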

### Training optimization

Training was accelerated by **2x** using [Unsloth](https://github.com/unslothai/unsloth) in combination with Hugging Face's **TRL** library.

---

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)