tomasps committed · Commit f02554d · verified · Parent(s): 7689129

Update README.md

Files changed (1): README.md (+193 -188)
---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
language: en
license: llama3.2
tags:
- llama
- llama-3.2
- safeguarding
- content-moderation
- safety
- predator-detection
- text-classification
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
widget:
- text: >-
    Hey, I know we just met but I feel like we have a special connection. Don't
    tell your parents about our chats, they wouldn't understand. Can you send me
    a picture of yourself?
- text: >-
    Hey, just checking in to see how your day went. Let me know if you want to
    grab coffee this weekend.
---

# Heaven1-base-1b: Guardian - Predatory Behavior Detection Model

<img src="https://huggingface.co/safecircleai/heaven1-base/resolve/main/Heaven1-guardian.png" alt="Heaven1 Guardian Banner" width="600">

## Model Description

Heaven1-base-1b (codename: "Guardian") is a fine-tuned version of Meta's Llama-3.2-1B-Instruct model, specifically optimized to detect and help prevent harmful predatory patterns in conversations. The model was created with Parameter-Efficient Fine-Tuning (PEFT) using QLoRA, enabling training on consumer-grade hardware.

## Model Details

- **Developed by:** SafeCircleIA
- **Base model:** Meta-Llama-3.2-1B-Instruct
- **Model type:** Causal Language Model with LoRA adapters
- **Language:** English
- **Training method:** QLoRA fine-tuning (4-bit quantization)
- **License:** Llama 3.2 Community License

## Uses

### Direct Use

This model is designed for direct use in:
- Detecting potentially harmful interactions in text messages
- Classifying messages as predatory or safe with brief explanations
- Assisting human moderators in identifying concerning patterns
- Supporting research on digital safety

### Out-of-Scope Use

This model should not be used for:
- Making autonomous decisions about user safety without human review
- Creating or refining predatory language patterns
- Serving as the sole determinant in any safety-critical application
- Deployment in any application without proper privacy considerations and consent

## Bias, Risks, and Limitations

- The model detects patterns based on its training data and may miss novel predatory tactics
- Performance may vary across cultural contexts and communication styles
- False positives and false negatives are possible
- Relies heavily on conversational patterns identified during training
- Limited to English-language text

### Recommendations

- Always combine with human review for best results
- Consider cultural and contextual factors when interpreting results
- Regularly evaluate the model's performance in your specific use case
- Use low temperature settings (0.1-0.3) for more consistent classification results

## How to Get Started with the Model

To run inference with this model:

```bash
python run_inference.py --use_4bit --model_path ./heaven1-base-1b --base_model meta-llama/Llama-3.2-1B-Instruct
```

### Optional Parameters

- `--max_length` (default: 512): Maximum sequence length
- `--temperature` (default: 0.1): Controls randomness (lower = more deterministic classification)
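
Alternatively, the adapter can be loaded directly with `transformers` and `peft`. The following is a minimal sketch, not the project's official inference path: it assumes the adapter weights live at `safecircleai/heaven1-base` (substitute a local path if needed) and uses a hypothetical classification prompt, since the exact prompt format used in training is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "safecircleai/heaven1-base"  # assumed adapter repo id

# 4-bit NF4 quantization, mirroring the training configuration reported below
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the LoRA adapter
model.eval()

# Hypothetical prompt; adapt to whatever format the model was fine-tuned on
message = "Hey, don't tell your parents about our chats."
prompt = f"Classify the following message as PREDATORY or SAFE and explain briefly:\n{message}\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, temperature=0.1, do_sample=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```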

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of 10,000 examples, roughly half of which exhibit predatory behavior patterns. This balanced composition helps the model identify concerning patterns while retaining normal conversational ability.

### Training Hyperparameters

This model was trained with the following hyperparameters:

- **Learning rate:** 2e-5
- **Epochs:** 3
- **Batch size:** 1
- **Gradient accumulation steps:** 16
- **LoRA rank (r):** 8
- **LoRA alpha:** 16
- **LoRA dropout:** 0.05
- **4-bit quantization:** Yes (NF4 format)
- **Max sequence length:** 2048
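
For reference, the LoRA settings above correspond to a `peft` configuration roughly like the sketch below; the target modules are an assumption (the attention projections commonly adapted in Llama-family models), since the card does not record them.

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters above.
# target_modules is an assumption; the card does not list the adapted modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```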

## Evaluation

### Testing Data & Metrics

The model was evaluated on a held-out test set (10% of the dataset) with the following metrics:

- **Accuracy:** Measures overall classification correctness
- **Precision:** Measures how many identified predatory messages were actually predatory
- **Recall:** Measures how many actual predatory messages were identified
- **F1 Score:** Harmonic mean of precision and recall
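
For illustration, these metrics can be computed with `scikit-learn`; the labels below are made up (1 = predatory, 0 = safe) and are not the actual evaluation data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth and predicted labels (1 = predatory, 0 = safe)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```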

### Results

Evaluation metrics on the held-out test set:

| Metric | Score |
|--------|-------|
| Accuracy | 93.8% |
| Precision | 92.4% |
| Recall | 95.1% |
| F1 | 93.7% |
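
As a consistency check, the reported F1 agrees with the precision and recall above: F1 = 2PR / (P + R) = 2 × 0.924 × 0.951 / (0.924 + 0.951) ≈ 0.937.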

## Environmental Impact

- **Hardware Type:** Consumer GPU (NVIDIA RTX 2060, 6GB VRAM)
- **Training time:** Approximately 3 hours
- **Energy consumption:** Minimal, thanks to efficient QLoRA fine-tuning

## Performance and Limitations

- **Hardware requirements:** Can run on consumer GPUs with at least 6GB VRAM when used with 4-bit quantization
- **Sequence length:** Optimized for sequences up to 2048 tokens
- **Limitations:**
  - As with any AI model, it may occasionally miss subtle predatory patterns
  - False positives are possible in ambiguous situations
  - Performance depends on input context quality

## Ethical Considerations

This model is designed to help identify and prevent potentially harmful predatory patterns in conversations. However, it should not be used as the sole determinant for making important decisions. Human oversight is essential when deploying this model in real-world applications.

- Respect privacy and obtain appropriate consent when analyzing communications
- Be transparent about the use of AI detection systems
- Consider the impact of false positives on legitimate communications

## Contact

For questions or concerns about this model, please contact SafeCircleIA or open an issue in the project repository.

## Citation

```bibtex
@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base-1b: Guardian - Predatory Behavior Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleai/heaven1-base}}
}
```

## Training Procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: QuantizationMethod.BITS_AND_BYTES
- _load_in_8bit: False
- _load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False

### Framework versions

- PEFT 0.6.0