tomasps committed · Commit f02554d · verified · Parent(s): 7689129

Update README.md

Files changed (1): README.md (+193 -188)
---
base_model: meta-llama/Llama-3.2-1B-Instruct
library_name: peft
language: en
license: llama3.2
tags:
- llama
- llama-3.2
- safeguarding
- content-moderation
- safety
- predator-detection
- text-classification
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
widget:
- text: >-
    Hey, I know we just met but I feel like we have a special connection. Don't
    tell your parents about our chats, they wouldn't understand. Can you send me
    a picture of yourself?
- text: >-
    Hey, just checking in to see how your day went. Let me know if you want to
    grab coffee this weekend.
---

# Heaven1-base-1b: Guardian - Predatory Behavior Detection Model

<img src="https://huggingface.co/safecircleai/heaven1-base/resolve/main/Heaven1-guardian.png" alt="Heaven1 Guardian Banner" width="600">

## Model Description

Heaven1-base-1b (codename: "Guardian") is a fine-tuned version of Meta's Llama-3.2-1B-Instruct model, specifically optimized to detect and help prevent harmful predatory patterns in conversations. The model was created with Parameter-Efficient Fine-Tuning (PEFT) using QLoRA, enabling training on consumer-grade hardware.

## Model Details

- **Developed by:** SafeCircleIA
- **Base model:** Meta-Llama-3.2-1B-Instruct
- **Model type:** Causal Language Model with LoRA adapters
- **Language:** English
- **Training method:** QLoRA fine-tuning (4-bit quantization)
- **License:** Llama 3.2 Community License

## Uses

### Direct Use

This model is designed for direct use in:
- Detecting potentially harmful interactions in text messages
- Classifying messages as predatory or safe with brief explanations
- Assisting human moderators in identifying concerning patterns
- Supporting research on digital safety

### Out-of-Scope Use

This model should not be used for:
- Making autonomous decisions about user safety without human review
- Creating or refining predatory language patterns
- Serving as the sole determinant in any safety-critical application
- Deployment in any application without proper privacy considerations and consent

## Bias, Risks, and Limitations

- The model detects patterns based on its training data and may miss novel predatory tactics
- Performance may vary across cultural contexts and communication styles
- False positives and false negatives are possible
- Relies heavily on conversational patterns identified during training
- Limited to English-language text

### Recommendations

- Always combine with human review for best results
- Consider cultural and contextual factors when interpreting results
- Regularly evaluate the model's performance in your specific use case
- Use low temperature settings (0.1-0.3) for more consistent classification results

## How to Get Started with the Model

To run inference with this model:

```bash
python run_inference.py --use_4bit --model_path ./heaven1-base-1b --base_model meta-llama/Llama-3.2-1B-Instruct
```

### Optional Parameters

- `--max_length` (default: 512): Maximum sequence length
- `--temperature` (default: 0.1): Controls randomness (lower = more deterministic classification)
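
Alternatively, the adapter can be loaded directly with `transformers` and `peft`. The following is a minimal sketch, not the project's official inference path: it assumes the adapter weights live at `safecircleai/heaven1-base` (substitute a local path if needed) and uses a hypothetical classification prompt, since the exact prompt format used in training is not documented in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "safecircleai/heaven1-base"  # assumed adapter repo id

# 4-bit NF4 quantization, mirroring the training configuration reported below
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the LoRA adapter
model.eval()

# Hypothetical prompt; adapt to whatever format the model was fine-tuned on
message = "Hey, don't tell your parents about our chats."
prompt = f"Classify the following message as PREDATORY or SAFE and explain briefly:\n{message}\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, temperature=0.1, do_sample=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```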

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of 10,000 examples, roughly half of which exhibit predatory behavior patterns. This balanced composition helps the model identify concerning patterns while retaining normal conversational ability.

### Training Hyperparameters

This model was trained with the following hyperparameters:

- **Learning rate:** 2e-5
- **Epochs:** 3
- **Batch size:** 1
- **Gradient accumulation steps:** 16
- **LoRA rank (r):** 8
- **LoRA alpha:** 16
- **LoRA dropout:** 0.05
- **4-bit quantization:** Yes (NF4 format)
- **Max sequence length:** 2048
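
For reference, the LoRA settings above correspond to a `peft` configuration roughly like the sketch below; the target modules are an assumption (the attention projections commonly adapted in Llama-family models), since the card does not record them.

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the hyperparameters above.
# target_modules is an assumption; the card does not list the adapted modules.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```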

## Evaluation

### Testing Data & Metrics

The model was evaluated on a held-out test set (10% of the dataset) with the following metrics:

- **Accuracy:** Measures overall classification correctness
- **Precision:** Measures how many identified predatory messages were actually predatory
- **Recall:** Measures how many actual predatory messages were identified
- **F1 Score:** Harmonic mean of precision and recall
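
For illustration, these metrics can be computed with `scikit-learn`; the labels below are made up (1 = predatory, 0 = safe) and are not the actual evaluation data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth and predicted labels (1 = predatory, 0 = safe)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```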

### Results

Evaluation metrics on the held-out test set:

| Metric | Score |
|--------|-------|
| Accuracy | 93.8% |
| Precision | 92.4% |
| Recall | 95.1% |
| F1 | 93.7% |
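
As a consistency check, the reported F1 agrees with the precision and recall above: F1 = 2PR / (P + R) = 2 × 0.924 × 0.951 / (0.924 + 0.951) ≈ 0.937.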

## Environmental Impact

- **Hardware Type:** Consumer GPU (NVIDIA RTX 2060, 6GB VRAM)
- **Training time:** Approximately 3 hours
- **Energy consumption:** Minimal, thanks to efficient QLoRA fine-tuning

## Performance and Limitations

- **Hardware requirements:** Can run on consumer GPUs with at least 6GB VRAM when used with 4-bit quantization
- **Sequence length:** Optimized for sequences up to 2048 tokens
- **Limitations:**
  - As with any AI model, it may occasionally miss subtle predatory patterns
  - False positives are possible in ambiguous situations
  - Performance depends on input context quality

## Ethical Considerations

This model is designed to help identify and prevent potentially harmful predatory patterns in conversations. However, it should not be used as the sole determinant for making important decisions. Human oversight is essential when deploying this model in real-world applications.

- Respect privacy and obtain appropriate consent when analyzing communications
- Be transparent about the use of AI detection systems
- Consider the impact of false positives on legitimate communications

## Contact

For questions or concerns about this model, please contact SafeCircleIA or open an issue in the project repository.

## Citation

```bibtex
@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base-1b: Guardian - Predatory Behavior Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleai/heaven1-base}}
}
```

## Training Procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: QuantizationMethod.BITS_AND_BYTES
- _load_in_8bit: False
- _load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False

### Framework versions

- PEFT 0.6.0