# Bundled Model Information

## T5 Base Prompt Translator

This directory contains the pre-trained T5-Base model that translates natural-language descriptions into WD14 tags.

### Model Details

**Name:** `t5_base_prompt_translator`
**Base Model:** T5-Base (Google)
**Parameters:** 220 million
**Training Data:** 95,000 anime image prompts from Arcenciel.io
**Training Duration:** ~10 hours on an RTX 4090
**Model Size:** ~850 MB

### Training Configuration

- **Epochs:** 7 (~10,388 steps)
- **Batch Size:** 64 (effective)
- **Learning Rate:** 3e-4 → 3e-5 (linear decay)
- **Optimizer:** Adafactor (memory efficient)
- **Precision:** BF16 mixed precision
- **Max Length:** 160 tokens input, 256 tokens output
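
For reference, a minimal sketch of how this configuration could be expressed with Hugging Face `transformers`. The per-device batch size / gradient-accumulation split (8 × 8 = 64 effective) and the `output_dir` are illustrative assumptions, not the exact training script:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the configuration listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5_base_prompt_translator",  # assumed output path
    num_train_epochs=7,                      # ~10,388 steps at this batch size
    per_device_train_batch_size=8,           # assumed split:
    gradient_accumulation_steps=8,           # 8 x 8 = 64 effective batch size
    learning_rate=3e-4,
    lr_scheduler_type="linear",              # linear decay (the 3e-5 floor is approximate)
    optim="adafactor",                       # memory-efficient optimizer
    bf16=True,                               # BF16 mixed precision
    predict_with_generate=True,
)
```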

### Performance Metrics

**Accuracy:** 85-90% tag matching
**Final Loss:** ~1.2-1.3

**Inference Performance (RTX 4090):**
- Higher beam counts provide better quality at the cost of speed
- 2-4 beams: Very fast, good for iteration
- 8-16 beams: Balanced quality/speed
- 32-64 beams: Maximum quality, still fast on an RTX 4090

**VRAM Usage:**
- Model loading: ~2 GB
- Inference: An additional 1-4 GB depending on beam count
- Total: ~3-6 GB at the highest quality settings

**Note:** The model remains fast even at high beam counts (32-64) on an RTX 4090, so maximum-quality settings are practical for production work. The sketch below shows one way to measure speed and peak VRAM per beam count on your own hardware.
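
This is a sketch, not a calibrated benchmark: it assumes the bundled model directory and the task prefix described under Data Format below, and the example prompt is made up:

```python
import time

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_DIR = "models/t5_base_prompt_translator"  # bundled model path

tokenizer = T5Tokenizer.from_pretrained(MODEL_DIR)
model = T5ForConditionalGeneration.from_pretrained(MODEL_DIR).to("cuda").eval()

# Task prefix used at training time (see "Data Format" below).
prompt = "translate prompt to tags: a girl with long blue hair standing in the rain"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=160).to("cuda")

for num_beams in (2, 8, 32):
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    with torch.no_grad():
        output = model.generate(**inputs, max_length=256, num_beams=num_beams)
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"{num_beams:>2} beams: {elapsed:.2f}s, peak VRAM {peak_gb:.1f} GB")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```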

### Data Format

**Input Format:**
```
translate prompt to tags: [natural language description]
```

**Output Format:**
```
tag1, tag2, tag3, tag4, ...
```

**Tag Format:**
- WD14 tagger format with escaped parentheses: `tag \(descriptor\)`
- Examples: `shrug \(clothing\)`, `blue eyes`, `long hair`
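
Since both formats are plain strings, pre- and post-processing stays simple. A minimal sketch (the helper names are illustrative, not part of the node's API):

```python
def build_input(description: str) -> str:
    """Prepend the task prefix the model was trained with."""
    return f"translate prompt to tags: {description}"

def parse_output(raw: str) -> list[str]:
    """Split the model's comma-separated output into individual tags."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]

def unescape_tag(tag: str) -> str:
    """Convert WD14 escaped parentheses back to literal ones, if needed."""
    return tag.replace(r"\(", "(").replace(r"\)", ")")

tags = parse_output(r"shrug \(clothing\), blue eyes, long hair")
print([unescape_tag(t) for t in tags])  # ['shrug (clothing)', 'blue eyes', 'long hair']
```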

### Training Data Sources

- **Source:** Arcenciel.io API
- **Dataset Size:** 95,000 image-prompt pairs
- **Rating Filter:** None (all ratings included for maximum diversity)
- **Quality Filter:** None (engagement metrics are not widely used on the site)
- **Ground Truth:** WD14 v1.4 MOAT tagger (SmilingWolf)

**Note:** Quality filtering was intentionally omitted to avoid limiting the diversity of the training data. Engagement metrics (hearts, likes) are not used consistently across the site, so filtering by them would have reduced dataset quality rather than improved it.

### Model Files

- `config.json` - Model configuration
- `model.safetensors` - Model weights (safetensors format)
- `tokenizer_config.json` - Tokenizer configuration
- `spiece.model` - SentencePiece tokenizer model
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens
- `generation_config.json` - Generation defaults
- `training_args.bin` - Training arguments (metadata)
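
A quick sanity check that the directory is complete before loading, sketched below (`training_args.bin` is metadata only, so it is not treated as required here):

```python
from pathlib import Path

MODEL_DIR = Path("models/t5_base_prompt_translator")

# Files needed for inference; training_args.bin is optional metadata.
EXPECTED_FILES = [
    "config.json", "model.safetensors", "tokenizer_config.json",
    "spiece.model", "special_tokens_map.json", "added_tokens.json",
    "generation_config.json",
]

missing = [name for name in EXPECTED_FILES if not (MODEL_DIR / name).exists()]
if missing:
    raise FileNotFoundError(f"Model directory is incomplete, missing: {missing}")
print("All expected model files are present.")
```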

### License

This model is based on T5-Base by Google, which is licensed under Apache 2.0.

**Model License:** Apache 2.0
**Training Data:** Arcenciel.io (public API)
**Usage:** Free for commercial and non-commercial use

### Citation

If you use this model in your work, please cite:

```
T5X Prompt Translator Base 95K
Trained on Arcenciel.io dataset using WD14 v1.4 MOAT tagger
Base model: T5-Base (Google)
```

### Updates & Versions

**Version 1.0** (Current)
- Initial release
- Trained on 95K prompts
- T5-Base architecture
- WD14 v1.4 MOAT ground truth

### Support

For issues, questions, or feature requests:
- GitHub Issues: https://github.com/yourusername/tag_generator/issues
- Documentation: See PARAMETERS.md and README.md

---

**Note:** This model is bundled with the ComfyUI-T5X-Prompt-Translator custom node for immediate use. You can also place custom models in `ComfyUI/models/llm_checkpoints/` to use them with this node.

**Model Directory:** `models/t5_base_prompt_translator/`