# Bundled Model Information

## T5 Base Prompt Translator

This directory contains the pre-trained T5-base model for natural language to WD14 tag translation.

### Model Details

**Name:** `t5_base_prompt_translator`
**Base Model:** T5-Base (Google)
**Parameters:** 220 million
**Training Data:** 95,000 high-quality anime image prompts from Arcenciel.io
**Training Duration:** ~10 hours on RTX 4090
**Model Size:** ~850 MB

### Training Configuration

- **Epochs:** 7 (~10,388 steps)
- **Batch Size:** 64 (effective)
- **Learning Rate:** 3e-4 → 3e-5 (linear decay)
- **Optimizer:** AdaFactor (memory efficient)
- **Precision:** BF16 mixed precision
- **Max Length:** 160 tokens input, 256 tokens output
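
The linear decay schedule above is simple enough to sketch directly. A minimal illustration (the helper name is mine; the constants follow the figures listed above):

```python
# Illustrative sketch of the linear learning-rate decay described above:
# 3e-4 at step 0, decaying linearly to 3e-5 by the final step.
LR_START = 3e-4
LR_END = 3e-5
TOTAL_STEPS = 10_388  # ~7 epochs, per the configuration above

def linear_decay_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay."""
    frac = min(step, TOTAL_STEPS) / TOTAL_STEPS
    return LR_START + (LR_END - LR_START) * frac

print(linear_decay_lr(0))            # starts at 3e-4
print(linear_decay_lr(TOTAL_STEPS))  # ends at 3e-5
```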

### Performance Metrics

**Accuracy:** 85-90% tag matching
**Final Loss:** ~1.2-1.3

**Inference Performance (RTX 4090):**
- Higher beam counts provide better quality at the cost of speed
- 2-4 beams: Very fast, good for iteration
- 8-16 beams: Balanced quality/speed
- 32-64 beams: Maximum quality, excellent performance on RTX 4090

**VRAM Usage:**
- Model loading: ~2 GB
- Inference: Additional 1-4 GB depending on beam count
- Total: ~3-6 GB for highest quality settings

**Note:** The model performs exceptionally well even at high beam counts (32-64) on RTX 4090, making it practical to use maximum quality settings for production work.
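
As a rough illustration of the beam-count trade-off, a generation call might look like the sketch below (using Hugging Face `transformers`; the helper names and default model path are placeholders, and the heavy import is kept inside the function so the module loads even without the model installed):

```python
PREFIX = "translate prompt to tags: "  # input prefix expected by the model

def format_input(description: str) -> str:
    """Wrap a natural-language description in the model's input prefix."""
    return PREFIX + description.strip()

def translate(description: str, num_beams: int = 8,
              model_dir: str = "models/t5_base_prompt_translator") -> str:
    """Sketch: beam-search generation. Higher num_beams = better tags, slower."""
    # Imported lazily so this helper file can be imported without transformers.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    inputs = tokenizer(format_input(description), return_tensors="pt",
                       truncation=True, max_length=160)
    outputs = model.generate(**inputs, num_beams=num_beams, max_length=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

With 2-4 beams this is fast enough for interactive iteration; 32-64 beams trade speed for the maximum-quality tag lists described above.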

### Data Format

**Input Format:**
```
translate prompt to tags: [natural language description]
```

**Output Format:**
```
tag1, tag2, tag3, tag4, ...
```

**Tag Format:**
- WD14 tagger format with escaped parentheses: `tag \(descriptor\)`
- Example: `shrug \(clothing\)`, `blue eyes`, `long hair`
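
Because the output is a flat comma-separated list, post-processing is plain string work. A minimal sketch (the helper names are my own):

```python
def parse_tags(model_output: str) -> list[str]:
    """Split the model's comma-separated output into individual tags."""
    return [t.strip() for t in model_output.split(",") if t.strip()]

def escape_wd14(tag: str) -> str:
    """Escape parentheses WD14-style: 'shrug (clothing)' -> 'shrug \\(clothing\\)'."""
    return tag.replace("(", r"\(").replace(")", r"\)")

print(parse_tags(r"blue eyes, long hair, shrug \(clothing\)"))
print(escape_wd14("shrug (clothing)"))
```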

### Training Data Sources

- **Source:** Arcenciel.io API
- **Dataset Size:** 95,000 image-prompt pairs
- **Rating Filter:** None (all ratings included for maximum diversity)
- **Quality Filter:** None (engagement metrics not widely used on site)
- **Ground Truth:** WD14 v1.4 MOAT tagger (SmilingWolf)

**Note:** Quality filtering was intentionally avoided to prevent limiting the training data diversity. Engagement metrics (hearts, likes) are not consistently used across the site, so filtering by them would have reduced dataset quality rather than improved it.

### Model Files

- `config.json` - Model configuration
- `model.safetensors` - Model weights (safetensors format)
- `tokenizer_config.json` - Tokenizer configuration
- `spiece.model` - SentencePiece tokenizer model
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens
- `generation_config.json` - Generation defaults
- `training_args.bin` - Training arguments (metadata)

### License

This model is based on T5-Base by Google, which is licensed under Apache 2.0.

**Model License:** Apache 2.0
**Training Data:** Arcenciel.io (public API)
**Usage:** Free for commercial and non-commercial use

### Citation

If you use this model in your work, please cite:

```
T5X Prompt Translator Base 95K
Trained on Arcenciel.io dataset using WD14 v1.4 MOAT tagger
Base model: T5-Base (Google)
```

### Updates & Versions

**Version 1.0** (Current)
- Initial release
- Trained on 95K prompts
- T5-Base architecture
- WD14 v1.4 MOAT ground truth

### Support

For issues, questions, or feature requests:
- GitHub Issues: https://github.com/yourusername/tag_generator/issues
- Documentation: See PARAMETERS.md and README.md

---

**Note:** This model is bundled with the ComfyUI-T5X-Prompt-Translator custom node for immediate use. You can also place custom models in `ComfyUI/models/llm_checkpoints/` to use them with this node.

**Model Directory:** `models/t5_base_prompt_translator/`