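This is a language model instruction-tuned from Qwen/Qwen3-0.6B-Base. It focuses on processing and generating natural language and tag data related to anime image tagging systems.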

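Model details

Base model: Qwen/Qwen3-0.6B-Base
Fine-tuning method: instruction fine-tuning (Instruction SFT)
Fine-tuning framework: LLaMA-Factory
Training data:
- Data source: all tags and VLM-generated captions for IDs 1 to 9,803,999.
- Sample count: 42,268,080 samples in total, built from five instructions.
- Total dataset tokens: about 12.1 billion
- Average length: 287 tokens
Training progress: nearly one epoch completed, with 9.87 billion tokens trained in total.
Hardware: 3 x NVIDIA GeForce RTX 4090
Context length: 768. This length covers 99.5% of the training samples. To keep the input XML structure intact, samples longer than this were discarded during training.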
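As a rough consistency check, the sample count multiplied by the average length should land near the stated dataset size. The sketch below uses only the figures quoted in the training details. The token ratio at the end is a simple estimate that ignores the samples discarded for exceeding the context length, so it should not be read as an exact epoch count.

```python
# Figures quoted in the training details.
num_samples = 42_268_080        # total samples across the five instructions
avg_tokens = 287                # average sample length in tokens
dataset_tokens_stated = 12.1e9  # "about 12.1 billion" tokens
trained_tokens = 9.87e9         # tokens trained so far

# Sample count times average length approximates the dataset size.
estimated_tokens = num_samples * avg_tokens

# Share of the stated dataset size covered by the tokens trained so far.
token_ratio = trained_tokens / dataset_tokens_stated

print(f"samples x average length = {estimated_tokens / 1e9:.2f}B tokens")
print(f"trained / dataset tokens = {token_ratio:.1%}")
```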

Eval loss

The table below compares the eval loss of model v0.1 and v0.5 on each task.

| Task | v0.1 | v0.5 |
| --- | --- | --- |
| `eval_nltotag_loss` | 0.8972 | 0.8684 |
| `eval_shorttolong_loss` | 1.2120 | 1.1803 |
| `eval_tagdetail_loss` | 0.9317 | 0.9059 |
| `eval_tagtonl_loss` | 1.2363 | 1.2057 |
| `eval_tagtotag_loss` | 0.7396 | 0.7206 |
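To put the v0.1 and v0.5 figures on a common scale, the relative reduction per task can be computed directly from the reported losses. This is a small illustrative sketch; `relative_reduction` is a helper introduced here, not part of any released tooling.

```python
# Eval losses as reported for each task: task -> (v0.1, v0.5).
eval_loss = {
    "eval_nltotag_loss": (0.8972, 0.8684),
    "eval_shorttolong_loss": (1.2120, 1.1803),
    "eval_tagdetail_loss": (0.9317, 0.9059),
    "eval_tagtonl_loss": (1.2363, 1.2057),
    "eval_tagtotag_loss": (0.7396, 0.7206),
}


def relative_reduction(old: float, new: float) -> float:
    """Fractional decrease from old to new (positive means the loss went down)."""
    return (old - new) / old


for task, (v01, v05) in eval_loss.items():
    change = relative_reduction(v01, v05)
    print(f"{task}: {v01:.4f} -> {v05:.4f} ({change:.2%} lower)")
```

Every task shows a lower loss in v0.5, with relative reductions of roughly 2.5% to 3.2%.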

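Co-design with Neta-Lumina

This model is a text processing engine designed specifically for the Neta-Lumina model.

Because this language model and the Neta-Lumina image model were trained on highly similar, high-quality natural language and tag datasets, the two share a naturally consistent understanding of the data. This means:

Well-matched understanding: the tags and natural language descriptions (captions) this model generates closely match Neta-Lumina's "preferences" in style, structure, and detail.
Unlocking the T2I model's potential: precise prompts generated by this model can guide Neta-Lumina more effectively toward high-quality images that match expectations.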

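Use with other models (such as noobai-XL)

For tag-driven models, this model can efficiently generate, complete, and refine tag sets.

Usage:
1. Call the <NLTOTAG>, <TAGTOTAG>, or <TAGDETAIL> instruction.
2. Write a simple script that extracts the tag text of each category under the <tags> element of the output XML.
3. Join the extracted tags with ", " to form a prompt suitable for the target model.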
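Steps 2 and 3 can be sketched with the Python standard library. The function name `tags_xml_to_prompt` and the optional `skip` parameter are introduced here for illustration. The sketch also assumes the model returns well-formed XML, which is not guaranteed, so `ET.fromstring` may raise `ParseError` on a malformed output.

```python
import xml.etree.ElementTree as ET


def tags_xml_to_prompt(xml_text: str, skip: tuple = ()) -> str:
    """Flatten a <tags> block into a ", "-joined prompt string.

    `skip` names child elements to leave out (hypothetical convenience,
    e.g. ("rating",) when the target model does not use that tag).
    """
    root = ET.fromstring(xml_text.strip())
    tags = []
    for category in root:  # <special>, <artists>, <characters>, ...
        if category.tag in skip or not category.text:
            continue
        tags.extend(t.strip() for t in category.text.split(",") if t.strip())
    return ", ".join(tags)


example = (
    "<tags><special>1girl</special><artists></artists>"
    "<characters>hatsune_miku</characters><copyrights>vocaloid</copyrights>"
    "<general>solo, long_hair, twintails</general><rating>safe</rating></tags>"
)
print(tags_xml_to_prompt(example))
# 1girl, hatsune_miku, vocaloid, solo, long_hair, twintails, safe
print(tags_xml_to_prompt(example, skip=("rating",)))
# 1girl, hatsune_miku, vocaloid, solo, long_hair, twintails
```

Whether to keep the rating value, or to replace underscores with spaces, depends on the target model's own prompt conventions.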

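Uses, limitations, and risks

Primary uses

Automatic tag generation: generate Danbooru-style tags from descriptions of anime-style images.
Image description generation: generate or enrich image descriptions from tags or natural language.
Text-to-image prompt engineering: feed the generated tags into a text-to-image model as the prompt.

Limitations and risks

Important: model positioning and user responsibility

Model positioning: a pure auxiliary tool
- This model is designed as a pure auxiliary tool. Its core function is to perform faithful text conversion and generation based on the specific instruction and input the user provides.
- The model does not review, evaluate, modify, or steer the user's original intent. It processes input data as given, whatever its content. For example, the model cannot "correct" NSFW (Not Safe For Work) input into SFW (Safe For Work) output. Its task is to faithfully expand and restructure the input, not to filter it or change its nature.
User responsibility and safety recommendations
- User responsibility: users bear full responsibility for their input. You must ensure that your input complies with applicable laws, regulations, and platform policies.
- Content safety risk: because the training data comes from unfiltered internet content, the model may generate output containing NSFW, offensive, or otherwise inappropriate material. The model is still at the knowledge-acquisition stage, so it has not undergone any specific safety alignment or RLHF (Reinforcement Learning from Human Feedback) training.
- Deployment recommendation: we strongly advise against using this model directly in any public-facing service without additional safety measures. If you plan to use it in a public application, you must deploy your own content review, safety filtering, and risk control mechanisms before and after the model to ensure that the final output is compliant and safe. The model itself provides no such guarantee.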
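The "before and after" placement in the deployment recommendation can be sketched as a thin wrapper around whatever function calls the model. Everything below is hypothetical: `guarded_call`, `rating_is_allowed`, and the allow-list are illustrative names, and the only rating value shown in this card is `safe`. A model-generated rating is not a reliable safety signal on its own, so a real deployment would need stronger checks in both positions.

```python
import re
from typing import Callable, Optional


def guarded_call(
    generate: Callable[[str], str],
    user_input: str,
    input_ok: Callable[[str], bool],
    output_ok: Callable[[str], bool],
) -> Optional[str]:
    """Run generate() only on approved input and return only approved output."""
    if not input_ok(user_input):
        return None  # blocked before the model is called
    output = generate(user_input)
    if not output_ok(output):
        return None  # blocked after generation
    return output


def rating_is_allowed(output_xml: str, allowed=frozenset({"safe"})) -> bool:
    """Example output check: exactly one <rating> element with an allowed value."""
    ratings = re.findall(r"<rating>(.*?)</rating>", output_xml, flags=re.DOTALL)
    return len(ratings) == 1 and ratings[0].strip() in allowed
```

Here `generate`, `input_ok`, and `output_ok` are supplied by the deployer; the wrapper only fixes the order in which they run.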
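Performance limitations:
- Hallucination: as a generative language model, it may "imagine" details that are inaccurate or do not fully match the input. Review its output manually, especially for count-sensitive tags (such as 1girl vs 2girls) or visually similar concepts.
- Representation bias: the model may reproduce the biases in style, subject matter, and character distribution inherent in its training data.
- Long-tail performance: the model's recognition and generation may be weaker for tags or concepts that appear rarely in the dataset (the long tail).
- Domain restriction: the model focuses on text processing for anime-style images. Its performance drops significantly on descriptions of non-anime or realistic images.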
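Part of the manual review for count-sensitive tags can be automated when both input and output are tag sets (`<TAGTOTAG>` and `<TAGDETAIL>`). The sketch below assumes that count tags such as `1girl` and `2girls` live in the `<special>` element, as they do in this card's examples, and flags any output whose `<special>` tags differ from the input. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET


def special_tags(xml_text: str) -> set:
    """Return the set of tags inside <special>, or an empty set if absent."""
    root = ET.fromstring(xml_text.strip())
    node = root.find("special")
    if node is None or not node.text:
        return set()
    return {t.strip() for t in node.text.split(",") if t.strip()}


def special_changed(input_xml: str, output_xml: str) -> bool:
    """True when the model altered the <special> tags (e.g. 2girls became 1girl)."""
    return special_tags(input_xml) != special_tags(output_xml)


before = "<tags><special>2girls</special><general>smile</general></tags>"
after = "<tags><special>1girl</special><general>smile, solo</general></tags>"
print(special_changed(before, after))   # True: the count tag was altered
print(special_changed(before, before))  # False
```

A flagged output is a candidate for regeneration or human review. The check does not catch hallucinated details inside `<general>`.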

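Features and tasks

The model supports the following five instruction tasks. All inputs and outputs must be wrapped in the specified XML format:

Natural language description → tags (<NLTOTAG>)
- Function: convert a natural language image description (caption) into a set of tags.
Tags → natural language description (<TAGTONL>)
- Function: convert a set of tags into a detailed, coherent natural language description.
Tag completion and refinement (<TAGTOTAG>)
- Function: complete and refine an incomplete set of tags. During training, incomplete inputs were simulated by randomly dropping tags from the full tag set at high, medium, and low intensity.
Tag expansion (<TAGDETAIL>)
- Function: expand a sparse set of core tags (such as 1girl or a character name, fewer than 10 tags) into a complete, richly detailed tag set (30 tags or more).
Short description → long description (<SHORTTOLONG>)
- Function: expand a brief image description into a longer, more detailed one.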
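The random-drop scheme described for `<TAGTOTAG>` can be illustrated with a short sketch. The card only names high, medium, and low intensity, so the drop probabilities below are placeholder assumptions, and `drop_tags` is a hypothetical helper rather than the actual training code.

```python
import random

# Placeholder drop probabilities; the real values are not stated in this card.
DROP_RATES = {"low": 0.2, "medium": 0.5, "high": 0.8}


def drop_tags(tags: list, intensity: str, rng: random.Random) -> list:
    """Return a random subset of `tags`, dropping each with the given intensity."""
    if not tags:
        return []
    rate = DROP_RATES[intensity]
    kept = [t for t in tags if rng.random() >= rate]
    # Keep at least one tag so the simulated input is never empty.
    return kept or [rng.choice(tags)]


full = ["solo", "long_hair", "twintails", "smile", "pleated_skirt", "thighhighs"]
rng = random.Random(0)  # fixed seed for reproducibility
for level in ("low", "medium", "high"):
    print(level, drop_tags(full, level, rng))
```

In a setup like this, each training pair would use the reduced list as input and the full list as the target.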

How to use

Use the specific instruction for each task to direct the model.

1. Natural language description → tags (`<NLTOTAG>`)

Instruction: <NLTOTAG>

Input:

<caption>This illustration features a young girl standing with a smile, looking directly at the viewer. She's wearing a white beret, and her long, light-colored hair is styled with pigtails, adorned with ribbons. She sports round glasses and a white collared shirt with long sleeves, layered with a light orange sweater vest. A pleated skirt that matches the ribbons, and thigh-highs complete her outfit. Her hands are clasped over her chest, possibly holding a book or other object. The background is plain white, emphasizing the character.</caption>

Output:

<tags><special>1girl</special><artists></artists><characters></characters><copyrights>original</copyrights><general>skirt, thighhighs, pleated_skirt, smile, white_background, hands_on_own_chest, looking_at_viewer, long_hair, glasses, solo, simple_background, long_sleeves, sketch</general><rating>safe</rating></tags>

2. Tags → natural language description (`<TAGTONL>`)

Instruction: <TAGTONL>

Input:

<tags><special>2girls</special><artists></artists><characters></characters><copyrights></copyrights><general>bracelet, casual, purple_hair, oekaki, outdoors, white_flower, blouse, jewelry, pendant, drill_hair, dandelion, closed_eyes, smile, short_sleeves, frills</general><rating>safe</rating></tags>

Output:

<caption>The image is a vibrant, anime-style illustration of two young women, likely teens, lying together on a bed of green grass dotted with dandelions. The composition features a close-up, overhead perspective. The first girl has brown drill hair styled in two high twintails, secured with pink bows featuring black polka dots, and she's dressed in a polka-dotted dress with frilly details and short sleeves, with a pink heart pendant necklace. She's smiling widely, eyes closed, showing a blush on her cheeks, and is looking at the other girl. The second girl, also wearing a similar style dress with a pink bow, and has pink hair, looks back at her with a smile, hugging her. Both girls have bracelets. The overall atmosphere is bright and cheerful, suggesting a moment of friendship and intimacy in a sunny outdoor setting.</caption>

3. Tag completion and refinement (`<TAGTOTAG>`)

Instruction: <TAGTOTAG>

Input (partial tags):

<tags><special>2girls</special><artists></artists><characters></characters><copyrights></copyrights><general>looking_at_another, hug, camisole, on_back, blush, oekaki, field, jewelry, dandelion_clock, on_grass, pendant, blouse, short_sleeves, dandelion, frills, bow, casual, smile, sleeveless, outdoors, brown_hair, pink_bow, hair_ribbon, polka_dot, shirt, short_hair, yellow_flower, lying, flower, closed_eyes, bracelet, drill_hair, sparkle, grass, on_side, purple_hair, ribbon, on_ground, white_flower</general><rating>safe</rating></tags>

Output (completed tags):

<tags><special>2girls</special><artists></artists><characters></characters><copyrights></copyrights><general>closed_eyes, hair_ribbon, oekaki, sleeveless, sparkle, hug, pink_bow, white_flower, short_hair, looking_at_another, dandelion_clock, ribbon, pendant, flower, lying, purple_hair, bracelet, smile, bow, brown_hair, frills, blush, jewelry, short_sleeves, on_grass, casual, grass, outdoors, shirt, blouse, field, yellow_flower, camisole, on_back, twintails, polka_dot, on_ground, on_side, dandelion</general><rating>safe</rating></tags>

4. Tag expansion (`<TAGDETAIL>`)

Instruction: <TAGDETAIL>

Input (core tags):

<tags><special>1girl</special><artists></artists><characters>hatsune_miku</characters><copyrights>vocaloid</copyrights><general></general><rating>safe</rating></tags>

Output (expanded tags):

<tags><special>1girl</special><artists></artists><characters>hatsune_miku</characters><copyrights>vocaloid</copyrights><general>solo, long_hair, twintails, blue_hair, looking_at_viewer, smile, aqua_hair, hair_ornament, aqua_eyes, shirt, sleeveless, collar, necktie, official_alternate_costume, bare_shoulders, pleated_skirt, black_skirt, thighhighs, detached_sleeves, headphones, microphone</general><rating>safe</rating></tags>

5. Short description → long description (`<SHORTTOLONG>`)

Instruction: <SHORTTOLONG>

Input (short description):

<caption>A girl with blue pigtails.</caption>

Output (long description):

<caption>This illustration portrays a young woman, identified as Hatsune Miku from the Vocaloid series, characterized by her signature long, aqua-colored pigtails. She is depicted looking directly at the viewer with a friendly smile. Her outfit consists of a sleeveless grey top with a teal collar and tie, complemented by a black pleated skirt and thigh-high boots, which is her iconic attire. The simple background ensures that the focus remains entirely on the character.</caption>
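All five tasks follow the same pattern: an instruction tag plus an XML payload. The sketch below shows one way to run them with Hugging Face `transformers`. The card does not state the exact prompt template used in training, so the layout in `build_prompt` (instruction on one line, payload on the next, no chat template) is an assumption that may need adjusting. The helper names and the `max_new_tokens` default are also illustrative.

```python
MODEL_ID = "SakikoLab/Qwen3-0.6B-Prompt-Gen-v0.5"


def build_prompt(instruction: str, payload: str) -> str:
    """Assumed layout: instruction tag on one line, XML payload on the next."""
    return f"{instruction}\n{payload}"


def generate(instruction: str, payload: str, max_new_tokens: int = 512) -> str:
    """Load the model and return only the newly generated text."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(instruction, payload), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Slice off the prompt tokens so only the completion is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate("<SHORTTOLONG>", "<caption>A girl with blue pigtails.</caption>")` would request a long description. Because the model was trained with a context length of 768, prompt and completion together should stay within that budget. Reloading the model on every call is wasteful, so a real script would load it once and reuse it.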

Model: SakikoLab/Qwen3-0.6B-Prompt-Gen-v0.5
Model size: 0.6B params
Tensor type: BF16 (Safetensors)
Base model: Qwen/Qwen3-0.6B-Base