Muse-12B
Quantized NVFP4 weights of the Muse-12B model, for use on NVIDIA Blackwell GPUs.
Quantization details
Quantized with TensorRT-Model-Optimizer 0.37.0.
Calibrated on the distilled-roleplay dataset, formatted with the same ChatML tags used to train Latitude's Wayfarer and Muse models. The dataset was registered by adding the following code to the start of hf_ptq.py:
from modelopt.torch.utils import dataset_utils

# Register the calibration dataset so hf_ptq.py can select it by name.
dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
    "config": {
        "path": "agentlans/distilled-roleplay",
        "split": ["train"],
    },
    # Flatten each conversation into ChatML, mapping the dataset's role
    # names ("human"/"gpt") onto ChatML's ("user"/"assistant").
    "preprocess": lambda sample: "".join(
        f"<|im_start|>{ {'system': 'system', 'human': 'user', 'gpt': 'assistant'}[turn['from']] }\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    ),
}
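For reference, here is what that preprocess lambda produces for a toy sample. This is a minimal sketch: the sample dict below is invented for illustration and only mimics the distilled-roleplay schema.

# Hypothetical sample in the distilled-roleplay schema, for illustration only.
sample = {
    "conversations": [
        {"from": "system", "value": "You're a masterful storyteller."},
        {"from": "human", "value": "> You peer into the darkness."},
        {"from": "gpt", "value": "You have been eaten by a grue."},
    ]
}

preprocess = dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"]["preprocess"]
print(preprocess(sample))
# <|im_start|>system
# You're a masterful storyteller.<|im_end|>
# <|im_start|>user
# > You peer into the darkness.<|im_end|>
# <|im_start|>assistant
# You have been eaten by a grue.<|im_end|>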
Inference
Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, and SGLang.
Recommended generation settings (a mix of the recommendations from the Muse-12B model card and the AI Dungeon Model Guide); a vLLM sketch using these values follows the list:
- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25
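As a sketch of how these settings map onto one of the tested backends, here they are as vLLM SamplingParams. This assumes vLLM detects the NVFP4 quantization from the checkpoint's config; the parameter names are vLLM's and may differ in TensorRT-LLM or SGLang, and max_tokens is an arbitrary choice for the example.

from vllm import LLM, SamplingParams

llm = LLM(model="DataSnake/muse-12b-nvfp4")

sampling_params = SamplingParams(
    temperature=1.0,
    top_k=250,
    top_p=1.0,
    min_p=0.025,
    repetition_penalty=1.05,
    presence_penalty=0.25,
    max_tokens=256,  # arbitrary length cap for this example
)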
Prompt Format
As mentioned above, the calibration data was formatted with the same ChatML tags used to fine-tune Latitude's 12B models:
<|im_start|>system
You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
As such, I would recommend using that format for inference; a sketch continuing the vLLM example above follows.
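A minimal way to build such a prompt by hand and generate with the llm and sampling_params objects from the Inference section. Note that format_chatml is a hypothetical helper written for this sketch, not part of any library.

def format_chatml(messages: list[dict]) -> str:
    """Join role/content messages into a ChatML prompt that ends with
    an open assistant turn for the model to complete."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return prompt + "<|im_start|>assistant\n"

prompt = format_chatml([
    {"role": "system", "content": "You're a masterful storyteller and gamemaster."},
    {"role": "user", "content": "> You peer into the darkness."},
])
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)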
Credits
Muse-12B was made by Latitude Games with help from Gryphe Padar.
Base model: mistralai/Mistral-Nemo-Base-2407