Muse-12B
Quantized NVFP4 weights of the Muse-12B model, for use on NVIDIA Blackwell GPUs.
Quantization details
Quantized with TensorRT-Model-Optimizer 0.37.0.
Calibrated on the distilled-roleplay dataset, formatted with the same ChatML tags used to train Latitude's Wayfarer and Muse models. The dataset was registered by adding the following code to the start of hf_ptq.py:
from modelopt.torch.utils import dataset_utils

# Register the calibration dataset so hf_ptq.py can select it by name.
dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"] = {
    "config": {
        "path": "agentlans/distilled-roleplay",
        "split": ["train"],
    },
    # Flatten each conversation into ChatML, mapping the dataset's role
    # names ("human"/"gpt") onto ChatML's ("user"/"assistant").
    "preprocess": lambda sample: "".join(
        f"<|im_start|>{ {'system': 'system', 'human': 'user', 'gpt': 'assistant'}[turn['from']] }\n"
        f"{turn['value'].strip()}<|im_end|>\n"
        for turn in sample["conversations"]
    ),
}
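For reference, here is what that preprocess lambda produces for a toy sample. This is a minimal sketch: the sample dict below is invented for illustration and only mimics the distilled-roleplay schema.

# Hypothetical sample in the distilled-roleplay schema, for illustration only.
sample = {
    "conversations": [
        {"from": "system", "value": "You're a masterful storyteller."},
        {"from": "human", "value": "> You peer into the darkness."},
        {"from": "gpt", "value": "You have been eaten by a grue."},
    ]
}

preprocess = dataset_utils.SUPPORTED_DATASET_CONFIG["distilled-roleplay"]["preprocess"]
print(preprocess(sample))
# <|im_start|>system
# You're a masterful storyteller.<|im_end|>
# <|im_start|>user
# > You peer into the darkness.<|im_end|>
# <|im_start|>assistant
# You have been eaten by a grue.<|im_end|>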
Inference
Tested on an RTX 5060 Ti 16GB with TensorRT-LLM, vLLM, and SGLang.
Recommended generation settings (a mix of the recommendations from the Muse-12B model card and the AI Dungeon Model Guide); a vLLM sketch using these values follows the list:
- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25
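As a sketch of how these settings map onto one of the tested backends, here they are as vLLM SamplingParams. This assumes vLLM detects the NVFP4 quantization from the checkpoint's config; the parameter names are vLLM's and may differ in TensorRT-LLM or SGLang, and max_tokens is an arbitrary choice for the example.

from vllm import LLM, SamplingParams

llm = LLM(model="DataSnake/muse-12b-nvfp4")

sampling_params = SamplingParams(
    temperature=1.0,
    top_k=250,
    top_p=1.0,
    min_p=0.025,
    repetition_penalty=1.05,
    presence_penalty=0.25,
    max_tokens=256,  # arbitrary length cap for this example
)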
Prompt Format
As mentioned above, the calibration data was formatted with the same ChatML tags used to fine-tune Latitude's 12B models:
<|im_start|>system
You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
As such, I would recommend using that format for inference; a sketch continuing the vLLM example above follows.
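A minimal way to build such a prompt by hand and generate with the llm and sampling_params objects from the Inference section. Note that format_chatml is a hypothetical helper written for this sketch, not part of any library.

def format_chatml(messages: list[dict]) -> str:
    """Join role/content messages into a ChatML prompt that ends with
    an open assistant turn for the model to complete."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    return prompt + "<|im_start|>assistant\n"

prompt = format_chatml([
    {"role": "system", "content": "You're a masterful storyteller and gamemaster."},
    {"role": "user", "content": "> You peer into the darkness."},
])
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)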
Credits
Muse-12B was made by Latitude Games with help from Gryphe Padar.
Base model: mistralai/Mistral-Nemo-Base-2407