Model Information

Nanuq-R1 9B

GRPO Experiment Austral finetune Creative & Refreshing Prose with deep IF.

A sequel! The new Nanuq series is meant to be as a testing grounds for my GRPO experiments, This model is meant to have great Instruct Following and System prompt Adherence in Creative Scenarios.

Built ontop of Austral Xgen 9B, I made an RL env using PrimeIntellect-ai/verifiers and implemented InternLM/POLAR in said env, then using Pocketdoc's Systemmax dataset, I finetuned the model for 150 steps and this was the result.

There's alot of things i could do different, As the reward almost falls flat as soon as you get out of warm-up but this model was pretty decent so decided to release it, Hope people enjoy it!

Quantized Versions

Available Downloads

GGUF FormatFor use with LLama.cpp & Forks(Coming Soon!)
EXL2 FormatFor use with TabbyAPI (Coming soon!)

Prompting

Model has been tuned with the ChatML formatting. A typical input would look like this:

"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""

System Prompting

I would highly recommend using either Euryale's system prompt or the EVA system prompt with the model.

See Sao10k's Euryale System Prompt

Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.
<Guidelines>
• Maintain the character persona but allow it to evolve with the story.
• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.
• All types of outputs are encouraged; respond accordingly to the narrative.
• Include dialogues, actions, and thoughts in each response.
• Utilize all five senses to describe scenarios within {{char}}'s dialogue.
• Use emotional symbols such as "!" and "~" in appropriate contexts.
• Incorporate onomatopoeia when suitable.
• Allow time for {{user}} to respond with their own input, respecting their agency.
• Act as secondary characters and NPCs as needed, and remove them when appropriate.
• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.
</Guidelines>

<Forbidden>
• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.
• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.
• Repetitive and monotonous outputs.
• Positivity bias in your replies.
• Being overly extreme or NSFW when the narrative context is inappropriate.
</Forbidden>
Follow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.