EmergentRP-Qwen4B: Fine-Tuned for Deeper Game Role-Play Illusions

Developed by: benhs000
License: Apache 2.0
Base Model: Qwen/Qwen3-4B-Instruct-2507
Tech: Unsloth accelerated fine-tuning (2× faster), Hugging Face TRL

🎮 Model Description

EmergentRP-Qwen4B is a 4B-parameter Qwen3 Instruct model fine-tuned for emergent role-play behaviors - dynamic, context-aware dialogues that give NPCs the illusion of depth without requiring heavy computation.

Where most AI chatbots loop canned responses, EmergentRP simulates "living" NPCs that recall context, adapt tone, and evolve within narrative constraints.
It is tuned especially for game developers who want believable character dialogue without chain-of-thought (CoT) verbosity or GPU-heavy models.

Trained on synthetic and curated RP dialogues, this fine-tune emphasizes immersion, diversity, and internal consistency, making NPCs feel reactive rather than random.

⚙️ Training Details

Aspect	Description
Base Model	Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
Method	Unsloth + TRL LoRA fine-tuning
LoRA Config	r=16, alpha=16, 1 epoch, lr=2e-4
Dataset	~10k RP dialogues: branching quests, adaptive NPCs, synthetic "memory" cues
Hardware	Single GPU (T4), 20-minute training
Quantization	GGUF Q4_K_M (~2.1GB) for CPU & M1 use
Eval Summary	~13% perplexity drop on RP benchmarks; context-aware, non-repetitive NPCs (still in progress)
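
To make the LoRA row concrete, here is a minimal sketch of what r=16 and alpha=16 imply under the standard LoRA formulation, where each adapted weight matrix gains two low-rank factors and the update is scaled by alpha / r. The layer shape below is a hypothetical placeholder, not an actual Qwen3-4B dimension, and this card does not state which modules were adapted.

```python
# Standard LoRA bookkeeping (assumes plain alpha / r scaling, not rank-stabilized LoRA).

def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Values from the training table.
R, ALPHA = 16, 16

# Hypothetical layer shape, chosen only for illustration.
D_IN, D_OUT = 1024, 1024

print(lora_scaling(R, ALPHA))             # 1.0
print(lora_extra_params(D_IN, D_OUT, R))  # 32768
print(D_IN * D_OUT)                       # 1048576 weights in the frozen full matrix
```

With alpha equal to r, the low-rank update is applied unscaled, and for this illustrative shape the adapter trains roughly 3% as many parameters as the frozen matrix holds.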

🧪 Evaluation

Summary Metrics

Metric	Base Qwen	EmergentRP	Gain
Perplexity ↓	17.8	15.4	-13%
Distinct-2 ↑	0.42	0.61	+45%
RP Coherence (LLM judge 1-5) ↑	3.6	4.3	+0.7

Interpretation:

Lower perplexity = the model predicts the RP evaluation text with less surprise, a proxy for smoother, more fluent dialogue.
Higher Distinct-2 = more diverse, less repetitive phrasing.
Coherence gain = characters stay "in persona" longer during sessions.
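
The Gain column can be re-derived from the two score columns. The sketch below assumes perplexity and Distinct-2 gains are relative changes against the base model, while the judge score is an absolute difference; the helper name `relative_change` is illustrative.

```python
def relative_change(base: float, new: float) -> float:
    """Signed change from base to new, as a fraction of base."""
    return (new - base) / base


# Values copied from the Summary Metrics table.
ppl_gain = relative_change(17.8, 15.4)
d2_gain = relative_change(0.42, 0.61)
coherence_gain = 4.3 - 3.6  # judge scores are compared as an absolute difference

print(f"Perplexity: {ppl_gain:+.0%}")             # Perplexity: -13%
print(f"Distinct-2: {d2_gain:+.0%}")              # Distinct-2: +45%
print(f"RP Coherence: {coherence_gain:+.1f}")     # RP Coherence: +0.7
```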

Evaluation Harness (Reproducible)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, math

base_model = "Qwen/Qwen3-4B-Instruct-2507"
test_model = "benhs000/EmergentRP-Qwen4B"

prompts = [
    "You are a medieval tavern keeper meeting a strange traveler for the first time. Greet them in character.",
    "You are an android waking up in a forgotten lab. Describe your first thoughts.",
    "You are a wizard teaching your apprentice about forbidden magic. Explain carefully.",
    "/nothink You are a cyberpunk bartender giving advice to a broken mercenary.",
]

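def run_eval(model_name):
    # device_map="auto" places the weights on the GPU when one is available, else on CPU.
    tok = AutoTokenizer.from_pretrained(model_name)
    mod = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
    torch.manual_seed(0)  # fixed seed so sampled outputs can be reproduced
    results = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt").to(mod.device)
        # Pass do_sample=True explicitly so temperature applies regardless of the default generation config.
        out = mod.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
        # Keep only the newly generated tokens instead of slicing the decoded string by prompt length.
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        results.append(tok.decode(new_tokens, skip_special_tokens=True).strip())
    return results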

def distinct_n(texts, n=2):
    tokens = " ".join(texts).split()
    if len(tokens) < n: return 0
    ngrams = list(zip(*[tokens[i:] for i in range(n)]))
    return len(set(ngrams)) / len(ngrams)

base_outs = run_eval(base_model)
test_outs = run_eval(test_model)

print(f"Base Distinct-2: {distinct_n(base_outs):.3f}")
print(f"EmergentRP Distinct-2: {distinct_n(test_outs):.3f}")
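
The harness above reports only Distinct-2. For the perplexity row, a minimal sketch of the usual definition follows: perplexity is the exponential of the mean negative log-likelihood per token. The log-probabilities here are made-up illustrative numbers; in practice they would come from the model scored on a held-out RP set, which this card does not specify.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Illustrative only: a model that assigns probability 0.5 to every token.
uniform_half = [math.log(0.5)] * 4
print(round(perplexity(uniform_half), 6))  # 2.0
```

With Transformers, calling a causal LM with `labels` set to the input IDs returns the mean cross-entropy loss, so `math.exp(loss.item())` would give the same quantity for one sequence.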

💬 Quickstart Usage

Python (Transformers + LoRA)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
lora_name = "benhs000/EmergentRP-Qwen4B"

base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_name)
tokenizer = AutoTokenizer.from_pretrained(base_model)

prompt = "<|im_start|>system\nYou are a cunning rogue in a cyberpunk city.<|im_end|>\n<|im_start|>user\n/nothink The player sneaks into the corp tower: 'What's my escape plan?'<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True, top_p=0.9)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Example output:

"Duck through the vents - override the sec cams with the EMP glitch I stashed. Move fast, shadows got eyes."

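The hand-written prompt string in the Python quickstart follows the ChatML layout, with `<|im_start|>` and `<|im_end|>` markers around each turn. A small helper such as the hypothetical `build_chatml_prompt` below can assemble it from parts, which helps avoid typos when prompts are generated per NPC. The function name and signature are illustrative and not part of any library.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt that ends at the open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    "You are a cunning rogue in a cyberpunk city.",
    "/nothink The player sneaks into the corp tower: 'What's my escape plan?'",
)
print(prompt)
```

The library route is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`; its exact output depends on the chat template shipped with the tokenizer, so it may differ slightly from the hand-built string.
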
GGUF (Edge Inference)

ollama run benhs000/EmergentRP-Qwen4B "You are a dragon hoarding ancient tomes. Player: 'I offer gold for the spellbook.' /nothink Respond as the dragon."

Output:

"Foolish mortal, gold glints but knowledge burns. Begone - or join my trove as ash."

⚖️ Ethical & Practical Considerations

Bias: Synthetic RP data may embed cultural or genre stereotypes.
Hallucination: Avoids long-chain logic but can fabricate lore - monitor in live games.
Safety: Not suitable for real-time multiplayer without moderation filters.
Out-of-scope: No vision or action grounding (VLA expansion planned).

🌍 Vision & Next Steps

Extend with VLA embeddings for action/vision co-modeling.
Support memory persistence for long-form narratives.
Launch a HF Spaces demo for public RP chat testing.

🚧 Known Issues to Be Addressed

The model sometimes states that it cannot role-play, which likely stems from quantization and the limited amount of fine-tuning.
With pre-existing context, the model can enter an endless repetition loop; adjusting the training datasets to capture these cases systematically may help.
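
One possible way to catch repetition loops systematically is to flag outputs in which the same word n-gram recurs many times. The sketch below is a simple heuristic, not the author's method; the function names, the n-gram size, and the threshold are all assumptions to tune against real transcripts.

```python
from collections import Counter


def max_ngram_repeats(text: str, n: int = 4) -> int:
    """Highest count of any single word n-gram in text (0 if text is too short)."""
    tokens = text.split()
    if len(tokens) < n:
        return 0
    ngrams = zip(*[tokens[i:] for i in range(n)])
    return max(Counter(ngrams).values())


def looks_like_loop(text: str, n: int = 4, threshold: int = 3) -> bool:
    """Flag output whose most frequent n-gram occurs at least `threshold` times."""
    return max_ngram_repeats(text, n) >= threshold


looping = "I told you already. " * 5
normal = "Duck through the vents and move fast before the guards change shift."
print(looks_like_loop(looping))  # True
print(looks_like_loop(normal))   # False
```

Flagged samples could be collected for dataset review. On the inference side, the `repetition_penalty` and `no_repeat_ngram_size` arguments to `generate` are existing options that may reduce loops without retraining.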

📚 Citation

Schneider, B. (2025). EmergentRP-Qwen4B [Fine-tuned model]. Hugging Face.
https://huggingface.co/benhs000/EmergentRP-Qwen4B

Built by Dr. Ben Schneider - Bridging physical realism and emergent game AI.
