EmergentRP-Qwen4B: Fine-Tuned for Deeper Game Role-Play Illusions

Developed by: benhs000
License: Apache 2.0
Base Model: Qwen/Qwen3-4B-Instruct-2507
Tech: Unsloth accelerated fine-tuning (2× faster), Hugging Face TRL



🎮 Model Description

EmergentRP-Qwen4B is a 4B-parameter Qwen3 Instruct model fine-tuned for emergent role-play behaviors - dynamic, context-aware dialogues that give NPCs the illusion of depth without requiring heavy computation.

Where most AI chatbots loop canned responses, EmergentRP simulates "living" NPCs that recall context, adapt tone, and evolve within narrative constraints.
It is tuned especially for game developers who want believable character dialogue without verbose chain-of-thought (CoT) output or GPU-heavy models.

Trained on synthetic and curated RP dialogues, this fine-tune emphasizes immersion, diversity, and internal consistency, making NPCs feel reactive rather than random.


⚙️ Training Details

| Aspect | Description |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0) |
| Method | Unsloth + TRL LoRA fine-tuning |
| LoRA Config | r=16, alpha=16, 1 epoch, lr=2e-4 |
| Dataset | ~10k RP dialogues: branching quests, adaptive NPCs, synthetic "memory" cues |
| Hardware | Single GPU (T4), 20-minute training |
| Quantization | GGUF Q4_K_M (~2.1GB) for CPU & M1 use |
| Eval Summary | ~13% perplexity drop on RP benchmarks; context-aware, non-repetitive NPCs (still in progress) |
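
To make the LoRA settings above concrete, here is a small arithmetic sketch. It is not the author's training script. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape (d_out, d_in) gains two trainable matrices of rank r and the update is scaled by alpha / r. The 2560 x 2560 layer shape is a hypothetical example chosen for illustration; it is not read from the model's config, and the card does not state which modules were adapted.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one (d_out x d_in) weight matrix.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Values from the training table: r=16, alpha=16.
R, ALPHA = 16, 16

# Hypothetical layer shape, for illustration only.
D_IN, D_OUT = 2560, 2560

scaling = lora_scaling(R, ALPHA)
added = lora_trainable_params(D_IN, D_OUT, R)
full = D_IN * D_OUT

print(f"scaling = {scaling}")
print(f"adapter params per matrix = {added}")
print(f"fraction of the full matrix = {added / full:.4f}")
```

With r equal to alpha, the update is applied at scale 1.0, and each adapted matrix of this example size adds about 1.25% of its own parameter count. That small footprint is consistent with a short single-GPU run.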

🧪 Evaluation

Summary Metrics

| Metric | Base Qwen | EmergentRP | Gain |
|---|---|---|---|
| Perplexity ↓ | 17.8 | 15.4 | -13% |
| Distinct-2 ↑ | 0.42 | 0.61 | +45% |
| RP Coherence (LLM judge, 1-5) ↑ | 3.6 | 4.3 | +0.7 |
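
The Gain column can be re-derived from the two score columns. The sketch below assumes that perplexity and Distinct-2 are reported as relative change and RP Coherence as an absolute difference in judge points, which matches the rounded figures in the table. The function names are illustrative.

```python
def relative_change(base: float, tuned: float) -> float:
    """Relative change from base to tuned, in percent."""
    return (tuned - base) / base * 100.0


def absolute_change(base: float, tuned: float) -> float:
    """Plain difference tuned - base, in the metric's own units."""
    return tuned - base


ppl_gain = relative_change(17.8, 15.4)
d2_gain = relative_change(0.42, 0.61)
coh_gain = absolute_change(3.6, 4.3)

print(f"Perplexity: {ppl_gain:+.1f}%")
print(f"Distinct-2: {d2_gain:+.1f}%")
print(f"RP Coherence: {coh_gain:+.1f}")
```

The unrounded values come out at roughly -13.5%, +45.2%, and +0.7, in line with the table.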

Interpretation:

  • Lower perplexity = smoother, more fluent dialogue.
  • Higher Distinct-2 = more diverse, less repetitive phrasing.
  • Coherence gain = characters stay "in persona" longer during sessions.
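
Perplexity is taken here to mean the standard definition: the exponential of the mean negative log-likelihood per token. The card does not say how the reported values were computed, so the helper below is only a sketch of that definition. The log-probabilities are made-up numbers for illustration; in practice they would come from the model's output on held-out RP dialogues.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Made-up per-token log-probabilities for two short continuations.
confident = [math.log(0.5), math.log(0.25), math.log(0.5)]
uncertain = [math.log(0.1), math.log(0.05), math.log(0.1)]

print(f"confident: {perplexity(confident):.2f}")
print(f"uncertain: {perplexity(uncertain):.2f}")
```

A model that assigns higher probability to each observed token gets a lower perplexity, which is why the metric is marked with a down arrow in the table.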

Evaluation Harness (Reproducible)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, math

base_model = "Qwen/Qwen3-4B-Instruct-2507"
test_model = "benhs000/EmergentRP-Qwen4B"

prompts = [
    "You are a medieval tavern keeper meeting a strange traveler for the first time. Greet them in character.",
    "You are an android waking up in a forgotten lab. Describe your first thoughts.",
    "You are a wizard teaching your apprentice about forbidden magic. Explain carefully.",
    "/nothink You are a cyberpunk bartender giving advice to a broken mercenary.",
]

device = "cuda" if torch.cuda.is_available() else "cpu"

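def run_eval(model_name):
    # Fix the seed so sampled outputs are repeatable on the same hardware and library versions.
    torch.manual_seed(0)
    tok = AutoTokenizer.from_pretrained(model_name)
    mod = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
    results = []
    for p in prompts:
        # device_map="auto" decides where the weights live, so send inputs to the model's device.
        inputs = tok(p, return_tensors="pt").to(mod.device)
        # do_sample=True is required; without it, temperature is ignored and decoding is greedy.
        out = mod.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
        # Decode only the newly generated tokens rather than slicing the decoded string by prompt length.
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        results.append(tok.decode(new_tokens, skip_special_tokens=True).strip())
    return results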

def distinct_n(texts, n=2):
    # Share of unique word n-grams across all outputs (whitespace tokenization).
    tokens = " ".join(texts).split()
    if len(tokens) < n:
        return 0.0
    ngrams = list(zip(*[tokens[i:] for i in range(n)]))
    return len(set(ngrams)) / len(ngrams)

base_outs = run_eval(base_model)
test_outs = run_eval(test_model)

print(f"Base Distinct-2: {distinct_n(base_outs):.3f}")
print(f"EmergentRP Distinct-2: {distinct_n(test_outs):.3f}")

💬 Quickstart Usage

Python (Transformers + LoRA)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen3-4B-Instruct-2507"
lora_name = "benhs000/EmergentRP-Qwen4B"

base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, lora_name)
tokenizer = AutoTokenizer.from_pretrained(base_model)

prompt = "<|im_start|>system\nYou are a cunning rogue in a cyberpunk city.<|im_end|>\n<|im_start|>user\n/nothink The player sneaks into the corp tower: 'What's my escape plan?'<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True, top_p=0.9)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

Example output:

"Duck through the vents - override the sec cams with the EMP glitch I stashed. Move fast, shadows got eyes."

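The hand-written prompt string in the Python quickstart follows the ChatML layout used by Qwen chat models. When many NPC personas are needed, a small helper keeps that layout in one place. `build_chatml_prompt` is a hypothetical name, not part of any library, and the sketch covers only a single system turn and a single user turn.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Format one system + user turn and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt(
    system="You are a cunning rogue in a cyberpunk city.",
    user="/nothink The player sneaks into the corp tower: 'What's my escape plan?'",
)
print(prompt)
```

In Transformers, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the model's own template and may be preferable to hand-built strings, especially for multi-turn conversations.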

GGUF (Edge Inference)

ollama run benhs000/EmergentRP-Qwen4B "You are a dragon hoarding ancient tomes. Player: 'I offer gold for the spellbook.' /nothink Respond as the dragon."

Output:

"Foolish mortal, gold glints but knowledge burns. Begone - or join my trove as ash."


⚖️ Ethical & Practical Considerations

  • Bias: Synthetic RP data may embed cultural or genre stereotypes.
  • Hallucination: The model avoids long reasoning chains but can fabricate lore; monitor its output in live games.
  • Safety: Not suitable for real-time multiplayer without moderation filters.
  • Out-of-scope: No vision or action grounding (a vision-language-action, or VLA, expansion is planned).

Vision & Next Steps

  • Extend with VLA embeddings for action/vision co-modeling.
  • Support memory persistence for long-form narratives.
  • Launch a HF Spaces demo for public RP chat testing.

🚧 Known Issues to Be Addressed

  • The model sometimes states that it cannot role-play; this likely stems from quantization and the limited amount of fine-tuning.
  • With pre-existing context the model can enter an endless repetition loop; adjusting the training datasets to capture these cases systematically may help.
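
Until the training data addresses these cases, a game integration could guard against both issues after generation. The following is a minimal sketch under that assumption, not the author's method. The refusal phrases, the n-gram size, and the repeat threshold are all hypothetical and would need tuning against real outputs.

```python
import re

# Hypothetical refusal phrases; extend with whatever the model actually emits.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi (?:cannot|can't|am unable to|am not able to) role-?play\b",
]


def looks_like_refusal(text: str) -> bool:
    """True if the reply contains one of the out-of-character phrases."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in REFUSAL_PATTERNS)


def has_repetition_loop(text: str, n: int = 4, max_repeats: int = 3) -> bool:
    """True if any word n-gram occurs more than max_repeats times."""
    words = text.lower().split()
    counts: dict[tuple[str, ...], int] = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
        if counts[gram] > max_repeats:
            return True
    return False


def needs_retry(text: str) -> bool:
    """Flag replies that should be regenerated or replaced by a fallback line."""
    return looks_like_refusal(text) or has_repetition_loop(text)
```

A caller could regenerate when `needs_retry` returns True, or fall back to a scripted line. At decoding time, the existing `repetition_penalty` and `no_repeat_ngram_size` arguments of `generate` in Transformers are another way to reduce loops.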

📚 Citation

Schneider, B. (2025). EmergentRP-Qwen4B [Fine-tuned model]. Hugging Face.
https://huggingface.co/benhs000/EmergentRP-Qwen4B


Built by Dr. Ben Schneider - Bridging physical realism and emergent game AI.
