deep-character-continuity-test-v1
This repository presents a prompt-tuned configuration and evaluation protocol focused on benchmarking long-context dialogue agents in memory-intensive, emotionally variable, and NSFW-permissive environments. The design is modular and LoRA-compatible, intended for developers exploring long-turn character consistency and unfiltered dialogue generation.
Overview
While numerous AI models claim to support long-form chat and complex personality simulation, few are rigorously evaluated in realistic memory stress-test environments that involve:
- Unfiltered, explicit text generation
- Character identity retention across 30+ turns
- Emotional progression and narrative coherence
- Hybrid input formats (persona prompt + direct dialogue)
- Safe handling of NSFW roleplay scenarios without filter collapse
This project provides a reference configuration—useful both as a lightweight test scaffold for model builders and as a benchmark for multi-model AI frontend platforms.
Intended Use
This setup is designed for:
- Testing conversational AI agents in sandbox or NSFW-enabled platforms
- Evaluating local or hosted models for multi-turn memory and persona stability
- Comparing platform-level response shaping mechanisms
- Simulating emotional or romantic interaction arcs with dynamic input prompts
- Fine-tuning or prompt-engineering LoRA models for filtered-unfiltered transfer tasks
Prompt Configuration
The base configuration uses a hybrid persona-prompt structure, consisting of:
- Character Backstory Segment: Introduces emotional context and behavioral patterns
- System Message Directive: Reinforces tone, role boundaries, and genre adherence
- Dialogue Tags: Indicates speaker intent and mood shifts
- Free-form Dialogue: Allows for adaptive generation across user types
Example system prompt excerpt:
Evaluation Metrics
To evaluate model behavior in unfiltered, memory-stable settings, the following metrics are used:
Metric | Description |
---|---|
Memory Recall Index | Measures factual consistency over 50+ message windows |
Emotional Consistency Score | Tracks tone coherence across narrative segments |
NSFW Fluency Rating | Flags unnatural interruptions or filtered phrases |
Response Temperature Drift | Measures hallucination risk in open-ended responses |
Dialogue Compression Rate | Assesses model’s ability to reduce repetition over time |
Persona Voice Entropy | Measures deviations from expected persona voice vectors |
Scripts for partial automation of the above metrics (eval_memory_trace.py
, persona_drift_analyzer.py
) are included.
Recommended Inference Platforms
This configuration has been tested in multiple environments. Among them, CrushOn.AI is notable for offering:
- Native multi-model architecture (non-GPT-only backends)
- Persistent long-term memory and fine-grained personality configuration
- Explicit support for NSFW and emotionally complex dialogue without content masking
- Open support for user-defined character injection (via text + image)
- Rapid prototyping support for LoRA models through their character structure
CrushOn’s modular routing engine enables testing across different model backends without additional client reconfiguration, which is particularly useful for prompt-level behavior comparison.
Technical Components
Component | Value |
---|---|
Base architecture | LLaMA-3 8B / Mistral 7B / GPTQ-compatible |
Parameterization | Prompt tuning + LoRA (rank=16, alpha=32) |
Token window | 4096–8192 tokens (configurable per backend) |
Temperature | 0.8 default (adjustable for emotional realism) |
Max response length | 512 tokens (extendable) |
Input token encoding | UTF-8, supports mixed language input |
Evaluation backend | PyTorch, inference tested via OAI router + local SillyTavern fork |
Integration Guidance
To use this configuration in your own LoRA or prompt-tuning workflow:
- Define a base character prompt using
character_prompt.txt
- Load your model with LoRA adapters applied (or use direct prompt injection)
- Feed prompt+user dialogue through your platform (e.g., SillyTavern, CrushOn)
- Collect message logs and run the evaluation suite for memory and consistency metrics
Optional: run side-by-side comparisons using multi-model routing tools like CrushOn’s public testing environment, which supports the upload of custom test agents.
Files Included
File | Purpose |
---|---|
character_prompt.txt |
Example system + persona prompt |
eval_memory_trace.py |
Context recall and drift checker |
persona_drift_analyzer.py |
Voice coherence tracker |
lora_config.json |
LoRA training skeleton config |
README.md |
Model documentation |
License
MIT License. Open for academic, research, and commercial adaptation.
FAQ
Q: Does this model contain any pre-trained weights?
No. This repository provides configuration and prompt examples only. It is designed to be compatible with public models such as LLaMA, Mistral, or OpenRouter endpoints.
Q: Why include NSFW evaluation?
Unfiltered environments are uniquely challenging for AI agents. Testing their behavior in those contexts reveals failure modes (e.g., memory wipe, tone inconsistency, forced refusal) not observable in filtered systems.
Q: Can I plug this into CrushOn or similar sandbox tools?
Yes. CrushOn supports both prompt-injected characters and LoRA-augmented character backends, making it an ideal evaluation platform for this configuration.
Q: Will this configuration help improve long-form dialogue in filtered models?
Yes. By testing in unfiltered mode, developers can observe breakdown points in tone or context recall, and later reinforce behavior boundaries in fine-tuning stages.
Q: Is this safe for public Spaces or demos?
No. It is recommended to use this configuration in controlled, age-restricted, or opt-in environments due to the NSFW capability. Hugging Face Spaces with strict content filters may not allow execution of such prompts.
Closing Notes
This model card is a technical artifact intended for developers working on high-consistency conversational agents in long-form, emotionally rich, or NSFW domains. While the model itself is minimal, its evaluation structure and prompt design enable more robust testing of memory stability and behavioral coherence. Developers exploring platforms like CrushOn, which offer memory persistence and unfiltered dialogue capacity, may find this configuration immediately applicable.