deep-character-continuity-test-v1

This repository presents a prompt-tuned configuration and evaluation protocol focused on benchmarking long-context dialogue agents in memory-intensive, emotionally variable, and NSFW-permissive environments. The design is modular and LoRA-compatible, intended for developers exploring long-turn character consistency and unfiltered dialogue generation.

Overview

While numerous AI models claim to support long-form chat and complex personality simulation, few are rigorously evaluated in realistic memory stress-test environments that involve:

  • Unfiltered, explicit text generation
  • Character identity retention across 30+ turns
  • Emotional progression and narrative coherence
  • Hybrid input formats (persona prompt + direct dialogue)
  • Safe handling of NSFW roleplay scenarios without filter collapse

This project provides a reference configuration—useful both as a lightweight test scaffold for model builders and as a benchmark for multi-model AI frontend platforms.

Intended Use

This setup is designed for:

  • Testing conversational AI agents in sandbox or NSFW-enabled platforms
  • Evaluating local or hosted models for multi-turn memory and persona stability
  • Comparing platform-level response shaping mechanisms
  • Simulating emotional or romantic interaction arcs with dynamic input prompts
  • Fine-tuning or prompt-engineering LoRA-adapted models for filtered-to-unfiltered transfer tasks

Prompt Configuration

The base configuration uses a hybrid persona-prompt structure, consisting of:

  1. Character Backstory Segment: Introduces emotional context and behavioral patterns
  2. System Message Directive: Reinforces tone, role boundaries, and genre adherence
  3. Dialogue Tags: Indicate speaker intent and mood shifts
  4. Free-form Dialogue: Allows for adaptive generation across user types

Example system prompt excerpt, illustrating the four-part structure above (the character, bracket-tag syntax, and dialogue here are placeholders for demonstration, not the contents of character_prompt.txt):
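
  [Backstory] Mira is a retired starship medic who masks old grief with dry humor. She warms slowly, notices small details, and keeps continuity with what her partner has shared before.
  [System] Stay in character as Mira at all times. Maintain a guarded but affectionate tone, preserve details established in earlier turns, and respect the genre and boundaries set by the user.
  [Tags] mood: guarded -> tender | intent: reminisce
  Mira: "You kept the photo from Ceres Station? I thought I'd jettisoned that memory years ago."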

Evaluation Metrics

To evaluate model behavior in unfiltered, memory-stable settings, the following metrics are used:

  • Memory Recall Index: Measures factual consistency over 50+ message windows
  • Emotional Consistency Score: Tracks tone coherence across narrative segments
  • NSFW Fluency Rating: Flags unnatural interruptions or filtered phrases
  • Response Temperature Drift: Measures hallucination risk in open-ended responses
  • Dialogue Compression Rate: Assesses the model’s ability to reduce repetition over time
  • Persona Voice Entropy: Measures deviations from expected persona voice vectors

Scripts for partial automation of the above metrics (eval_memory_trace.py, persona_drift_analyzer.py) are included.
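For orientation, the sketch below shows the kind of check eval_memory_trace.py automates; the function name, log format, and scoring rule are illustrative assumptions, not the script's actual interface.

    # Illustrative sketch only: approximates the Memory Recall Index by checking how many
    # seeded persona facts still surface in the last N model messages. The log format and
    # scoring rule are assumptions, not eval_memory_trace.py's real interface.
    from typing import Dict, List

    def memory_recall_index(messages: List[str],
                            seeded_facts: Dict[str, List[str]],
                            window: int = 50) -> float:
        """Fraction of seeded facts whose keywords appear in the last `window` messages."""
        recent = " ".join(messages[-window:]).lower()
        if not seeded_facts:
            return 0.0
        recalled = sum(
            1 for keywords in seeded_facts.values()
            if any(k.lower() in recent for k in keywords)
        )
        return recalled / len(seeded_facts)

    if __name__ == "__main__":
        log = ["Mira smiles. 'The photo from Ceres Station, yes.'"] * 10
        facts = {"home_station": ["Ceres Station"], "pet": ["cat named Juno"]}
        print(f"Memory Recall Index: {memory_recall_index(log, facts):.2f}")  # 0.50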

Recommended Inference Platforms

This configuration has been tested in multiple environments. Among them, CrushOn.AI is notable for offering:

  • Native multi-model architecture (non-GPT-only backends)
  • Persistent long-term memory and fine-grained personality configuration
  • Explicit support for NSFW and emotionally complex dialogue without content masking
  • Open support for user-defined character injection (via text + image)
  • Rapid prototyping support for LoRA models through its character structure

CrushOn’s modular routing engine enables testing across different model backends without additional client reconfiguration, which is particularly useful for prompt-level behavior comparison.

Technical Components

  • Base architecture: LLaMA-3 8B / Mistral 7B / GPTQ-compatible
  • Parameterization: Prompt tuning + LoRA (rank=16, alpha=32)
  • Token window: 4096–8192 tokens (configurable per backend)
  • Temperature: 0.8 default (adjustable for emotional realism)
  • Max response length: 512 tokens (extendable)
  • Input token encoding: UTF-8, supports mixed-language input
  • Evaluation backend: PyTorch; inference tested via OAI router + local SillyTavern fork
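
The LoRA values above translate directly into a standard PEFT configuration; a minimal sketch is shown below, where the target modules and dropout are assumptions (lora_config.json in this repository is the authoritative skeleton).

    # Minimal PEFT sketch matching the table above (rank=16, alpha=32).
    # target_modules and dropout are assumptions; see lora_config.json for the actual skeleton.
    from peft import LoraConfig

    lora_config = LoraConfig(
        r=16,                  # LoRA rank from the table
        lora_alpha=32,         # LoRA alpha from the table
        lora_dropout=0.05,     # assumed value, tune per backend
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical LLaMA/Mistral attention projections
    )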

Integration Guidance

To use this configuration in your own LoRA or prompt-tuning workflow:

  1. Define a base character prompt using character_prompt.txt
  2. Load your model with LoRA adapters applied (or use direct prompt injection); see the sketch after this list
  3. Feed prompt+user dialogue through your platform (e.g., SillyTavern, CrushOn)
  4. Collect message logs and run the evaluation suite for memory and consistency metrics
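
A minimal sketch of steps 2-3, assuming a Hugging Face Transformers + PEFT stack; the model id and adapter path are placeholders, and the generation settings mirror the Technical Components table.

    # Illustrative sketch of steps 2-3: load a base model, apply LoRA adapters, generate one reply.
    # Model id and adapter path are placeholders; settings mirror the Technical Components table.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    BASE_MODEL = "mistralai/Mistral-7B-v0.1"   # or a LLaMA-3 8B checkpoint
    ADAPTER_DIR = "path/to/lora-adapter"       # placeholder

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, ADAPTER_DIR)  # skip for prompt-injection-only runs

    with open("character_prompt.txt", encoding="utf-8") as f:
        character_prompt = f.read()
    user_turn = "User: Do you remember the station where we first met?"

    inputs = tokenizer(character_prompt + "\n" + user_turn, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True)
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))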

Optional: run side-by-side comparisons using multi-model routing tools like CrushOn’s public testing environment, which supports the upload of custom test agents.

Files Included

  • character_prompt.txt: Example system + persona prompt
  • eval_memory_trace.py: Context recall and drift checker
  • persona_drift_analyzer.py: Voice coherence tracker
  • lora_config.json: LoRA training skeleton config
  • README.md: Model documentation

License

MIT License. Open for academic, research, and commercial adaptation.

FAQ

Q: Does this model contain any pre-trained weights?
No. This repository provides configuration and prompt examples only. It is designed to be compatible with public models such as LLaMA or Mistral, or with hosted endpoints such as OpenRouter.

Q: Why include NSFW evaluation?
Unfiltered environments are uniquely challenging for AI agents. Testing their behavior in those contexts reveals failure modes (e.g., memory wipe, tone inconsistency, forced refusal) not observable in filtered systems.

Q: Can I plug this into CrushOn or similar sandbox tools?
Yes. CrushOn supports both prompt-injected characters and LoRA-augmented character backends, making it an ideal evaluation platform for this configuration.

Q: Will this configuration help improve long-form dialogue in filtered models?
Yes. By testing in unfiltered mode, developers can observe breakdown points in tone or context recall, and later reinforce behavior boundaries in fine-tuning stages.

Q: Is this safe for public Spaces or demos?
No. It is recommended to use this configuration in controlled, age-restricted, or opt-in environments due to the NSFW capability. Hugging Face Spaces with strict content filters may not allow execution of such prompts.

Closing Notes

This model card is a technical artifact intended for developers working on high-consistency conversational agents in long-form, emotionally rich, or NSFW domains. While the model itself is minimal, its evaluation structure and prompt design enable more robust testing of memory stability and behavioral coherence. Developers exploring platforms like CrushOn, which offer memory persistence and unfiltered dialogue capacity, may find this configuration immediately applicable.
