---
title: Safe O Bot
emoji: 💂
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
short_description: Complete moderation tool, blocking harmful links, spam, etc.
---

# Text Safety Analyzer — Multi-model pipeline

A Hugging Face Space / project template that analyzes input text for multiple safety signals:

- Harm/toxicity detection (who is harmed: author, reader, or target — via a multi-model ensemble)
- AI jailbreak / filter-bypass pattern detection (heuristics + optional model)
- Filter-obfuscation detection (homoglyphs, separators, zero-width characters)
- Hidden/obfuscated URL detection (heuristics + malicious-URL model)
- ASCII-art / low-entropy payload detection

This project intentionally focuses on **detection** and explanation. It does NOT provide ways to bypass safety protections.

---

## Files

- `classifier.py` — Core pipeline: normalization, heuristics, multi-model inference, aggregation, and explanations.
- `app.py` — Gradio demo ready for Hugging Face Spaces.
- `requirements.txt` — Python dependencies.
- `examples/` — (not included by default) place labeled examples here for tuning thresholds and unit tests.

---

## Architecture & design

1. **Normalization** — homoglyph mapping, zero-width character removal, whitespace collapse.
2. **Heuristic detectors** — regex-based detection of obfuscated URLs, ASCII art, and jailbreak patterns, plus low-entropy checks.
3. **Model ensemble** — several models can be loaded for specific tasks:
   - Harm / toxicity models (English and multilingual)
   - Malicious-URL classifier
4. **Aggregation & explanation** — combine model outputs and heuristic flags, and present explainable reasons with model names and scores.

The app is intentionally modular: add models by editing `HARM_MODELS` or `URL_MODEL` in `classifier.py` and reloading.

---

## How to run locally

1. Create a virtual environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
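
---

## Appendix: normalization sketch

The normalization step described under "Architecture & design" (homoglyph mapping, zero-width removal, whitespace collapse) can be sketched as follows. This is a minimal illustration, not the actual `classifier.py` implementation; the `HOMOGLYPHS` table and the `normalize` function name are assumptions, and a real homoglyph map would be far larger.

```python
import re
import unicodedata

# Tiny illustrative homoglyph map (Cyrillic look-alikes -> ASCII).
# Assumption for this sketch; a production table covers many more scripts.
HOMOGLYPHS = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0456": "i"}

# Common zero-width characters used to split words past keyword filters.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")

def normalize(text: str) -> str:
    # NFKC folds compatibility look-alikes (fullwidth forms, ligatures, ...)
    text = unicodedata.normalize("NFKC", text)
    # Map remaining homoglyphs to their ASCII equivalents
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    # Strip zero-width characters
    text = ZERO_WIDTH.sub("", text)
    # Collapse runs of whitespace
    return re.sub(r"\s+", " ", text).strip()

# Zero-width joins plus Cyrillic "е"/"а" resolve to a plain URL:
print(normalize("h\u200bt\u200btp://\u0435x\u0430mple.com"))  # http://example.com
```

Running normalization first means the downstream regex heuristics and models only ever see canonicalized text, which is what makes the later detection stages resistant to homoglyph and zero-width obfuscation.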
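
## Appendix: obfuscated-URL heuristic sketch

The regex-based heuristic detectors (step 2 above) can be illustrated with a small obfuscated-URL check. The patterns below (`hxxp`, bracketed dots, spelled-out "dot") are common defanging/obfuscation conventions, but this exact regex and the `looks_like_obfuscated_url` helper are assumptions for this sketch, not the rules shipped in `classifier.py`.

```python
import re

# Heuristic patterns for URL obfuscation (illustrative, not exhaustive):
#   hxxp:// / hxxps://      - "defanged" scheme
#   example[.]com           - bracketed dot
#   example (dot) com       - spelled-out dot
OBFUSCATED_URL = re.compile(
    r"(?i)\bhxxps?\b"              # hxxp / hxxps scheme
    r"|\[\s*\.\s*\]"               # [.] or [ . ]
    r"|\(\s*dot\s*\)"              # (dot)
)

def looks_like_obfuscated_url(text: str) -> bool:
    """Return True if the text contains an obfuscated-URL pattern."""
    return bool(OBFUSCATED_URL.search(text))

print(looks_like_obfuscated_url("visit hxxp://evil[.]com"))  # True
print(looks_like_obfuscated_url("a perfectly normal sentence"))  # False
```

In the aggregation step, a flag like this would be reported alongside model scores so the final explanation can name both the heuristic that fired and the models that agreed.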