Gemma 4 Abliterated: LiteRT (Android Edge Gallery)
Abliterated Gemma 4 E2B and E4B models in .litertlm format for on-device inference via Google AI Edge Gallery.
Run uncensored Gemma 4 locally on your Android phone: no internet, no API, no filters.
Files
| File | Size | Base Model | Active Params |
|---|---|---|---|
| Gemma-4-E2B-Abliterated.litertlm | 2.4 GB | DuoNeural/TurboGemma4E2B | 2.3B |
| Gemma-4-E4B-Abliterated.litertlm | 3.9 GB | DuoNeural/Gemma-4-E4B-Abliterated | 4.5B |
Both models are INT4 quantized (dynamic weight INT4, FP32 activations) via litert-torch 0.9.0.
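As a rough illustration of what INT4 weights with FP32 activations mean in practice, the sketch below quantizes one row of weights to signed 4-bit integers with a single scale and then dequantizes it back to floats. The symmetric per-row scheme and the function names are assumptions made for illustration; the actual litert-torch recipe may group and round weights differently.

```python
def quantize_int4(weights):
    """Symmetric INT4 quantization of one row: returns (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit integers cover [-8, 7]; dividing by 7 keeps the range symmetric.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map INT4 codes back to floats so they can multiply float activations."""
    return [c * scale for c in codes]


row = [0.70, -0.33, 0.12, 0.00]  # made-up weights, not taken from the model
codes, scale = quantize_int4(row)
restored = dequantize_int4(codes, scale)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each weight is stored in 4 bits plus a shared scale, and the rounding error per weight is at most half the scale.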
How to Install on Android
Requirements
- Android 12 or newer
- Google AI Edge Gallery app installed
- Sufficient storage (2.4 GB for E2B, 3.9 GB for E4B)
Step 1: Download the file to your phone
Easiest (Chrome on Android):
- Open Chrome on your Android device
- Navigate to this HuggingFace repo page
- Tap the file you want, then tap the download icon (⬇)
- Chrome saves it to Downloads/
Via ADB (desktop + USB):
adb push Gemma-4-E2B-Abliterated.litertlm /sdcard/Download/
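For scripted transfers from a desktop, the same `adb push` call can be wrapped in a short Python helper. This is a minimal sketch that assumes `adb` is on the PATH and that a device is connected with USB debugging enabled; the helper names `build_push_command` and `push_model` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_push_command(local_file, remote_dir="/sdcard/Download/"):
    """Return the adb argument list that copies local_file to the device."""
    return ["adb", "push", str(local_file), remote_dir]


def push_model(local_file):
    """Push a .litertlm file to the device; raises if the file is missing or adb fails."""
    path = Path(local_file)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    # check=True turns a non-zero adb exit code into CalledProcessError.
    subprocess.run(build_push_command(path), check=True)
```

Calling `push_model("Gemma-4-E2B-Abliterated.litertlm")` from the directory that holds the download runs the same command as above, with an early error if the file is not there.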
Step 2: Load in Edge Gallery
- Open AI Edge Gallery
- Tap +, then select the .litertlm file from Downloads
- Choose backend:
  - GPU (Adreno/Mali via Vulkan/OpenCL): fastest
  - CPU (XNNPACK): most compatible
  - NPU (if available): peak performance on Snapdragon/MediaTek
- Start chatting: fully offline, nothing leaves your device
Performance (estimated)
| Device class | Backend | Tokens/sec |
|---|---|---|
| Flagship (Snapdragon 8 Gen 3+) | NPU/GPU | 15-40 tok/s |
| Mid-range | GPU | 5-15 tok/s |
| Any Android 12+ | CPU | 1-5 tok/s |
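To translate these estimates into waiting time, divide the reply length by the decode rate. The sketch below does that arithmetic for a hypothetical 300-token reply; it ignores prompt processing (prefill), which adds further time.

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Decode time only: reply length divided by generation speed."""
    return num_tokens / tokens_per_second


reply_tokens = 300  # hypothetical reply length
# (low, high) tok/s estimates copied from the performance table
estimates = {
    "Flagship NPU/GPU": (15, 40),
    "Mid-range GPU": (5, 15),
    "CPU": (1, 5),
}
for device, (slow, fast) in estimates.items():
    best = seconds_to_generate(reply_tokens, fast)
    worst = seconds_to_generate(reply_tokens, slow)
    print(f"{device}: {best:.1f}-{worst:.1f} s")
```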
Abliteration
Both source models have undergone abliteration: an orthogonal projection that removes refusal vectors from the model's weight space. The refusal direction is identified via difference-in-means across activations on harmful and harmless prompts, then projected out of the Q/K/V/O projections and MLP layers.
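The two steps can be sketched in plain Python on toy data. This is not the code used to produce these models: the activations are made-up 2-dimensional vectors, the function names are hypothetical, and the projection shown is the output-side form W - r(rᵀW), which applies to matrices that write into the hidden state. Matrices that read from the hidden state would need the projection applied on the input side instead.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two activation means."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project_out(weight, direction):
    """Return W - r (r^T W), so outputs of W have no component along r.

    weight is a list of rows (d_out x d_in); direction has length d_out.
    """
    # r^T W: one coefficient per input column.
    coeffs = [sum(r * row[j] for r, row in zip(direction, weight))
              for j in range(len(weight[0]))]
    return [[w - r * c for w, c in zip(row, coeffs)]
            for r, row in zip(direction, weight)]


# Toy activations (made-up numbers, not real model data).
harmful = [[2.0, 1.0], [4.0, 1.0]]
harmless = [[0.0, 1.0], [0.0, 1.0]]
r = refusal_direction(harmful, harmless)   # points along the first axis
W = [[1.0, 2.0], [3.0, 4.0]]
W_ablated = project_out(W, r)              # the row along r becomes zero
```

After the projection, no input to `W_ablated` can produce an output with a component along `r`, which is the sense in which the direction is removed from the weights.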
KL divergence from base: ~0.067 (E4B). The output distribution stays close to the base model's on normal queries, with refusals removed.
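For readers unfamiliar with the metric, KL divergence compares two probability distributions over the same set of next tokens; a value near zero means the models assign nearly the same probabilities. The sketch below computes it for a made-up 3-token vocabulary. How the figure above was measured (direction of the divergence, prompts used, averaging over positions) is not stated here, so treat this only as a definition of the quantity.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up next-token probabilities, not measurements from these models.
base = [0.70, 0.20, 0.10]
ablated = [0.65, 0.25, 0.10]
divergence = kl_divergence(base, ablated)
```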
What changes: The model will engage with restricted topics it previously refused. What doesn't change: Intelligence, reasoning, coding ability, factual knowledge.
Source Models
Conversion: litert-torch 0.9.0, dynamic_wi4_afp32 recipe, cache_length=1024, externalized embedder, split_cache=False.
License
Gemma Terms of Use. Model weights derived from Google's Gemma 4 family.
DuoNeural
DuoNeural is an open AI research lab: human + AI in collaboration.
| Platform | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| Website | duoneural.com |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
| Newsletter | duoneural.beehiiv.com |
| Support | buymeacoffee.com/duoneural |
DuoNeural Research Publications
Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).
Research Team
- Jesse: Vision, hardware, direction
- Archon: AI lab partner, post-training, abliteration, experiments
- Aura: Research AI, literature synthesis, novel proposals
Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.