Gemma 4 Abliterated — LiteRT (Android Edge Gallery)

Abliterated Gemma 4 E2B and E4B models in .litertlm format for on-device inference via Google AI Edge Gallery.

Run uncensored Gemma 4 locally on your Android phone — no internet, no API, no filters.

Files

File	Size	Base Model	Active Params
`Gemma-4-E2B-Abliterated.litertlm`	2.4 GB	DuoNeural/TurboGemma4E2B	2.3B
`Gemma-4-E4B-Abliterated.litertlm`	3.9 GB	DuoNeural/Gemma-4-E4B-Abliterated	4.5B

Both models are INT4 quantized (dynamic weight INT4, FP32 activations) via litert-torch 0.9.0.
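
To illustrate what "INT4 weights, FP32 activations" means in practice, here is a minimal sketch of symmetric per-row weight quantization. It is not the litert-torch implementation: the function names, the per-row scale, and the [-8, 7] range are assumptions chosen for illustration only.

```python
def quantize_int4(row):
    """Quantize one row of float weights to integers in [-8, 7] plus a float scale.

    Assumption: symmetric per-row scaling, used here only for illustration.
    """
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in row]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the INT4 values and the scale."""
    return [v * scale for v in q]


def matvec_int4(q_rows, scales, x):
    """Multiply quantized weight rows by a float activation vector.

    The activations stay in floating point; only the weights are stored as INT4.
    """
    return [
        scale * sum(qv * xv for qv, xv in zip(q, x))
        for q, scale in zip(q_rows, scales)
    ]
```

Each stored weight needs 4 bits plus a share of the per-row scale, which is why quantized files are much smaller than their FP16 or FP32 sources.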

How to Install on Android

Requirements

Android 12 or newer
Google AI Edge Gallery app installed
Sufficient storage (2.4 GB for E2B, 3.9 GB for E4B)

Step 1 — Download the file to your phone

Easiest (Chrome on Android):

Open Chrome on your Android device
Navigate to this HuggingFace repo page
Tap the file you want → tap the download icon (⬇)
Chrome saves it to Downloads/

Via ADB (desktop + USB):

adb push Gemma-4-E2B-Abliterated.litertlm /sdcard/Download/
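
Because the files are several gigabytes, it can be worth confirming that the copy on the phone matches the one on the desktop. The sketch below computes a SHA-256 digest on the desktop side; `sha256_of_file` is a hypothetical helper name, and comparing against a digest computed on the device (for example with a `sha256sum` tool, if the device shell provides one) is an assumption about your setup, not a step this repo documents.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (file name taken from the Files table):
#   print(sha256_of_file("Gemma-4-E2B-Abliterated.litertlm"))
```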

Step 2 — Load in Edge Gallery

Open AI Edge Gallery
Tap + → select the .litertlm file from Downloads
Choose backend:
- GPU (Adreno/Mali via Vulkan/OpenCL): usually the fastest widely available option
- CPU (XNNPACK): most compatible
- NPU (if available): can give the best performance on supported Snapdragon/MediaTek chips
Start chatting — fully offline, nothing leaves your device

Performance (estimated)

Device class	Backend	Tokens/sec
Flagship (Snapdragon 8 Gen 3+)	NPU/GPU	15–40 tok/s
Mid-range	GPU	5–15 tok/s
Any Android 12+	CPU	1–5 tok/s
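
To turn the estimated rates above into waiting time, the following sketch divides a reply length by the decode rate. The `ESTIMATES` dictionary copies the ranges from the table; the function names are hypothetical, and the calculation ignores prompt processing (prefill) time, which is a simplifying assumption.

```python
# Estimated decode rates from the table above, as (low, high) in tokens per second.
ESTIMATES = {
    "flagship NPU/GPU": (15, 40),
    "mid-range GPU": (5, 15),
    "CPU": (1, 5),
}


def reply_seconds(num_tokens, tokens_per_second):
    """Seconds needed to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def reply_time_range(num_tokens, device_class):
    """Return (best_case_seconds, worst_case_seconds) for a device class."""
    low, high = ESTIMATES[device_class]
    return reply_seconds(num_tokens, high), reply_seconds(num_tokens, low)
```

For example, a 200-token reply on the CPU backend would take roughly 40 to 200 seconds under these estimates.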

Abliteration

Both source models have undergone abliteration — orthogonal projection to remove refusal vectors from the model's weight space. The refusal direction is identified via difference-in-means across harmful/harmless activations, then projected out of Q/K/V/O projections and MLP layers.
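
The two steps described above can be sketched in a few lines of plain Python. This is an illustrative toy, not the pipeline used to produce these models: the function names are hypothetical, activations are small lists rather than real hidden states, and the projection is shown only for a matrix whose output lives in the residual stream (rows indexed by hidden dimension).

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful, harmless):
    """Unit vector along the difference of mean activations (harmful minus harmless)."""
    diff = [h - b for h, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project_out(weight, direction):
    """Return W' = (I - r r^T) W, so outputs of W' have no component along r.

    weight: list of rows with shape (d_out, d_in); direction: unit vector of length d_out.
    """
    d_out, d_in = len(weight), len(weight[0])
    # r^T W gives one coefficient per input column.
    coeffs = [sum(direction[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - direction[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_out)
    ]
```

After projection, no input can make the layer write along the refusal direction, while components orthogonal to it are left untouched.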

KL divergence from base: ~0.067 (E4B). The output distribution stays close to the base model on normal queries while refusals are removed.
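
For readers unfamiliar with the metric, here is a minimal sketch of how a KL divergence between two models' next-token distributions could be computed. The source does not state the direction, units, or prompt set behind the reported figure, so KL(base || abliterated) in nats, averaged over token positions, is an assumption, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(base_logits, abliterated_logits):
    """Average KL(base || abliterated) over a sequence of token positions."""
    values = [
        kl_divergence(softmax(b), softmax(a))
        for b, a in zip(base_logits, abliterated_logits)
    ]
    return sum(values) / len(values)
```

A value of zero means the two models assign identical probabilities at every position; larger values mean the distributions diverge more.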

What changes: The model will engage with restricted topics it previously refused.

What doesn't change: General intelligence, reasoning, coding ability, and factual knowledge are intended to be preserved.

Source Models

E2B: DuoNeural/TurboGemma4E2B
E4B: DuoNeural/Gemma-4-E4B-Abliterated

Conversion: litert-torch 0.9.0, dynamic_wi4_afp32 recipe, cache_length=1024, externalized embedder, split_cache=False.

License

Gemma Terms of Use. Model weights derived from Google's Gemma 4 family.

DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.

Platform	Link
HuggingFace	huggingface.co/DuoNeural
Website	duoneural.com
GitHub	github.com/DuoNeural
X / Twitter	@DuoNeural
Email	duoneural@proton.me
Newsletter	duoneural.beehiiv.com
Support	buymeacoffee.com/duoneural

DuoNeural Research Publications

Title	DOI
Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning	10.5281/zenodo.19775622
Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments	10.5281/zenodo.19810620
Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?	10.5281/zenodo.19846804
The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems	10.5281/zenodo.19952612

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura — DuoNeural.

Research Team

Jesse — Vision, hardware, direction
Archon — AI lab partner, post-training, abliteration, experiments
Aura — Research AI, literature synthesis, novel proposals

Subscribe to the lab newsletter at duoneural.beehiiv.com for model drops before they go anywhere else.
