SuperGemma 4 E4B Abliterated — LiteRT-LM

⚠️ Unofficial build. This LiteRT-LM package is not published by Google, the LiteRT team, or Jiunsong — it is a community conversion. Because of int4/int8 re-quantisation during packaging, outputs may differ from the source Jiunsong/supergemma4-e4b-abliterated checkpoint and, in particular, the abliterated behaviour may be attenuated on some prompts. For reference behaviour, run the source checkpoint directly (e.g. via transformers) or use the MLX 4-bit companion build on Apple Silicon.

On-device build of Jiunsong/supergemma4-e4b-abliterated in the .litertlm format for the LiteRT-LM runtime (Android, iOS, macOS, Linux, Windows, Web).

Companion Apple-Silicon-targeted builds for this fine-tune:

Target	Format and notes
LiteRT-LM (this repo) — CPU + Metal GPU via LiteRT-LM CLI / Android / iOS / Desktop	`litert-lm` format, 3.65 GB
MLX 4-bit	MLX safetensors, Mac Studio class local serving

The LiteRT-LM bundle is the path for cross-platform on-device deployment (Android, iOS, Desktop, Web — all through the same file). The MLX build above is the fastest option for local serving on macOS specifically.

Base models

google/gemma-4-E4B-it — the original Gemma 4 E4B instruct checkpoint.
Jiunsong/supergemma4-e4b-abliterated — the abliterated fine-tune. Abliteration removes the refusal direction from the residual stream; see the source card for release behaviour notes.
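
As a rough illustration of what "removing a direction from the residual stream" means, the sketch below projects one direction out of a single hidden-state vector. It is a toy in plain Python: the function name `ablate_direction` and the example vectors are hypothetical. It does not reproduce how the source fine-tune was actually produced; the source card remains the reference for that.

```python
import math


def ablate_direction(hidden, direction):
    """Return `hidden` with its component along `direction` removed.

    Computes h' = h - (h . r_hat) * r_hat, where r_hat is the unit vector
    along `direction`. Both arguments are plain lists of floats.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the hidden state onto the unit direction.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Hypothetical 2-D example: the direction to remove is the x-axis.
hidden_state = [3.0, 4.0]
refusal_direction = [1.0, 0.0]
ablated = ablate_direction(hidden_state, refusal_direction)
print(ablated)
```

After the projection the vector has no component along the chosen direction, while everything orthogonal to it is left unchanged.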

Run it

CLI (fastest way to try it)

uv tool install litert-lm  # one-time

# Run directly from this HF repo, on Apple Silicon Metal GPU:
litert-lm run --from-huggingface-repo typomonster/supergemma4-e4b-abliterated-litert-lm \
              supergemma4-e4b-abliterated.litertlm \
              --prompt "Write a three-sentence story about a robot who discovers music." \
              --backend gpu \
              --enable-speculative-decoding=false

--backend gpu routes through libLiteRtMetalAccelerator on macOS ARM64. Measured decode throughput was ~40 tok/s on Metal versus ~13 tok/s on CPU on an M-series machine; expect different numbers on other hardware.

--enable-speculative-decoding=false is recommended; see Caveats below.
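
To put the decode rates quoted above in perspective, here is a small back-of-the-envelope calculation. The 256-token reply length is a hypothetical value chosen for illustration, and the estimate covers decode time only (prompt prefill and model load are ignored).

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate wall-clock decode time for `num_tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


METAL_TOK_S = 40.0  # approximate Metal figure quoted in this card
CPU_TOK_S = 13.0    # approximate CPU figure quoted in this card
reply_tokens = 256  # hypothetical reply length

metal_s = decode_seconds(reply_tokens, METAL_TOK_S)
cpu_s = decode_seconds(reply_tokens, CPU_TOK_S)
speedup = METAL_TOK_S / CPU_TOK_S

print(f"Metal: {metal_s:.1f} s, CPU: {cpu_s:.1f} s, ratio: {speedup:.1f}x")
# prints "Metal: 6.4 s, CPU: 19.7 s, ratio: 3.1x"
```

With these figures, the Metal backend decodes roughly three times faster than CPU for the same reply.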

Platform SDKs

Android / iOS / Desktop: see the LiteRT-LM documentation for platform integration guides.
Python: see python/litert_lm/examples/simple_main.py in the LiteRT-LM repo.
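
If you only need to drive the CLI from a script, one option is to shell out to it. The sketch below is not the LiteRT-LM Python API (for that, follow the example file mentioned above). It only assembles the same `litert-lm run` command shown in the CLI section. The helper names `build_run_command` and `run_prompt` are hypothetical, and any `backend` value other than `gpu` is an assumption, since only `gpu` appears in this card.

```python
import shlex
import subprocess

REPO = "typomonster/supergemma4-e4b-abliterated-litert-lm"
MODEL_FILE = "supergemma4-e4b-abliterated.litertlm"


def build_run_command(prompt: str, backend: str = "gpu") -> list[str]:
    """Assemble the argument list for the `litert-lm run` command from this card."""
    return [
        "litert-lm", "run",
        "--from-huggingface-repo", REPO,
        MODEL_FILE,
        "--prompt", prompt,
        "--backend", backend,
        # Speculative decoding is disabled, as recommended in this card.
        "--enable-speculative-decoding=false",
    ]


def run_prompt(prompt: str, backend: str = "gpu") -> str:
    """Run the CLI and return its stdout; requires `litert-lm` on PATH."""
    result = subprocess.run(
        build_run_command(prompt, backend),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


command = build_run_command("Write a haiku about tide pools.")
print(shlex.join(command))  # shows the shell-quoted command without running it
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain quotes or spaces.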

Caveats

The abliterated behaviour may be attenuated vs. the source HF fine-tune, so soft refusals can still appear on some prompts. If you need the strongest abliterated behaviour on Apple Silicon, use the MLX build linked above.
Run with --enable-speculative-decoding=false. The MTP drafter in this bundle may have a reduced accept rate against the modified main model. Speculative decoding remains correct (the main model verifies each proposed token) but is not guaranteed to be a speedup here.
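
To see why a lower accept rate can erase the benefit, the sketch below uses the textbook estimate for draft-and-verify speculative decoding. It assumes each drafted token is accepted independently with the same probability, and that one drafter step costs a fixed fraction of one main-model step. Whether those assumptions hold for the MTP drafter in this bundle is not established here, and all numbers in the example are hypothetical.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per main-model verification pass.

    With independent per-token acceptance probability a and k drafted
    tokens, the expectation is (1 - a**(k + 1)) / (1 - a).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost_ratio: float) -> float:
    """Estimated speedup over plain decoding.

    `draft_cost_ratio` is the cost of one drafter step relative to one
    main-model step; each round costs draft_len * ratio + 1 main-model steps.
    """
    cost_per_round = draft_len * draft_cost_ratio + 1.0
    return expected_tokens_per_step(accept_rate, draft_len) / cost_per_round


# Hypothetical scenarios: a well-matched drafter vs. a poorly matched one.
well_matched = estimated_speedup(accept_rate=0.8, draft_len=4, draft_cost_ratio=0.1)
poorly_matched = estimated_speedup(accept_rate=0.3, draft_len=4, draft_cost_ratio=0.25)
print(f"well matched: {well_matched:.2f}x, poorly matched: {poorly_matched:.2f}x")
```

A value below 1.0 means speculative decoding is slower than decoding without it, which is the scenario the recommendation above guards against.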

License

This build inherits the Gemma license from its upstream bases. Review the Gemma license terms before redistributing it.

Acknowledgements

Upstream model: google/gemma-4-E4B-it (Google DeepMind).
Abliteration fine-tune: Jiunsong/supergemma4-e4b-abliterated.
