SuperGemma 4 E4B Abliterated - LiteRT-LM

⚠️ Unofficial build. This LiteRT-LM package is not published by Google, the LiteRT team, or Jiunsong; it is a community conversion. Because the weights were re-quantised to int4/int8 during packaging, outputs may differ from the source Jiunsong/supergemma4-e4b-abliterated checkpoint. In particular, the abliterated behaviour may be attenuated on some prompts. For reference behaviour, run the source checkpoint directly (e.g. via transformers) or, on Apple Silicon, use the MLX 4-bit companion build.
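To see why re-quantisation alone can shift outputs, the Python sketch below round-trips synthetic weights through int8 and int4 and measures the error introduced. The exact quantisation scheme used for this package is not documented here, so the sketch assumes plain symmetric per-tensor quantisation, which is simpler than production schemes such as per-channel or block-wise quantisation. The weights are random stand-ins, not weights from this model.

```python
import random


def quantize_dequantize(weights, bits):
    """Assumed scheme: symmetric per-tensor quantisation to signed `bits`-bit ints, then back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    # Round each weight to the nearest integer level, clamp to the valid range, rescale.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(a, b):
    """Average absolute difference between two equally long lists of floats."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic weights, not this model's
    for bits in (8, 4):
        err = mean_abs_error(w, quantize_dequantize(w, bits))
        print(f"int{bits}: mean |error| = {err:.2e}")
```

With only 15 representable levels, the int4 round-trip error is much larger than the int8 one; small per-weight errors like these compound across layers, which is one way a fine-tune's behaviour can drift after conversion.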

On-device build of Jiunsong/supergemma4-e4b-abliterated in the .litertlm format for the LiteRT-LM runtime (Android, iOS, macOS, Linux, Windows, Web).

Builds of this fine-tune:

Build | Format | Target
LiteRT-LM (this repo) | .litertlm, 3.65 GB | CPU + Metal GPU via the LiteRT-LM CLI, Android, iOS, Desktop
MLX 4-bit | MLX safetensors | Mac Studio-class local serving

The LiteRT-LM bundle is the path for cross-platform on-device deployment (Android, iOS, Desktop and Web, all from the same file). The MLX build above is the fastest option for local serving on macOS specifically.

Base models

Run it

CLI (fastest way to try it)

uv tool install litert-lm  # one-time

# Run directly from this HF repo, on Apple Silicon Metal GPU:
litert-lm run --from-huggingface-repo typomonster/supergemma4-e4b-abliterated-litert-lm \
              supergemma4-e4b-abliterated.litertlm \
              --prompt "Write a three-sentence story about a robot who discovers music." \
              --backend gpu \
              --enable-speculative-decoding=false

On macOS ARM64, --backend gpu routes inference through libLiteRtMetalAccelerator. Measured decode throughput on an M-series Mac was ~40 tok/s on Metal vs. ~13 tok/s on CPU, roughly a 3x speedup; your numbers will vary with chip and workload.
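To put those rates in wall-clock terms, the Python arithmetic below converts them into decode time for a response of a given length. The 500-token length is an arbitrary example, and the calculation ignores prefill and model-load time, so real end-to-end latency will be higher.

```python
# Decode rates quoted above (M-series Mac); rough, workload-dependent figures.
METAL_TOK_PER_S = 40.0
CPU_TOK_PER_S = 13.0


def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Decode time at a steady rate; ignores prefill and model-load time."""
    return num_tokens / tok_per_s


if __name__ == "__main__":
    n = 500  # arbitrary example response length
    gpu = decode_seconds(n, METAL_TOK_PER_S)
    cpu = decode_seconds(n, CPU_TOK_PER_S)
    print(f"Metal: {gpu:.1f} s, CPU: {cpu:.1f} s, speedup: {cpu / gpu:.1f}x")
```

At these rates a 500-token reply takes about 12.5 s on Metal versus about 38.5 s on CPU.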

--enable-speculative-decoding=false is recommended; see Caveats.
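If you want to drive the same CLI call from a Python script, a minimal sketch follows. It reuses only the flags shown in the command above and is not a LiteRT-LM Python API: `build_litert_lm_command` and `run_prompt` are hypothetical helper names, and the sketch assumes `litert-lm` is on your PATH and prints the generated text to stdout.

```python
import shlex
import shutil
import subprocess

# Values copied from the CLI example in this README.
REPO = "typomonster/supergemma4-e4b-abliterated-litert-lm"
MODEL_FILE = "supergemma4-e4b-abliterated.litertlm"


def build_litert_lm_command(prompt: str, backend: str = "gpu",
                            speculative: bool = False) -> list[str]:
    """Hypothetical helper: argv for `litert-lm run`, using only the README's flags."""
    return [
        "litert-lm", "run",
        "--from-huggingface-repo", REPO,
        MODEL_FILE,
        "--prompt", prompt,
        "--backend", backend,
        f"--enable-speculative-decoding={str(speculative).lower()}",
    ]


def run_prompt(prompt: str, backend: str = "gpu") -> str:
    """Hypothetical helper: run the CLI and return stdout.

    Assumption: the CLI writes the generated text to stdout.
    """
    if shutil.which("litert-lm") is None:
        raise FileNotFoundError("litert-lm not found; install it with `uv tool install litert-lm`")
    result = subprocess.run(build_litert_lm_command(prompt, backend),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    cmd = build_litert_lm_command(
        "Write a three-sentence story about a robot who discovers music.")
    print(shlex.join(cmd))  # the equivalent shell command, properly quoted
```

Building the argv as a list (rather than a shell string) avoids quoting problems with prompts that contain quotes or spaces.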

Platform SDKs

Caveats

  1. Abliterated behaviour may be attenuated vs. the source HF fine-tune, so some prompts may still draw soft refusals. If you need the strongest abliterated behaviour on Apple Silicon, use the MLX build linked above.

  2. Run with --enable-speculative-decoding=false. The MTP (multi-token prediction) drafter in this bundle may have a reduced acceptance rate against the modified main model. Speculative decoding still produces correct output, because the main model verifies every proposed token, but each rejected draft token is wasted work, so it is not guaranteed to be a speedup here.
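The correctness-versus-speed trade-off in caveat 2 can be illustrated with a generic greedy draft-and-verify loop in Python. This is not LiteRT-LM's MTP implementation: the toy `main_model` and drafter functions and the draft length `k` are hypothetical, and `expected_tokens_per_pass` is the standard estimate from Leviathan et al. (2023), which assumes each draft token is accepted independently with probability alpha.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: greedy next-token function, context -> token id.
Model = Callable[[List[int]], int]


def greedy_decode(main: Model, prompt: List[int], n: int) -> List[int]:
    """Reference output: plain greedy decoding with the main model alone."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(main(seq))
    return seq[len(prompt):]


def speculative_decode(main: Model, draft: Model, prompt: List[int], n: int,
                       k: int = 4) -> Tuple[List[int], int, int]:
    """Greedy draft-then-verify. Returns (tokens, drafted count, accepted count)."""
    seq = list(prompt)
    drafted = accepted = 0
    while len(seq) - len(prompt) < n:
        # 1. The cheap drafter proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        drafted += k
        # 2. The main model checks proposals in order. A real runtime scores all
        #    k positions in one batched pass; here it is called per position.
        for tok in proposal:
            target = main(seq)
            if tok != target:
                seq.append(target)  # first mismatch: keep main model's token, drop the rest
                break
            seq.append(tok)
            accepted += 1
        else:
            seq.append(main(seq))  # all k accepted: main model adds one bonus token
    return seq[len(prompt):len(prompt) + n], drafted, accepted


def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens per main-model pass with i.i.d. acceptance probability alpha."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def main_model(ctx: List[int]) -> int:
    return (3 * ctx[-1] + 1) % 17  # toy "main model"


def aligned_draft(ctx: List[int]) -> int:
    return main_model(ctx)  # drafter that always agrees with the main model


def mismatched_draft(ctx: List[int]) -> int:
    return (ctx[-1] + 5) % 17  # drafter that rarely agrees


if __name__ == "__main__":
    prompt, n = [2], 20
    reference = greedy_decode(main_model, prompt, n)
    for name, drafter in [("aligned", aligned_draft), ("mismatched", mismatched_draft)]:
        out, drafted, accepted = speculative_decode(main_model, drafter, prompt, n)
        assert out == reference  # identical output either way
        print(f"{name}: acceptance rate {accepted / drafted:.2f}")
    for alpha in (0.9, 0.3):
        print(f"alpha={alpha}: ~{expected_tokens_per_pass(alpha, 4):.2f} tokens per pass")
```

Both drafters yield exactly the main model's greedy output; only the number of tokens gained per main-model pass changes. With k = 4, alpha = 0.9 gives about 4.1 tokens per pass, while alpha = 0.3 gives only about 1.4, which may not repay the drafter's own cost.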

License

This build inherits the Gemma license from its upstream bases. Review the terms there before redistribution.

Acknowledgements

  • Upstream model: google/gemma-4-E4B-it (Google DeepMind).
  • Abliteration fine-tune: Jiunsong/supergemma4-e4b-abliterated.