SuperGemma 4 E4B Abliterated - LiteRT-LM
⚠️ Unofficial build. This LiteRT-LM package is not published by Google, the LiteRT team, or Jiunsong; it is a community conversion. Because of int4/int8 re-quantisation during packaging, outputs may differ from the source Jiunsong/supergemma4-e4b-abliterated checkpoint and, in particular, the abliterated behaviour may be attenuated on some prompts. For reference behaviour, run the source checkpoint directly (e.g. via transformers) or use the MLX 4-bit companion build on Apple Silicon.
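To illustrate why re-quantisation can shift outputs, the following minimal Python sketch round-trips a handful of made-up weights through symmetric per-tensor int4 and int8 quantisation and reports the worst-case error. The scheme, the function names, and the sample values are illustrative assumptions; the actual quantisation recipe used when packaging this bundle is not documented here.

```python
def quantize_dequantize(weights, bits):
    """Round-trip weights through symmetric per-tensor quantisation (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    largest = max(abs(w) for w in weights)
    if largest == 0:
        return list(weights)  # nothing to scale
    scale = largest / qmax
    restored = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # integer code
        restored.append(q * scale)                   # back to float
    return restored


def max_abs_error(original, restored):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Made-up sample weights, purely for illustration.
sample = [0.013, -0.42, 0.27, 0.8, -0.055, 0.31]
for bits in (4, 8):
    err = max_abs_error(sample, quantize_dequantize(sample, bits))
    print(f"int{bits}: max abs round-trip error = {err:.5f}")
```

The int4 error is much larger than the int8 error for the same tensor, which is one plausible reason a narrow behavioural edit such as abliteration could be partly washed out after packaging.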
On-device build of Jiunsong/supergemma4-e4b-abliterated in the .litertlm format for the LiteRT-LM runtime (Android, iOS, macOS, Linux, Windows, Web).
Companion Apple-Silicon-targeted builds for this fine-tune:
| Target | Repo |
|---|---|
| LiteRT-LM (this repo): CPU + Metal GPU via LiteRT-LM CLI / Android / iOS / Desktop | litert-lm format, 3.65 GB |
| MLX 4-bit | MLX safetensors, Mac Studio class local serving |
The LiteRT-LM bundle is the path for cross-platform on-device deployment (Android, iOS, Desktop, Web, all through the same file). The MLX build above is the fastest option for local serving on macOS specifically.
Base models
- google/gemma-4-E4B-it: the original Gemma 4 E4B instruct checkpoint.
- Jiunsong/supergemma4-e4b-abliterated: the abliterated fine-tune. Abliteration removes the refusal direction from the residual stream; see the source card for release behaviour notes.
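As a rough illustration of what removing a direction from the residual stream means, the sketch below projects a hypothetical refusal direction out of a single hidden-state vector, h' = h - (h · r̂) r̂. This is the commonly described form of directional ablation, not necessarily the exact procedure used for the source fine-tune; the vectors and the function name are made up for the example.

```python
import math


def remove_direction(hidden, direction):
    """Return hidden with its component along direction projected out."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    # Scalar projection of the hidden state onto the unit direction.
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Toy 3-dimensional example; real residual streams have thousands of dimensions.
hidden_state = [1.0, 2.0, 3.0]
refusal_direction = [0.0, 2.0, 0.0]  # hypothetical direction
ablated = remove_direction(hidden_state, refusal_direction)
print(ablated)
```

After the projection the vector has no component along the chosen direction, and applying the same projection again leaves it unchanged.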
Run it
CLI (fastest way to try it)
uv tool install litert-lm # one-time
# Run directly from this HF repo, on Apple Silicon Metal GPU:
litert-lm run --from-huggingface-repo typomonster/supergemma4-e4b-abliterated-litert-lm \
supergemma4-e4b-abliterated.litertlm \
--prompt "Write a three-sentence story about a robot who discovers music." \
--backend gpu \
--enable-speculative-decoding=false
- --backend gpu routes through libLiteRtMetalAccelerator on macOS ARM64; measured decode throughput was ~40 tok/s on Metal vs ~13 tok/s on CPU (M-series; your mileage will vary).
- --enable-speculative-decoding=false is recommended; see the caveats.
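For a feel of what those throughput figures mean in practice, this small Python sketch converts them into approximate decode times. It assumes a constant decode rate and ignores prefill and model-load time; the 256-token response length is an arbitrary example.

```python
def decode_seconds(num_tokens, tokens_per_second):
    """Approximate decode time, assuming a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


METAL_TOK_S = 40.0  # approximate figure quoted in this card
CPU_TOK_S = 13.0    # approximate figure quoted in this card
RESPONSE_TOKENS = 256  # arbitrary example length

metal = decode_seconds(RESPONSE_TOKENS, METAL_TOK_S)
cpu = decode_seconds(RESPONSE_TOKENS, CPU_TOK_S)
print(f"Metal: {metal:.1f} s, CPU: {cpu:.1f} s, ratio: {cpu / metal:.2f}x")
```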
Platform SDKs
- Android / iOS / Desktop: see the LiteRT-LM documentation for platform integration guides.
- Python: see python/litert_lm/examples/simple_main.py in the LiteRT-LM repo.
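If you would rather script the CLI than integrate an SDK, a thin Python wrapper around the command from the CLI section may be enough. The sketch below only uses the flags that appear in that command; the helper names are hypothetical, and any backend value other than gpu is an assumption not confirmed by this card.

```python
import shlex
import subprocess

REPO = "typomonster/supergemma4-e4b-abliterated-litert-lm"
MODEL_FILE = "supergemma4-e4b-abliterated.litertlm"


def build_run_command(prompt, backend="gpu"):
    """Build the argv list for the CLI invocation shown in this card."""
    return [
        "litert-lm", "run",
        "--from-huggingface-repo", REPO,
        MODEL_FILE,
        "--prompt", prompt,
        "--backend", backend,  # values other than "gpu" are an assumption
        "--enable-speculative-decoding=false",  # recommended in the caveats
    ]


def run_model(prompt, backend="gpu"):
    """Run the CLI and return its stdout (requires litert-lm on PATH)."""
    result = subprocess.run(
        build_run_command(prompt, backend),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Print the command instead of running it, so the sketch works without the CLI.
print(shlex.join(build_run_command("Write a haiku about on-device inference.")))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain quotes or newlines.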
Caveats
- Soft-refusal behaviour may be attenuated vs. the source HF fine-tune. If you need the strongest abliterated behaviour on Apple Silicon, use the MLX build linked above.
- Run with --enable-speculative-decoding=false. The MTP (multi-token prediction) drafter in this bundle may have a reduced accept rate against the modified main model. Speculative decoding remains correct (the main model verifies each proposed token) but is not guaranteed to be a speedup here.
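The correctness claim can be sketched with toy models in Python: under greedy decoding, a drafter proposes a few tokens, the main model checks each one, and the first mismatch is replaced by the main model's own token. The output therefore matches plain greedy decoding whatever the drafter does; only the accept rate, and hence the potential speedup, changes. The toy next-token functions below are stand-ins, not the models in this bundle, and a real runtime verifies the proposals in one batched pass (sampling-based verification is more involved).

```python
def target_next(ctx):
    """Toy stand-in for the main model's greedy next token."""
    return (sum(ctx) * 31 + 7) % 50


def weak_draft_next(ctx):
    """Toy drafter: agrees with the target except at every third position."""
    tok = target_next(ctx)
    return tok if len(ctx) % 3 else (tok + 1) % 50


def greedy(next_token, prompt, max_new):
    """Plain greedy decoding with a single model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(next_token(out))
    return out[len(prompt):]


def speculative_greedy(target, draft, prompt, max_new, k=4):
    """Greedy speculative decoding; returns (new tokens, accept rate)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    proposed = accepted = 0
    while len(out) < limit:
        # Drafter proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # Target verifies proposals in order; stop at the first mismatch.
        for tok in proposal:
            if len(out) >= limit:
                break
            proposed += 1
            expected = target(out)
            if tok == expected:
                out.append(tok)
                accepted += 1
            else:
                out.append(expected)  # target's correction replaces the draft
                break
    rate = accepted / proposed if proposed else 0.0
    return out[len(prompt):], rate


prompt = [1, 2, 3]
reference = greedy(target_next, prompt, 20)
tokens, rate = speculative_greedy(target_next, weak_draft_next, prompt, 20)
print("matches plain greedy:", tokens == reference)
print(f"accept rate: {rate:.2f}")
```

A drafter that disagrees with the main model more often lowers the accept rate, so more draft work is thrown away per accepted token; that is the situation the caveat describes.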
License
This build inherits the Gemma license from its upstream bases. Review the terms there before redistribution.
Acknowledgements
- Upstream model: google/gemma-4-E4B-it (Google DeepMind).
- Abliteration fine-tune: Jiunsong/supergemma4-e4b-abliterated.