Instructions for using Tushe/AMINI-ASSISTANT-GGUF-F16 with libraries and local apps.
- Libraries
- llama-cpp-python
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with llama-cpp-python:
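# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Tushe/AMINI-ASSISTANT-GGUF-F16",
    filename="AMINI-f16.gguf",
)

# messages must be a list of role/content dicts, not a plain string
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Translate 'The child has a high fever and needs immediate care' into Hausa and Yoruba."}
    ]
)
print(response["choices"][0]["message"]["content"])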
- Local Apps
- llama.cpp
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
# Run inference directly in the terminal:
llama-cli -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
# Run inference directly in the terminal:
llama-cli -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
# Run inference directly in the terminal:
./llama-cli -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Use Docker
docker model run hf.co/Tushe/AMINI-ASSISTANT-GGUF-F16:F16
- Ollama
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Ollama:
ollama run hf.co/Tushe/AMINI-ASSISTANT-GGUF-F16:F16
- Unsloth Studio
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tushe/AMINI-ASSISTANT-GGUF-F16 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Tushe/AMINI-ASSISTANT-GGUF-F16 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Tushe/AMINI-ASSISTANT-GGUF-F16 to start chatting
- Pi
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Tushe/AMINI-ASSISTANT-GGUF-F16:F16" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Run Hermes
hermes
- Docker Model Runner
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Docker Model Runner:
docker model run hf.co/Tushe/AMINI-ASSISTANT-GGUF-F16:F16
- Lemonade
How to use Tushe/AMINI-ASSISTANT-GGUF-F16 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Tushe/AMINI-ASSISTANT-GGUF-F16:F16
Run and chat with the model
lemonade run user.AMINI-ASSISTANT-GGUF-F16-F16
List all available models
lemonade list
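Several of the options above (llama.cpp, Pi, Hermes Agent) start `llama-server`, which exposes an OpenAI-compatible HTTP API; Pi and Hermes are pointed at `http://localhost:8080/v1` and `http://127.0.0.1:8080/v1`. A minimal standard-library Python client sketch, assuming the server is running locally on that default port 8080. `build_chat_request` and `chat` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port 8080,
# e.g. started with `llama-server -hf Tushe/AMINI-ASSISTANT-GGUF-F16:F16`.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, max_tokens: int = 256, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style /chat/completions payload (hypothetical helper)."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str, timeout: float = 300.0) -> str:
    """POST a single-turn chat request and return the assistant's reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Translate 'Good morning' into Igbo.")` returns the reply as a string. No `model` field is sent, on the assumption that the server hosts only the model it was started with.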
AMINI F16 GGUF: Tushe Foundry Edge Inference Pack
Quantised and packaged by Tushe (The Foundry Research Team)
What Is This?
We are Tushe (The Foundry Research Team), and we are building a bare-metal inference engine for African-language AI on constrained hardware.
This repository is part of our open-source model-baked-on-metal-inference-engine package, a lightweight inference runtime that we are releasing in three formats:
| Format | Use case |
|---|---|
| Python library (pip install tushe-bare-metal) | Server, Raspberry Pi, edge Linux |
| npm package (npm install tushe-bare-metal) | Node.js apps, Electron, React Native |
| Compiled C executable | Bare-metal embedded, IoT, MCUs |
Developers can drop this into their apps and run offline African-language inference right away, with no internet, cloud, or GPU required.
Note: This is the F16 (full-precision) version. It is best used for benchmarking, research, and producing further quantisations. For deployment on phones and edge devices, use the lighter Q4_K_M version (4.5 GB, runs on 8 GB RAM).
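One way to produce further quantisations from this F16 file is llama.cpp's `llama-quantize` tool, which takes an input GGUF, an output path, and a quantisation type. A minimal Python wrapper sketch, assuming llama.cpp was built from source with the `llama-quantize` target added (the build-from-source command earlier builds only `llama-server` and `llama-cli`). The helper names `quantize_command` and `quantize` and the output filename are hypothetical.

```python
import subprocess

# Assumption: built with `cmake --build build -j --target llama-quantize`,
# run from the llama.cpp checkout directory.
LLAMA_QUANTIZE = "./build/bin/llama-quantize"


def quantize_command(src: str, dst: str, qtype: str = "Q4_K_M") -> list[str]:
    """Build the llama-quantize argument list: <input.gguf> <output.gguf> <type>."""
    return [LLAMA_QUANTIZE, src, dst, qtype]


def quantize(src: str, dst: str, qtype: str = "Q4_K_M") -> None:
    """Run llama-quantize; raises CalledProcessError if it exits non-zero."""
    subprocess.run(quantize_command(src, dst, qtype), check=True)
```

For example, `quantize("models/AMINI-F16.gguf", "models/AMINI-Q4_K_M.gguf")` would write a Q4_K_M file next to the F16 original.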
Why We Built This
Africa has some of the most connectivity-constrained environments in the world. Millions of people, including rural doctors, farmers, teachers, students, traders, and tourists, need intelligent language tools but have no reliable internet access.
We took N-ATLaS, the Llama 3 8B model fine-tuned on Nigerian and African languages by Awarri Technologies in collaboration with NCAIR, and quantised it to run on low-resource hardware, including:
- Android and iOS phones
- Edge IoT devices in agricultural fields
- Offline clinical and medical support tools in rural clinics
- Classrooms with no internet access
- Portable translator devices for traders and tourists
Target Use Cases
| Domain | Description |
|---|---|
| Rural and edge medical | Doctors and health workers in remote clinics: symptom triage, patient communication, and drug information in local languages |
| Farmer support | Modern and rural farmers: crop advice, weather interpretation, market prices, and pest identification in Hausa, Igbo, and Yoruba |
| Education | Teachers and students in areas without internet: explanations, tutoring, and literacy support in local languages |
| Traders and markets | Cross-language communication for traders and informal markets across Africa |
| Tourists | Real-time offline translation between the supported languages (English, Hausa, Igbo, and Yoruba) |
About the Base Model
This GGUF is derived from NCAIR1/N-ATLaS, an open-source multilingual LLM built on Llama 3 8B and fine-tuned by Awarri Technologies in collaboration with the National Centre for Artificial Intelligence & Robotics (NCAIR) and the Federal Ministry of Communications, Innovation and Digital Economy of Nigeria.
N-ATLaS was trained on approximately 392 million multilingual tokens spanning English, Hausa, Igbo, and Yoruba.
We did not retrain or fine-tune the weights: we quantised the original model and built an optimised inference engine to enable edge deployment.
This File
| File | Quant | Size | Min RAM |
|---|---|---|---|
| AMINI-F16.gguf | F16 | ~16 GB | 25-32 GB |
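F16 is not the quant to use for edge deployment: at ~16 GB it needs 25-32 GB of RAM, so it will not fit on an 8 GB phone or a Raspberry Pi 5. Use it for benchmarking, research, and re-quantisation, and deploy the Q4_K_M version on constrained devices.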
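The ~16 GB size in the table follows from simple arithmetic: parameter count times bits per weight. A back-of-envelope sketch, assuming roughly 8.03 billion parameters and Llama 3 8B's attention layout (32 layers, 8 key/value heads, head dimension 128), since N-ATLaS is built on Llama 3 8B. It also assumes an f16 KV cache and ignores GGUF metadata and runtime buffers, so treat the numbers as lower bounds.

```python
# Back-of-envelope memory estimate for an F16 Llama-3-8B-class GGUF.
# Assumptions: ~8.03e9 parameters and Llama 3 8B's attention layout
# (32 layers, 8 KV heads, head dim 128) with an f16 KV cache.
N_PARAMS = 8.03e9
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the weight tensors in decimal GB (ignores GGUF metadata)."""
    return n_params * bits_per_weight / 8 / 1e9


def kv_cache_mib(n_ctx: int, bytes_per_elem: int = 2) -> float:
    """KV cache in MiB: one K and one V vector per layer, per KV head, per token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return per_token * n_ctx / 2**20


print(f"F16 weights: {weights_gb(N_PARAMS, 16):.2f} GB")  # 16.06 GB
for n_ctx in (512, 1024, 2048):
    print(f"KV cache at n_ctx={n_ctx}: {kv_cache_mib(n_ctx):.0f} MiB")
```

Under these assumptions the weights dominate: even at n_ctx=2048 the KV cache adds only about 256 MiB, so lowering n_ctx trims memory only modestly for this file.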
Inference Examples
1. Python: llama-cpp-python
pip install llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id = "AlaminI/AMINI-ASSISTANT-GGUF-F16",
filename = "*.gguf",
n_ctx = 2048,
n_gpu_layers = 0, # 0 = CPU only (edge/offline), -1 = full GPU
verbose = False,
)
English (rural medical support):
output = llm(
"A patient presents with fever, headache, and joint pain for 3 days. What are the possible diagnoses and first-line management?",
max_tokens = 256,
temperature = 0.7,
echo = False,
)
print(output["choices"][0]["text"])
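Hausa (farmer support). In English, the prompt reads: "My grain farm has a lot of pests. What can I do to protect my crops?"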
output = llm(
"Gonar hatsi na da kwari da yawa. Menene zan iya yi don kare amfanin gona na?",
max_tokens = 256,
temperature = 0.7,
echo = False,
)
print(output["choices"][0]["text"])
Yoruba (student support). In English, the prompt reads: "Explain what photosynthesis is, in Yoruba, for a school pupil."
output = llm(
"Ṣe alaye ohun ti photosynthesis jẹ ni ede Yoruba fun ọmọ ile-iwe.",
max_tokens = 256,
temperature = 0.7,
echo = False,
)
print(output["choices"][0]["text"])
Igbo (trader and market support). In English, the prompt reads: "Tell me the current price of maize in Onitsha market."
output = llm(
"Gwa m ọnụ ahịa nke ọka ugbu a n'ahịa Onitsha.",
max_tokens = 256,
temperature = 0.7,
echo = False,
)
print(output["choices"][0]["text"])
2. Chat format: multilingual instruction
from llama_cpp import Llama
llm = Llama.from_pretrained(
repo_id = "AlaminI/AMINI-ASSISTANT-GGUF-F16",
filename = "*.gguf",
n_ctx = 2048,
n_gpu_layers = 0,
verbose = False,
)
response = llm.create_chat_completion(
messages = [
{
"role": "system",
"content": (
"You are an offline African language assistant running on a local device. "
"You support English, Hausa, Igbo, and Yoruba. "
"Respond in the same language the user writes in. "
"Be concise; this device has limited resources."
)
},
{
"role": "user",
"content": "Translate 'The child has a high fever and needs immediate care' into Hausa and Yoruba."
}
],
max_tokens = 256,
temperature = 0.7,
)
print(response["choices"][0]["message"]["content"])
3. Streaming (for responsive UIs on edge devices)
stream = llm.create_chat_completion(
messages = [
{"role": "user", "content": "Explain crop rotation to a farmer in Hausa."}
],
max_tokens = 256,
temperature = 0.7,
stream = True,
)
for chunk in stream:
delta = chunk["choices"][0]["delta"].get("content", "")
print(delta, end="", flush=True)
4. Node.js: node-llama-cpp
npm install node-llama-cpp
import { getLlama, LlamaChatSession } from "node-llama-cpp";
import path from "path";
const llama = await getLlama();
const model = await llama.loadModel({ modelPath: path.join("models", "AMINI-F16.gguf") });
const context = await model.createContext({ contextSize: 2048 });
const session = new LlamaChatSession({ contextSequence: context.getSequence() });
const response = await session.prompt(
"A farmer asks: my tomatoes are wilting despite regular watering. What could be wrong?",
{ maxTokens: 256 }
);
console.log(response);
5. llama.cpp CLI (bare-metal / embedded)
# Download
huggingface-cli download AlaminI/AMINI-ASSISTANT-GGUF-F16 \
AMINI-F16.gguf --local-dir ./models/
# Run on CPU only (edge device)
./llama-cli -m ./models/AMINI-F16.gguf \
--ctx-size 2048 \
--threads 4 \
--temp 0.7 \
-i -r "User:" \
-p "You are an offline assistant for African languages. Respond in the user's language.\nUser:"
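The `-i -r "User:"` flags make llama-cli hand control back to you whenever the model emits `User:`. A rough llama-cpp-python counterpart passes the same string as a `stop` sequence to a plain-completion call. This is a sketch, not the card's method: the `Assistant:` turn label and the helper names `build_transcript` and `reply` are assumptions, and for an instruction-tuned model the chat-completion API in example 2 is likely the better choice because it applies the model's chat template.

```python
# Assumes `llm` is a llama_cpp.Llama instance (or any callable with the same
# completion signature), loaded as in example 1.
SYSTEM = "You are an offline assistant for African languages. Respond in the user's language."


def build_transcript(history: list[tuple[str, str]], user_msg: str) -> str:
    """Render a plain 'User:'/'Assistant:' transcript ending with an open Assistant turn."""
    lines = [SYSTEM]
    for user_turn, assistant_turn in history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {assistant_turn}")
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")
    return "\n".join(lines)


def reply(llm, history: list[tuple[str, str]], user_msg: str) -> str:
    """Generate one turn; stop=["User:"] plays the role of the CLI's -r "User:"."""
    out = llm(
        build_transcript(history, user_msg),
        max_tokens=256,
        temperature=0.7,
        stop=["User:"],
    )
    return out["choices"][0]["text"].strip()
```

Append each `(user_msg, answer)` pair to `history` after a turn to keep the conversation going.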
6. Ollama (local server mode)
ollama run hf.co/AlaminI/AMINI-ASSISTANT-GGUF-F16
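Once the model has been pulled, Ollama also serves it over a local REST API. A minimal standard-library sketch that calls the `/api/chat` endpoint, assuming Ollama is running on its default address (`localhost:11434`) and the model was pulled with the command above; `build_payload` and `ask` are hypothetical helper names, and the `options` values simply mirror the sampling settings used elsewhere in this card.

```python
import json
import urllib.request

# Assumptions: Ollama is serving on its default address, and the model was
# pulled with `ollama run hf.co/AlaminI/AMINI-ASSISTANT-GGUF-F16`.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/AlaminI/AMINI-ASSISTANT-GGUF-F16"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Non-streaming /api/chat request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"temperature": 0.7, "num_predict": 256},
    }


def ask(prompt: str, timeout: float = 600.0) -> str:
    """Send one chat turn to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL, data=data, headers={"Content-Type": "application/json"}
    )  # supplying data makes this a POST
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]["content"]
```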
Recommended Settings for Edge Devices
| Parameter | Value | Notes |
|---|---|---|
| n_ctx | 512-1024 | Reduce on very low-RAM devices |
| n_gpu_layers | 0 | CPU-only for phones/IoT |
| n_threads | 4 | Match your device's core count |
| temperature | 0.7 | Balanced responses |
| max_tokens | 128-256 | Keep short for low-latency UX |
| repeat_penalty | 1.1 | Reduces repetitive looping on edge devices |
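The table's values map directly onto llama-cpp-python keyword arguments: the first three are load-time settings and the rest are per-call generation settings. A sketch, assuming llama-cpp-python is installed when `load_edge_model` is actually called; the dictionary names and `load_edge_model` are hypothetical, and `n_threads` follows the table's advice to match the core count by using `os.cpu_count()` (falling back to 4).

```python
import os

# Load-time settings from the "Recommended Settings for Edge Devices" table.
LOAD_KWARGS = {
    "n_ctx": 1024,                     # 512-1024; lower on very low-RAM devices
    "n_gpu_layers": 0,                 # CPU only for phones/IoT
    "n_threads": os.cpu_count() or 4,  # match the device's core count
}

# Per-call generation settings from the same table.
GEN_KWARGS = {
    "temperature": 0.7,     # balanced responses
    "max_tokens": 256,      # 128-256 keeps latency low
    "repeat_penalty": 1.1,  # discourages repetitive loops
}


def load_edge_model(repo_id: str = "AlaminI/AMINI-ASSISTANT-GGUF-F16"):
    """Load the model with the edge settings (requires llama-cpp-python)."""
    from llama_cpp import Llama  # imported lazily so this module loads without it

    return Llama.from_pretrained(
        repo_id=repo_id, filename="*.gguf", verbose=False, **LOAD_KWARGS
    )
```

Usage: `llm = load_edge_model()` followed by `llm(prompt, **GEN_KWARGS)` applies both sets of settings.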
Related Repositories
| Repo | Description |
|---|---|
| NCAIR1/N-ATLaS | Original model by Awarri / NCAIR |
| AlaminI/AMINI-ASSISTANT-GGUF-F16 | F16 GGUF (full precision, for re-quantising) |
License
This GGUF quantisation is an independent contribution by Tushe (The Foundry Research Team). We encourage developers to refer to the N-ATLaS licence before using these weights. Our inference engine, however, can be used for any purpose, including commercial use, with no limit on the number of users. We will also release models trained by us, to give developers fully open-source models and inference at the edge.
Model tree for Tushe/AMINI-ASSISTANT-GGUF-F16
Base model
NCAIR1/N-ATLaS