# Fara-7B-4bit (Text-Only MLX)

This is a 4-bit quantized, text-only version of [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B), optimized for Apple Silicon using MLX.

> ⚠️ **Important:** This conversion includes only the language-model components. The vision capabilities of the original Fara-7B model are not included in this version.
## Model Details
- Base Model: microsoft/Fara-7B
- Architecture: Qwen2.5 (text-only)
- Quantization: 4-bit (4.501 bits per weight)
- Format: MLX
- Parameters: ~7B
- License: Apache 2.0
## Capabilities
✅ Supported:
- Text generation
- Chat/instruction following
- Code generation
- Question answering
❌ Not Supported:
- Image understanding
- Visual question answering
- Multimodal tasks
## Usage
### Installation

```bash
pip install mlx-lm
```
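To sanity-check the install from the command line, recent mlx-lm releases ship a `mlx_lm.generate` entry point (flag names may vary slightly between versions):

```bash
# One-off generation from the terminal; downloads the model on first run
python -m mlx_lm.generate \
    --model mlx-community/Fara-7B-4bit \
    --prompt "What is machine learning?" \
    --max-tokens 100
```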
### Basic Text Generation

```python
from mlx_lm import load, generate

# Load the model
model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Generate text
prompt = "What is machine learning?"
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```
### Chat Format

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Use the chat template
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
```
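For interactive use you can also stream tokens as they are produced. A minimal sketch using mlx-lm's `stream_generate` (newer versions yield response objects with a `.text` field, older ones yield plain strings; the code below handles both):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# Print each chunk as soon as it arrives
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=200):
    print(getattr(chunk, "text", chunk), end="", flush=True)
print()
```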
## Performance

- Speed: roughly 100+ tokens/sec on M-series chips
- Memory: ~4-5 GB of unified memory
- Optimized for: Apple Silicon (M1/M2/M3/M4)
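To measure throughput on your own hardware, the Python `generate` API accepts a `verbose` flag that prints the generated text along with prompt and generation speed (the exact statistics printed vary by mlx-lm version):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# verbose=True prints the response plus tokens-per-second statistics
generate(model, tokenizer, prompt="Hello", max_tokens=64, verbose=True)
```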
## For Vision Capabilities
If you need the vision capabilities of Fara-7B, please use:
- GGUF version: bartowski/microsoft_Fara-7B-GGUF
- Original model: microsoft/Fara-7B
## Known Limitations

- Vision tower weights are not included
- Cannot process images
- Text-only inference
- May emit `<tool_call>` tokens in responses; these can be ignored or filtered (see the sketch below)
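If stray tool-call markup shows up in the output, a small post-processing step can strip it. A minimal sketch, assuming the markup appears as literal `<tool_call>...</tool_call>` spans in the generated string:

```python
import re

def strip_tool_calls(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans and any dangling tags."""
    # Drop complete tool-call blocks, including their contents
    text = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL)
    # Drop any unmatched opening/closing tags left over
    return re.sub(r"</?tool_call>", "", text).strip()

print(strip_tool_calls("Hello <tool_call>{...}</tool_call>world"))  # -> "Hello world"
```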
## Conversion Details

This model was converted using `mlx_lm.convert()` with the following modifications:

- Fixed the config to properly map `tie_word_embeddings` into `text_config`
- 4-bit quantization applied to the language model weights
- Vision tower components excluded (not supported by the mlx-lm converter)
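For reference, a text-only 4-bit conversion of this kind is normally launched as follows (illustrative only; reproducing this checkpoint also required the config fix above and dropping the vision tower):

```bash
# Quantize to 4 bits and write an MLX checkpoint
python -m mlx_lm.convert \
    --hf-path microsoft/Fara-7B \
    --mlx-path Fara-7B-4bit \
    -q --q-bits 4
```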
## Citation

If you use this model, please cite the original Fara-7B paper and model:

```bibtex
@misc{fara-7b,
  title={Fara-7B},
  author={Microsoft},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/microsoft/Fara-7B}}
}
```
## Acknowledgments
- Original model by Microsoft
- Converted for MLX by the community
- Based on Qwen2.5 architecture
## Issues & Feedback
If you encounter any issues with this model, please report them on the model discussion page.