Fara-7B-4bit (Text-Only MLX)

This is a 4-bit quantized, text-only version of microsoft/Fara-7B, optimized for Apple Silicon using MLX.

โš ๏ธ Important: This conversion only includes the language model components. The vision capabilities from the original Fara-7B model are not included in this version.

Model Details

  • Base Model: microsoft/Fara-7B
  • Architecture: Qwen2.5 (text-only)
  • Quantization: 4-bit (4.501 bits per weight)
  • Format: MLX
  • Parameters: ~7B
  • License: Apache 2.0

Capabilities

✅ Supported:

  • Text generation
  • Chat/instruction following
  • Code generation
  • Question answering

โŒ Not Supported:

  • Image understanding
  • Visual question answering
  • Multimodal tasks

Usage

Installation

pip install mlx-lm
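
For a quick sanity check from the terminal, recent mlx-lm releases also install a command-line generator (the flags below match current versions; adjust if your release differs):

mlx_lm.generate --model mlx-community/Fara-7B-4bit --prompt "Hello" --max-tokens 50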

Basic Text Generation

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Generate text
prompt = "What is machine learning?"
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
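
Streaming Generation

For incremental output, recent mlx-lm versions expose stream_generate, which yields chunks as tokens are produced. A minimal sketch, assuming the current API where each yielded chunk carries the new text in its .text attribute:

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Print tokens as they are generated instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt="What is machine learning?", max_tokens=100):
    print(chunk.text, end="", flush=True)
print()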

Chat Format

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Use chat template
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]

prompt = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True, 
    tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
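
Sampling Parameters

Recent mlx-lm versions route temperature and top-p through a sampler object rather than keyword arguments on generate. A sketch, assuming mlx_lm.sample_utils.make_sampler is available in your installed version:

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Higher temperature -> more diverse output; top_p restricts sampling to the nucleus
sampler = make_sampler(temp=0.7, top_p=0.9)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    add_generation_prompt=True,
    tokenize=False,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200, sampler=sampler)
print(response)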

Performance

  • Speed: roughly 100 tokens/sec on M-series chips (varies by chip and prompt length)
  • Memory: ~4-5 GB of unified memory required
  • Optimized for: Apple Silicon (M1/M2/M3/M4)

For Vision Capabilities

If you need the vision capabilities of Fara-7B, please use the original microsoft/Fara-7B model, which retains the vision tower.

Known Limitations

  1. Vision tower weights are not included
  2. Cannot process images
  3. Text-only inference
  4. May generate <tool_call> tokens in responses (can be ignored or filtered; see the sketch below)
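
Because the model may emit <tool_call> markup in plain-text responses (limitation 4 above), a small post-processing step can strip it. This is an illustrative sketch, not part of mlx-lm:

import re

def strip_tool_calls(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans and any stray tags from a response."""
    # Drop well-formed tool-call blocks, including their payloads
    text = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL)
    # Drop any unmatched opening/closing tags left behind
    text = re.sub(r"</?tool_call>", "", text)
    return text.strip()

print(strip_tool_calls('Sure.<tool_call>{"name": "click"}</tool_call> Done.'))
# -> 'Sure. Done.'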

Conversion Details

This model was converted using mlx_lm.convert() with the following modifications:

  • Fixed config to properly map tie_word_embeddings in text_config
  • 4-bit quantization applied to language model weights
  • Vision tower components excluded (not supported by mlx-lm converter)
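
For reference, a roughly equivalent Python invocation is sketched below. It is illustrative only: rerunning it on the stock checkpoint will still require the tie_word_embeddings config fix noted above.

from mlx_lm import convert

# Quantize the language-model weights to 4 bits; vision tower weights are
# skipped because the mlx-lm converter does not handle them.
convert(
    "microsoft/Fara-7B",
    mlx_path="Fara-7B-4bit",
    quantize=True,
    q_bits=4,
)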

Citation

If you use this model, please cite the original Fara-7B paper and model:

@misc{fara-7b,
  title={Fara-7B},
  author={Microsoft},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/microsoft/Fara-7B}}
}

Acknowledgments

  • Original model by Microsoft
  • Converted for MLX by the community
  • Based on Qwen2.5 architecture

Issues & Feedback

If you encounter any issues with this model, please report them on the model discussion page.
