Fara-7B-4bit (Text-Only MLX)

This is a 4-bit quantized, text-only version of microsoft/Fara-7B, optimized for Apple Silicon using MLX.

โš ๏ธ Important: This conversion only includes the language model components. The vision capabilities from the original Fara-7B model are not included in this version.

Model Details

  • Base Model: microsoft/Fara-7B
  • Architecture: Qwen2.5 (text-only)
  • Quantization: 4-bit (4.501 bits per weight)
  • Format: MLX
  • Parameters: ~7B
  • License: Apache 2.0

Capabilities

✅ Supported:

  • Text generation
  • Chat/instruction following
  • Code generation
  • Question answering

โŒ Not Supported:

  • Image understanding
  • Visual question answering
  • Multimodal tasks

Usage

Installation

pip install mlx-lm
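
For a quick sanity check from the terminal, recent mlx-lm releases also install a command-line generator (the flags below match current versions; adjust if your release differs):

mlx_lm.generate --model mlx-community/Fara-7B-4bit --prompt "Hello" --max-tokens 50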

Basic Text Generation

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Generate text
prompt = "What is machine learning?"
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
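
Streaming Generation

For incremental output, recent mlx-lm versions expose stream_generate, which yields chunks as tokens are produced. A minimal sketch, assuming the current API where each yielded chunk carries the new text in its .text attribute:

from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Print tokens as they are generated instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt="What is machine learning?", max_tokens=100):
    print(chunk.text, end="", flush=True)
print()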

Chat Format

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Use chat template
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]

prompt = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True, 
    tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
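
Sampling Parameters

Recent mlx-lm versions route temperature and top-p through a sampler object rather than keyword arguments on generate. A sketch, assuming mlx_lm.sample_utils.make_sampler is available in your installed version:

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Higher temperature -> more diverse output; top_p restricts sampling to the nucleus
sampler = make_sampler(temp=0.7, top_p=0.9)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    add_generation_prompt=True,
    tokenize=False,
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200, sampler=sampler)
print(response)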

Performance

  • Speed: roughly 100 tokens/sec on M-series chips (varies by chip and prompt length)
  • Memory: ~4-5 GB of unified memory required
  • Optimized for: Apple Silicon (M1/M2/M3/M4)

For Vision Capabilities

If you need the vision capabilities of Fara-7B, please use the original microsoft/Fara-7B model, which retains the vision tower.

Known Limitations

  1. Vision tower weights are not included
  2. Cannot process images
  3. Text-only inference
  4. May generate <tool_call> tokens in responses (can be ignored or filtered; see the sketch below)
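
Because the model may emit <tool_call> markup in plain-text responses (limitation 4 above), a small post-processing step can strip it. This is an illustrative sketch, not part of mlx-lm:

import re

def strip_tool_calls(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans and any stray tags from a response."""
    # Drop well-formed tool-call blocks, including their payloads
    text = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL)
    # Drop any unmatched opening/closing tags left behind
    text = re.sub(r"</?tool_call>", "", text)
    return text.strip()

print(strip_tool_calls('Sure.<tool_call>{"name": "click"}</tool_call> Done.'))
# -> 'Sure. Done.'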

Conversion Details

This model was converted using mlx_lm.convert() with the following modifications:

  • Fixed config to properly map tie_word_embeddings in text_config
  • 4-bit quantization applied to language model weights
  • Vision tower components excluded (not supported by mlx-lm converter)
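
For reference, a roughly equivalent Python invocation is sketched below. It is illustrative only: rerunning it on the stock checkpoint will still require the tie_word_embeddings config fix noted above.

from mlx_lm import convert

# Quantize the language-model weights to 4 bits; vision tower weights are
# skipped because the mlx-lm converter does not handle them.
convert(
    "microsoft/Fara-7B",
    mlx_path="Fara-7B-4bit",
    quantize=True,
    q_bits=4,
)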

Citation

If you use this model, please cite the original Fara-7B paper and model:

@misc{fara-7b,
  title={Fara-7B},
  author={Microsoft},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/microsoft/Fara-7B}}
}

Acknowledgments

  • Original model by Microsoft
  • Converted for MLX by the community
  • Based on Qwen2.5 architecture

Issues & Feedback

If you encounter any issues with this model, please report them on the model discussion page.
