Update README.md
Update as it only supports text for now
README.md
---
language:
- en
license: apache-2.0
tags:
- mlx
- qwen2.5
- text-generation
base_model: microsoft/Fara-7B
pipeline_tag: text-generation
---

# Fara-7B-4bit (Text-Only MLX)

This is a 4-bit quantized **text-only** version of [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B), optimized for Apple Silicon using MLX.

⚠️ **Important**: This conversion includes only the language model components. The vision capabilities of the original Fara-7B model are **not included** in this version.

## Model Details

- **Base Model**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
- **Architecture**: Qwen2.5 (text-only)
- **Quantization**: 4-bit (4.501 bits per weight)
- **Format**: MLX
- **Parameters**: ~7B
- **License**: Apache 2.0

## Capabilities

✅ **Supported**:
- Text generation
- Chat/instruction following
- Code generation
- Question answering

❌ **Not Supported**:
- Image understanding
- Visual question answering
- Multimodal tasks

## Usage

### Installation

```bash
pip install mlx-lm
```

### Basic Text Generation

```python
from mlx_lm import load, generate

# Load the model
model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Generate text
prompt = "What is machine learning?"
response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
print(response)
```
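### Streaming Generation

For longer responses you can print tokens as they are produced. A minimal sketch using `mlx_lm`'s `stream_generate`; depending on your `mlx-lm` version it yields plain strings or response objects with a `.text` field, so the example handles both:

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Stream tokens as they are generated instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt="What is machine learning?", max_tokens=100):
    # Recent mlx-lm versions yield response objects; older ones yield strings
    text = chunk.text if hasattr(chunk, "text") else chunk
    print(text, end="", flush=True)
print()
```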
### Chat Format

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# Use the chat template
messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
```
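### Multi-Turn Chat

Building on the same API, here is a sketch of a simple multi-turn loop: append each assistant reply to `messages` so the chat template carries the full history into the next turn. The example questions are placeholders:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

messages = []
for user_input in ["What is MLX?", "How does it differ from PyTorch?"]:
    messages.append({"role": "user", "content": user_input})
    prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False
    )
    response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
    # Keep the assistant reply in the history for the next turn
    messages.append({"role": "assistant", "content": response})
    print(f"> {user_input}\n{response}\n")
```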
## Performance

- **Speed**: ~100+ tokens/sec on M-series chips
- **Memory**: ~4-5 GB of unified memory required
- **Optimized for**: Apple Silicon (M1/M2/M3/M4)
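Throughput depends on your chip and context length, so treat the figures above as rough estimates. If your `mlx-lm` version supports it, passing `verbose=True` to `generate` prints generation speed and peak memory so you can measure on your own machine:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Fara-7B-4bit")

# verbose=True prints the response plus tokens/sec and peak-memory statistics
generate(model, tokenizer, prompt="What is machine learning?", max_tokens=100, verbose=True)
```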
## For Vision Capabilities

If you need the vision capabilities of Fara-7B, please use:
- **GGUF version**: [bartowski/microsoft_Fara-7B-GGUF](https://huggingface.co/bartowski/microsoft_Fara-7B-GGUF)
- **Original model**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)

## Known Limitations

1. Vision tower weights are not included
2. Cannot process images
3. Text-only inference
4. May generate `<tool_call>` tokens in responses (these can be ignored, or filtered as sketched below)
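On the last point, a minimal sketch of filtering `<tool_call>` spans out of a response with the standard-library `re` module; adjust the pattern if the model emits a different format:

```python
import re

def strip_tool_calls(text: str) -> str:
    """Remove <tool_call>...</tool_call> spans and any dangling tags."""
    text = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL)
    return text.replace("<tool_call>", "").replace("</tool_call>", "").strip()

print(strip_tool_calls("Paris is the capital of France.<tool_call>{}</tool_call>"))
# -> Paris is the capital of France.
```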
## Conversion Details

This model was converted using `mlx_lm.convert()` with the following modifications (see the sketch after this list):
- Fixed the config to properly map `tie_word_embeddings` in `text_config`
- 4-bit quantization applied to the language model weights
- Vision tower components excluded (not supported by the mlx-lm converter)
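For reference, a sketch of the basic conversion call, assuming mlx-lm's Python `convert` API; this alone does not reproduce this repo, since the config fix and vision-tower exclusion above required manual steps. The output path is illustrative:

```python
from mlx_lm import convert

# Standard 4-bit quantized conversion; the config fix and vision-tower
# exclusion described above required additional manual steps not shown here.
convert(
    "microsoft/Fara-7B",
    mlx_path="Fara-7B-4bit",  # illustrative output directory
    quantize=True,
    q_bits=4,
)
```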
## Citation

If you use this model, please cite the original Fara-7B paper and model:

```bibtex
@misc{fara-7b,
  title={Fara-7B},
  author={Microsoft},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/microsoft/Fara-7B}}
}
```

## Acknowledgments

- Original model by Microsoft
- Converted for MLX by the community
- Based on the Qwen2.5 architecture

## Issues & Feedback

If you encounter any issues with this model, please report them on the [model discussion page](https://huggingface.co/mlx-community/Fara-7B-4bit/discussions).