manjunathshiva committed
Commit 60cb7a4 · verified · 1 Parent(s): 24e7612

Update README.md

Update as it only supports text for now

Files changed (1):
  1. README.md +110 -12
README.md CHANGED
@@ -1,31 +1,129 @@
  ---
- language: en
- library_name: mlx
- pipeline_tag: text-generation
  tags:
  - mlx
  ---

- # mlx-community/Fara-7B-4bit

- ## Use with mlx

  ```bash
  pip install mlx-lm
  ```

  ```python
  from mlx_lm import load, generate

  model, tokenizer = load("mlx-community/Fara-7B-4bit")

- prompt = "hello"

- if tokenizer.chat_template is not None:
-     messages = [{"role": "user", "content": prompt}]
-     prompt = tokenizer.apply_chat_template(
-         messages, add_generation_prompt=True
-     )

- response = generate(model, tokenizer, prompt=prompt, verbose=True)
  ```
  ---
+ language:
+ - en
+ license: apache-2.0
  tags:
  - mlx
+ - qwen2.5
+ - text-generation
+ base_model: microsoft/Fara-7B
+ pipeline_tag: text-generation
  ---

+ # Fara-7B-4bit (Text-Only MLX)
+
+ This is a 4-bit quantized **text-only** version of [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B), optimized for Apple Silicon using MLX.
+
+ ⚠️ **Important**: This conversion only includes the language model components. The vision capabilities of the original Fara-7B model are **not included** in this version.
+
+ ## Model Details
+
+ - **Base Model**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
+ - **Architecture**: Qwen2.5 (text-only)
+ - **Quantization**: 4-bit (4.501 bits per weight)
+ - **Format**: MLX
+ - **Parameters**: ~7B
+ - **License**: Apache 2.0
+
+ ## Capabilities
+
+ ✅ **Supported**:
+ - Text generation
+ - Chat/instruction following
+ - Code generation
+ - Question answering

+ ❌ **Not Supported**:
+ - Image understanding
+ - Visual question answering
+ - Multimodal tasks

+ ## Usage
+
+ ### Installation
  ```bash
  pip install mlx-lm
  ```

+ ### Basic Text Generation
+ ```python
+ from mlx_lm import load, generate
+
+ # Load the model
+ model, tokenizer = load("mlx-community/Fara-7B-4bit")
+
+ # Generate text
+ prompt = "What is machine learning?"
+ response = generate(model, tokenizer, prompt=prompt, max_tokens=100)
+ print(response)
+ ```
+
+ ### Chat Format
  ```python
  from mlx_lm import load, generate

  model, tokenizer = load("mlx-community/Fara-7B-4bit")

+ # Use the chat template
+ messages = [
+     {"role": "user", "content": "Explain quantum computing in simple terms"}
+ ]

+ prompt = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     tokenize=False
+ )

+ response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
+ print(response)
  ```
+
+ ## Performance
+
+ - **Speed**: roughly 100 tokens/sec on M-series chips (see the sketch below to measure locally)
+ - **Memory**: ~4-5 GB of unified memory required
+ - **Optimized for**: Apple Silicon (M1/M2/M3/M4)
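+
+ To verify throughput on your own machine, passing `verbose=True` makes `generate` print prompt and generation tokens-per-sec; a minimal sketch (actual numbers depend on your chip and prompt length):
+
+ ```python
+ from mlx_lm import load, generate
+
+ model, tokenizer = load("mlx-community/Fara-7B-4bit")
+
+ # verbose=True prints the response plus prompt/generation speed and peak memory
+ generate(model, tokenizer, prompt="What is machine learning?", max_tokens=100, verbose=True)
+ ```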
+
+ ## For Vision Capabilities
+
+ If you need the vision capabilities of Fara-7B, please use:
+ - **GGUF version**: [bartowski/microsoft_Fara-7B-GGUF](https://huggingface.co/bartowski/microsoft_Fara-7B-GGUF)
+ - **Original model**: [microsoft/Fara-7B](https://huggingface.co/microsoft/Fara-7B)
+
+ ## Known Limitations
+
+ 1. Vision tower weights are not included
+ 2. Cannot process images
+ 3. Text-only inference
+ 4. May generate `<tool_call>` tokens in responses; these can be ignored or filtered (see the sketch below)
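+
+ A simple post-processing step is enough to drop stray `<tool_call>` markup. This regex-based filter is purely illustrative, not part of the model or mlx-lm:
+
+ ```python
+ import re
+
+ def strip_tool_calls(text: str) -> str:
+     # Remove complete <tool_call>...</tool_call> spans, then any stray tags
+     text = re.sub(r"<tool_call>.*?</tool_call>", "", text, flags=re.DOTALL)
+     return text.replace("<tool_call>", "").replace("</tool_call>", "").strip()
+
+ print(strip_tool_calls("Paris is the capital of France.<tool_call>{}</tool_call>"))
+ # -> Paris is the capital of France.
+ ```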
+
+ ## Conversion Details
+
+ This model was converted using `mlx_lm.convert()` with the following modifications (a sketch of the call follows the list):
+ - Fixed the config to properly map `tie_word_embeddings` in `text_config`
+ - 4-bit quantization applied to the language model weights
+ - Vision tower components excluded (not supported by the mlx-lm converter)
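+
+ The exact invocation is not recorded here, but a typical 4-bit conversion with `mlx_lm.convert` looks roughly like this (the paths are assumptions, and the `tie_word_embeddings` config fix was applied separately):
+
+ ```python
+ from mlx_lm import convert
+
+ # Quantize the language-model weights to 4 bits; the vision tower is
+ # not handled by the mlx-lm converter, so only text weights are kept.
+ convert(
+     "microsoft/Fara-7B",
+     mlx_path="Fara-7B-4bit",
+     quantize=True,
+     q_bits=4,
+ )
+ ```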
+
+ ## Citation
+
+ If you use this model, please cite the original Fara-7B model:
+ ```bibtex
+ @misc{fara-7b,
+   title={Fara-7B},
+   author={Microsoft},
+   year={2024},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/microsoft/Fara-7B}}
+ }
+ ```
+
+ ## Acknowledgments
+
+ - Original model by Microsoft
+ - Converted for MLX by the community
+ - Based on the Qwen2.5 architecture
+
+ ## Issues & Feedback
+
+ If you encounter any issues with this model, please report them on the [model discussion page](https://huggingface.co/mlx-community/Fara-7B-4bit/discussions).