update readme
README.md
CHANGED
@@ -144,15 +144,23 @@ Compared to `jina-reranker-v2-base-multilingual`, `jina-reranker-m0` significant
pip install "transformers>=4.47.3"
```

+If you run it on a GPU that supports FlashAttention-2 (as of 2024.9.12, this covers Ampere, Ada, and Hopper GPUs, e.g., A100, RTX 3090, RTX 4090, H100), you can also install `flash-attn`:
+
+```bash
+pip install flash-attn --no-build-isolation
+```
+
And then use the following code snippet to load the model:

```python
from transformers import AutoModel

+# comment out the flash_attention_2 line if you don't have a compatible GPU
model = AutoModel.from_pretrained(
    'jinaai/jina-reranker-m0',
    torch_dtype="auto",
    trust_remote_code=True,
+    attn_implementation="flash_attention_2"
)

model.to('cuda') # or 'cpu' if no GPU is available
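Below is a minimal usage sketch that ties the updated snippet together. Selecting the attention implementation automatically is just an alternative to commenting the line out by hand, and the `compute_score` call (including its `max_length` argument) is an assumption carried over from earlier Jina reranker releases such as `jina-reranker-v2-base-multilingual`; check the `jina-reranker-m0` model card for the exact scoring API.

```python
# Usage sketch only: everything beyond the README snippet above is an assumption.
from transformers import AutoModel
import importlib.util

# Use FlashAttention-2 only when flash-attn is importable (this assumes it is
# only installed on machines with a compatible GPU); None lets transformers
# pick its default attention implementation.
attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else None

model = AutoModel.from_pretrained(
    'jinaai/jina-reranker-m0',
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation=attn_impl,
)
model.to('cuda')  # or 'cpu' if no GPU is available

# Assumed scoring API, modeled on earlier Jina rerankers: one relevance score
# per [query, document] pair, higher meaning more relevant.
query = "What is the capital of France?"
documents = [
    "Paris is the capital and largest city of France.",
    "Tomatoes are rich in vitamin C.",
]
pairs = [[query, doc] for doc in documents]
scores = model.compute_score(pairs, max_length=1024)
print(scores)
```

Only text query/document pairs are shown here; if the model exposes a different or richer scoring entry point (for example, for image documents), prefer the one documented on the model card.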