Update README.md
license: apache-2.0
---

# Model card for Flan-UL2

![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)

Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year, and was fine-tuned using the "Flan" prompt tuning and dataset collection.

According to the original [blog post](https://www.yitay.net/blog/flan-ul2-20b), these are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning (see the sketch after this list).
- The original UL2 model also had mode-switch tokens that were more or less mandatory for good performance. However, they were somewhat cumbersome, often requiring changes during inference or fine-tuning. In this update, we continued training UL2 20B for an additional 100k steps (with a small batch size) so that it forgets the "mode tokens" before applying Flan instruction tuning. As a result, this Flan-UL2 checkpoint does not require mode tokens anymore.
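To give a concrete sense of what the larger window enables, here is a minimal sketch that assembles an N-shot prompt and checks its token count against the 2048-token receptive field. The exemplar question/answer pairs are made up for illustration; only the tokenizer is real.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Hypothetical few-shot exemplars, followed by the actual question.
shots = [
    ("The bakery had 12 rolls and sold 5. How many rolls are left?",
     "12 - 5 = 7. The answer is 7."),
    ("A class has 9 boys and 11 girls. How many students are there in total?",
     "9 + 11 = 20. The answer is 20."),
]
question = "The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"

prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in shots) + f"Q: {question}\nA:"

# The full prompt must fit inside the model's 2048-token receptive field.
n_tokens = len(tokenizer(prompt).input_ids)
assert n_tokens <= 2048, "prompt exceeds the receptive field"
print(f"{len(shots)}-shot prompt uses {n_tokens} of 2048 input tokens")
```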
# Using the model
For more efficient memory usage, we advise you to load the model in `8bit` using the `load_in_8bit` flag as follows:

```python
# pip install accelerate transformers bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```
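If you want to confirm the memory savings, you can query the loaded model's footprint; a quick sketch, assuming the 8-bit `model` from the snippet above:

```python
# Approximate memory taken by the model's parameters, in GB.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```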
Otherwise, you can load and run the model in `bfloat16` as follows:
```python
# pip install accelerate transformers
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"

inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```
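The model can also be driven through the high-level `pipeline` API. The following is a minimal sketch under the same assumptions as above (`accelerate` installed, a CUDA device available); the prompt is just an example:

```python
from transformers import pipeline
import torch

# Text2text-generation pipeline with the same bfloat16 + auto device placement.
generator = pipeline(
    "text2text-generation",
    model="google/flan-ul2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

print(generator("Translate English to German: How old are you?", max_length=64)[0]["generated_text"])
```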
## Contribution
This model was originally contributed by [Yi Tay](https://www.yitay.net/?author=636616684c5e64780328eece), and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).
## Examples