Update README.md
README.md
CHANGED
@@ -11,6 +11,9 @@ tags:
 
 ## Model Summary
 
+This is a copy of Microsoft's model with a few fixes. PRs for these fixes are open on the original model repository; until they are merged, I'm using this copy so that everything is set up correctly.
+
+
 This Hub repository contains a Hugging Face `transformers` implementation of the Florence-2 model from Microsoft.
 
 Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages our FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.
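The prompt-based interface described above selects the task with a special task token, optionally followed by free text. The sketch below is modeled on the usage example in Microsoft's original Florence-2 model card, where the prompt is the task token alone or the task token concatenated with a text input. The task tokens shown are a subset of those documented there; the `build_prompt` helper, the `TASK_TOKENS` names, and the `NEEDS_TEXT` set are hypothetical conveniences, not part of the model's code.

```python
from typing import Optional

# Subset of Florence-2 task tokens as documented in Microsoft's model card.
# The human-readable keys are a hypothetical convenience for this sketch.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks in this sketch that expect free-text input after the task token.
NEEDS_TEXT = {"segmentation"}


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Return a Florence-2 prompt string for a task (hypothetical helper)."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in NEEDS_TEXT:
        if not text_input:
            raise ValueError(f"task {task!r} requires text_input")
        # The model card's example simply concatenates token and text.
        return token + text_input
    if text_input is not None:
        raise ValueError(f"task {task!r} takes no text_input")
    return token


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("segmentation", "a green car"))
```

In the original model card, a prompt string like this is passed to the processor together with the image (`processor(text=prompt, images=image, return_tensors="pt")`), and the model is loaded with `trust_remote_code=True`; check the card for the full list of supported tokens before relying on this subset.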