Update README.md
README.md CHANGED
@@ -9,8 +9,8 @@ license: apache-2.0
This is an OpenCLIP (image + text) remapped version of the [original](https://huggingface.co/facebook/PE-Core-S16-384)

[\[📃 Tech Report\]](https://arxiv.org/abs/2504.13181)
- [\[📂 PE Github\]](https://github.com/facebookresearch/perception_models/)
- [\[📂 OpenCLIP Github\]](https://github.com/mlfoundations/open_clip)
+ [\[📂 PE Github (original weights)\]](https://github.com/facebookresearch/perception_models/)
+ [\[📂 OpenCLIP Github (these weights)\]](https://github.com/mlfoundations/open_clip)

Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -46,9 +46,11 @@ PE core obtains extremely strong results across the board on zero-shot image cla

| Model | Checkpoint | IN-1k | IN-v2 | IN-A | ObjectNet | COCO-T2I | Kinetics-400 | VTT-T2I |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
- | **
- | **
- | **
+ | **T/16** 384px | [PE-Core-T-16-384](https://huggingface.co/timm/PE-Core-T-16-384) | 62.1 | 54.7 | 21.1 | 43.9 | 33.0 | 41.5 | 28.8 |
+ | **S/16** 384px | [PE-Core-S-16-384](https://huggingface.co/timm/PE-Core-S-16-384) | 72.7 | 65.0 | 49.5 | 60.0 | 42.6 | 55.0 | 39.3 |
+ | **B/16** 224px | [PE-Core-B-16](https://huggingface.co/timm/PE-Core-B-16) | 78.4 | 71.7 | 62.4 | 71.9 | 50.9 | 65.6 | 47.6 |
+ | **L/14** 336px | [PE-Core-L-14-336](https://huggingface.co/timm/PE-Core-L-14-336) | 83.5 | 77.9 | 89.0 | 84.7 | 57.1 | 73.4 | 50.3 |
+ | **G/14** 448px | [PE-Core-bigG-14-448](https://huggingface.co/timm/PE-Core-bigG-14-448) | 85.4 | 80.2 | 92.6 | 88.2 | 58.1 | 76.9 | 51.2 |

PE core performs particularly well on the _hard_ benchmarks such as ObjectNet and ImageNet-A.
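For reference, the remapped checkpoints load directly through OpenCLIP's Hugging Face Hub integration. Below is a minimal zero-shot sketch, assuming an `open_clip_torch` install, the `timm/PE-Core-S-16-384` hub id from the table above, and a placeholder local image `example.jpg`:

```python
import torch
from PIL import Image
import open_clip

# Load the remapped weights straight from the HF hub (id from the table above).
model, preprocess = open_clip.create_model_from_pretrained('hf-hub:timm/PE-Core-S-16-384')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/PE-Core-S-16-384')
model.eval()

# 'example.jpg' is a placeholder; substitute any local image.
image = preprocess(Image.open('example.jpg')).unsqueeze(0)
text = tokenizer(['a diagram', 'a dog', 'a cat'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity over L2-normalized embeddings, scaled then softmaxed.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print('Label probs:', text_probs)
```

Any checkpoint in the table can be swapped in by changing the hub id.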
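The zero-shot scores above come from prompted text classifiers rather than any fine-tuning: each class name is embedded through one or more prompt templates, and images are matched by cosine similarity. A rough sketch of that recipe follows; the class names and templates here are illustrative placeholders, not the exact evaluation protocol from the tech report:

```python
import torch
import open_clip

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:timm/PE-Core-B-16')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/PE-Core-B-16')
model.eval()

# Placeholder classes/templates; a real IN-1k evaluation uses the full
# label set and a larger template ensemble.
classnames = ['goldfish', 'tabby cat', 'golden retriever']
templates = ['a photo of a {}.', 'a low resolution photo of a {}.']

with torch.no_grad():
    weights = []
    for name in classnames:
        texts = tokenizer([t.format(name) for t in templates])
        emb = model.encode_text(texts)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        emb = emb.mean(dim=0)                 # average over templates
        weights.append(emb / emb.norm())      # re-normalize the ensemble
    classifier = torch.stack(weights, dim=1)  # (embed_dim, num_classes)

# Given L2-normalized image features, predictions are an argmax over
# image_features @ classifier.
```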