timm · Commit 3249a38 (verified) · Parent(s): efea53c
rwightman (HF Staff) committed: Update README.md
Files changed (1): README.md (+7 -5)

README.md CHANGED
@@ -9,8 +9,8 @@ license: apache-2.0
 This is an OpenCLIP (image + text) remaped version of the the [original](https://huggingface.co/facebook/PE-Core-S16-384)
 
 [\[📃 Tech Report\]](https://arxiv.org/abs/2504.13181)
-[\[📂 PE Github\]](https://github.com/facebookresearch/perception_models/)
-[\[📂 OpenCLIP Github\]](https://github.com/mlfoundations/open_clip)
+[\[📂 PE Github (original weights)\]](https://github.com/facebookresearch/perception_models/)
+[\[📂 OpenCLIP Github (these weights)\]](https://github.com/mlfoundations/open_clip)
 
 Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
 are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -46,9 +46,11 @@ PE core obtains extremely strong results across the board on zero-shot image cla
 
 | Model | Checkpoint | IN-1k | IN-v2 | IN-A | ObjectNet | COCO-T2I | Kinetics-400 | VTT-T2I
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-| **B/16** 224px | [PE-Core-B16-224](https://huggingface.co/facebook/PE-Core-B16-224) | 78.4 | 71.7 | 62.4 | 71.9 | 50.9 | 65.6 | 47.6 |
-| **L/14** 336px | [PE-Core-L14-336](https://huggingface.co/facebook/PE-Core-L14-336) | 83.5 | 77.9 | 89.0 | 84.7 | 57.1 | 73.4 | 50.3 |
-| **G/14** 448px | [PE-Core-G14-448](https://huggingface.co/facebook/PE-Core-G14-448) | 85.4 | 80.2 | 92.6 | 88.2 | 58.1 | 76.9 | 51.2 |
+| **T/16** 384px | [PE-Core-T-16-384](https://huggingface.co/timm/PE-Core-T-16-384) | 62.1 | 54.7 | 21.1 | 43.9 | 33.0 | 41.5 | 28.8 |
+| **S/16** 384px | [PE-Core-S-16-384](https://huggingface.co/timm/PE-Core-S-16-384) | 72.7 | 65.0 | 49.5 | 60.0 | 42.6 | 55.0 | 39.3 |
+| **B/16** 224px | [PE-Core-B-16](https://huggingface.co/timm/PE-Core-B-16) | 78.4 | 71.7 | 62.4 | 71.9 | 50.9 | 65.6 | 47.6 |
+| **L/14** 336px | [PE-Core-L-14-336](https://huggingface.co/timm/PE-Core-L-14-336) | 83.5 | 77.9 | 89.0 | 84.7 | 57.1 | 73.4 | 50.3 |
+| **G/14** 448px | [PE-Core-bigG-14-448](https://huggingface.co/timm/PE-Core-bigG-14-448) | 85.4 | 80.2 | 92.6 | 88.2 | 58.1 | 76.9 | 51.2 |
 
 PE core performs particularly well on the _hard_ benchmarks such as ObjectNet and ImageNet-A.
 
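For reference, a minimal zero-shot classification sketch with these OpenCLIP-remapped weights. It assumes the `timm/PE-Core-S-16-384` hub id shown in the updated table and the standard `open_clip` hf-hub loading path; `example.jpg` is a placeholder image, and the class prompts are illustrative only.

```python
import torch
from PIL import Image
import open_clip

# Hub id taken from the updated table above; loaded via open_clip's hf-hub support.
HF_HUB_ID = 'hf-hub:timm/PE-Core-S-16-384'

model, preprocess = open_clip.create_model_from_pretrained(HF_HUB_ID)
tokenizer = open_clip.get_tokenizer(HF_HUB_ID)
model.eval()

# 'example.jpg' is a placeholder; preprocess resizes/normalizes to the checkpoint's input size.
image = preprocess(Image.open('example.jpg')).unsqueeze(0)
text = tokenizer(['a photo of a cat', 'a photo of a dog', 'a photo of a bird'])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize both embeddings, then take scaled cosine similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Zero-shot probabilities over the text prompts.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```

The `preprocess` transform returned by `create_model_from_pretrained` handles resizing and normalization for the model, so no manual resolution handling is needed.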