bfshi-nvidia committed · verified
Commit 7575a20 · 1 parent: e9a9d3a

Upload README.md with huggingface_hub

Files changed (1): README.md (+16 −8)
README.md CHANGED
@@ -19,7 +19,7 @@ This model is for research and development only.
 
 ### License/Terms of Use: <br>
 
-CC-BY-NC-SA-4.0
+[CC-BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
 
 ### Deployment Geography:
 
@@ -52,28 +52,28 @@ The model is from the paper [Scaling Vision Pre-Training to 4K Resolution](https
 **This model was developed based on [PS3-4K-SigLIP](https://huggingface.co/nvidia/PS3-4K-SigLIP) <br>
 
 ### Input: <br>
-**Input Type(s):** Image and text <br>
+**Input Type(s):** Image and Text <br>
 **Input Format:** Red, Green, Blue (RGB) and strings <br>
-**Input Parameters:** 2D and 1D <br>
+**Input Parameters:** Two Dimensional (2D) and One Dimensional (1D) <br>
 **Other Properties Related to Input:** Image resolutions up to 3780*3780 and text input up to 12288 tokens <br>
 
 ### Output: <br>
 **Output Type(s):** Text <br>
 **Output Format:** Strings <br>
-**Output Parameters:** 1D <br>
+**Output Parameters:** One Dimensional (1D) <br>
 **Other Properties Related to Output:** Text output up to 12288 tokens <br>
 
 Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. <br>
 
 ## Software Integration:
 **Runtime Engine(s):**
-N/A <br>
+Not Applicable (N/A) <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
 NVIDIA Ampere <br>
 NVIDIA Blackwell <br>
-NVIDIA Jetson <br>
 NVIDIA Hopper <br>
+NVIDIA Jetson <br>
 
 **Preferred/Supported Operating System(s):** <br>
 Linux <br>
@@ -93,6 +93,10 @@ v1.0 - Initial release
 |-----------------|----------------|-------------------------------------------------------------------------|
 | VILA-HD-8B-PS3-1.5K-SigLIP | 1512 * 1512 | [nvidia/VILA-HD-8B-PS3-1.5K-SigLIP](https://huggingface.co/nvidia/VILA-HD-8B-PS3-1.5K-SigLIP) |
 | VILA-HD-8B-PS3-4K-SigLIP | 3780 * 3780 | [nvidia/VILA-HD-8B-PS3-4K-SigLIP](https://huggingface.co/nvidia/VILA-HD-8B-PS3-4K-SigLIP) |
+| VILA-HD-8B-PS3-1.5K-C-RADIOv2 | 1536 * 1536 | [nvidia/VILA-HD-8B-PS3-1.5K-C-RADIOv2](https://huggingface.co/nvidia/VILA-HD-8B-PS3-1.5K-C-RADIOv2) |
+| VILA-HD-8B-PS3-4K-C-RADIOv2 | 3840 * 3840 | [nvidia/VILA-HD-8B-PS3-4K-C-RADIOv2](https://huggingface.co/nvidia/VILA-HD-8B-PS3-4K-C-RADIOv2) |
+| VILA-HD-8B-PS3-1.5K-SigLIP2 | 1512 * 1512 | [nvidia/VILA-HD-8B-PS3-1.5K-SigLIP2](https://huggingface.co/nvidia/VILA-HD-8B-PS3-1.5K-SigLIP2) |
+| VILA-HD-8B-PS3-4K-SigLIP2 | 3780 * 3780 | [nvidia/VILA-HD-8B-PS3-4K-SigLIP2](https://huggingface.co/nvidia/VILA-HD-8B-PS3-4K-SigLIP2) |
 
 
 ## Training Datasets: <br>
@@ -118,7 +122,10 @@ See [Dataset Preparation](https://arxiv.org/abs/2412.04468) for more details.
 
 ## Performance
 
-![Performance of VILA-HD models](assets/vila_hd_results.png)
+![Performance of VILA-HD models 1](assets/vila_hd_results_1.png)
+
+![Performance of VILA-HD models 2](assets/vila_hd_results_2.png)
+
 
 
 
@@ -169,4 +176,5 @@ If you find this work useful in your research, please consider citing:
 journal={arXiv preprint arXiv:2503.19903},
 year={2025}
 }
-```
+```
+
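The commit message notes the card was uploaded with `huggingface_hub`; the same library can fetch it back. A minimal sketch, assuming only that the repo id from the model table above hosts this README and that the short commit hash in the page header is a valid revision of that repo:

```python
from huggingface_hub import hf_hub_download

# Download the README.md touched by this commit from the Hugging Face Hub.
# repo_id is taken from the model table above; revision pins the commit
# hash shown in this page's header (assumed resolvable on that repo).
card_path = hf_hub_download(
    repo_id="nvidia/VILA-HD-8B-PS3-4K-SigLIP",
    filename="README.md",
    revision="7575a20",
)
print(card_path)  # local cache path of the downloaded model card
```

Dropping the `revision` argument would instead fetch the card at the current head of the repo's default branch.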