JessicaE committed
Commit f8f1d04 · verified · 1 Parent(s): 9c17fbe

Upload Physics ViT model

Files changed (4)
  1. README.md +183 -0
  2. config.json +24 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +35 -0
README.md ADDED
@@ -0,0 +1,183 @@
# Physics Foundation Vision Transformer (PhysicsViT-StandardVersion)

A Vision Transformer model trained on multi-physics simulation data for scientific computing applications. This model is specifically designed for understanding and analyzing physics simulations across multiple domains.

**Model Version:** Standard Version - Trained for ~1.2 epochs (78,372 steps)

## Model Details

### Model Description

- **Developed by:** PhysicsAlchemists Research Team
- **Model type:** Vision Transformer (ViT-Huge)
- **Language(s):** N/A (Computer Vision)
- **License:** Apache 2.0
- **Finetuned from model:** None (trained from scratch on physics simulation data)
- **Training Steps:** 78,372 steps

### Model Architecture

- **Architecture:** ViT-Huge (Feature Extraction)
- **Hidden size:** 1280
- **Number of layers:** 32
- **Number of attention heads:** 16
- **Intermediate size:** 5120
- **Image size:** 224×224
- **Patch size:** 16×16
- **Embedding dimension:** 1280
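
These hyperparameters mirror the `config.json` shipped in this commit. As a minimal sketch, the same architecture can be instantiated directly in `transformers` (randomly initialized, for illustration only):

```python
from transformers import ViTConfig, ViTModel

# Mirrors the config.json in this commit (ViT-Huge, 16x16 patches).
config = ViTConfig(
    hidden_size=1280,
    num_hidden_layers=32,
    num_attention_heads=16,
    intermediate_size=5120,
    image_size=224,
    patch_size=16,
)
model = ViTModel(config)  # untrained skeleton; use from_pretrained for the real weights
```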

## Training Details

### Training Data

The model was trained on a comprehensive dataset of physics simulations including:

- Acoustic scattering (inclusions, discontinuous, maze)
- Active matter simulations
- Euler equations (multi-quadrants with open/periodic boundary conditions)
- Gray-Scott reaction-diffusion
- Helmholtz staircase
- Planetary shallow water equations
- Rayleigh-Bénard convection (standard and uniform)
- Shear flow dynamics
- Turbulent radiative layer (2D)
- Viscoelastic instability

### Training Configuration

- **Training regime:** ~1.2 epochs (78,372 steps)
- **Batch size:** 1,470
- **Learning rate:** 0.0005 (with warmup and cosine decay; a schedule sketch follows this list)
- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003)
- **Mixed precision:** bfloat16
- **Hardware:** Cerebras CS-X systems
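
A minimal sketch of the schedule described above. The warmup length is not stated in this card, so `warmup_steps` below is an assumed placeholder:

```python
import math

def lr_at(step, total_steps=78_372, warmup_steps=2_000, peak_lr=5e-4):
    """Linear warmup to peak_lr, then cosine decay to zero.

    warmup_steps is an assumption; the card only says 'warmup and cosine decay'.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```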

### Data Augmentation

- Random colormap application (viridis, plasma, inferno, coolwarm; see the sketch after this list)
- Grayscale conversion (30% probability)
- Temporal trajectory preservation during training
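
A minimal sketch of the colormap/grayscale augmentation described above; any training details beyond the listed colormaps and the 30% grayscale probability are assumptions:

```python
import random
import numpy as np
from matplotlib import colormaps
from PIL import Image

def augment_field(field_2d: np.ndarray) -> Image.Image:
    """Render a scalar 2D simulation field as an RGB image with a random
    colormap, or as grayscale 30% of the time."""
    norm = (field_2d - field_2d.min()) / (np.ptp(field_2d) + 1e-8)
    if random.random() < 0.3:
        rgb = np.stack([norm] * 3, axis=-1)  # grayscale conversion
    else:
        cmap = colormaps[random.choice(["viridis", "plasma", "inferno", "coolwarm"])]
        rgb = cmap(norm)[..., :3]            # drop the alpha channel
    return Image.fromarray((rgb * 255).astype(np.uint8))
```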

## Usage

⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models.

### Basic Usage

```python
from transformers import AutoModel, AutoImageProcessor
from torchvision import transforms
from PIL import Image
import torch

# Load model and processor
model = AutoModel.from_pretrained("your-username/physics-vit-standard")
processor = AutoImageProcessor.from_pretrained("your-username/physics-vit-standard")
# Note: the bundled processor alone is not sufficient; apply expand_to_square first.

# Load your physics image
image = Image.open("physics_simulation.png").convert("RGB")

# ⚠️ CRITICAL: Apply custom preprocessing (expand_to_square is defined below)
image = expand_to_square(image, background_color=(128, 128, 128))
image = image.resize((224, 224), Image.BILINEAR)

# Convert to tensor and add batch dimension
tensor = transforms.ToTensor()(image).unsqueeze(0)

# Extract physics-aware embeddings
with torch.no_grad():
    outputs = model(pixel_values=tensor)

# CLS token embedding (best for classification tasks)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]

# Average pooled embedding (good for trajectory prediction)
pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]

# Patch embeddings (for spatial analysis)
patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]

print(f"CLS embedding shape: {cls_embedding.shape}")
```
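
The 196 patch embeddings correspond to the 14×14 grid of 16×16 patches in a 224×224 input, so they can be mapped back onto the spatial layout:

```python
# Reshape patch embeddings onto the spatial grid (224 / 16 = 14 patches per side)
grid = patch_embeddings.reshape(1, 14, 14, 1280)
```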

### Required Preprocessing Function

```python
from PIL import Image

def expand_to_square(pil_img, background_color):
    """
    Pad image to square with background color, keeping image centered.

    REQUIRED for Physics ViT - this preprocessing was used during training.
    """
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
```
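
For example, a non-square frame is padded with gray bands before the resize:

```python
frame = Image.new("RGB", (256, 192))               # stand-in for a simulation frame
square = expand_to_square(frame, (128, 128, 128))
assert square.size == (256, 256)                   # shorter axis padded to match
```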

### Downstream Tasks

This model produces rich 1280-dimensional embeddings optimized for:

- **Physics Domain Classification:** Use CLS token embeddings (see the linear-probe sketch after this list)
- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction
- **Clustering & Similarity:** Use CLS or pooled embeddings
- **Spatial Analysis:** Use patch embeddings
- **Transfer Learning:** Fine-tune embeddings for new physics domains
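
As one hypothetical example, a frozen-backbone linear probe for domain classification; the class count of 10 matches the ten bullets under Training Data (some of which bundle several variants), and `cls_embedding` comes from the Basic Usage snippet:

```python
import torch.nn as nn

probe = nn.Linear(1280, 10)            # hidden_size -> number of physics domains
logits = probe(cls_embedding)          # cls_embedding: [1, 1280] from Basic Usage
predicted_domain = logits.argmax(dim=-1)
```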

## Performance

The model has been evaluated against DINOv2 and CLIP on physics-specific tasks:

- **Classification:** Superior performance on physics domain classification
- **Temporal Forecasting:** Better prediction of physics evolution
- **Clustering:** Clearer separation of physics simulation types
- **Transfer Learning:** Robust features for new physics applications

*Detailed benchmarks available in the original research.*

## Model Versions

- **Standard Version:** 78,372 training steps (~1.2 epochs) - Good balance of performance and training efficiency
- **Extended Version:** 195,930 training steps (3 full epochs) - Maximum performance, longer training

## Installation

```bash
pip install transformers torch torchvision pillow
```

## Limitations

- **Domain Specific:** Optimized for physics simulations; may not generalize to natural images
- **Preprocessing Required:** Must use the `expand_to_square` preprocessing for correct results
- **Resolution:** Optimized for 224×224 input images
- **Physics Domains:** Trained on the specific simulation types listed above

## Citation

```bibtex
@misc{physics-vit-2024,
  title={Physics Foundation Vision Transformer for Scientific Computing},
  author={PhysicsAlchemists Research Team},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/your-username/physics-vit-standard}
}
```

## Acknowledgments

- Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo)
- Trained on Cerebras CS-X systems
- Based on the Vision Transformer architecture
config.json ADDED
@@ -0,0 +1,24 @@
{
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "encoder_stride": 16,
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1280,
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 5120,
  "layer_norm_eps": 1e-06,
  "model_type": "vit",
  "num_attention_heads": 16,
  "num_channels": 3,
  "num_hidden_layers": 32,
  "patch_size": 16,
  "qkv_bias": true,
  "torch_dtype": "float32",
  "transformers_version": "4.36.0",
  "use_cache": true,
  "_name_or_path": "physics-vit",
  "problem_type": "single_label_classification"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c27213c7240ecd9edf4bd9b218cad8dcc3dbc365f38982f9e4d4a57da310cf81
size 2523797432
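
The pointer's size (~2.52 GB) is consistent with a ViT-Huge checkpoint of roughly 630M float32 parameters. A minimal sketch for verifying a downloaded checkpoint against the recorded sha256 oid:

```python
import hashlib

expected = "c27213c7240ecd9edf4bd9b218cad8dcc3dbc365f38982f9e4d4a57da310cf81"
h = hashlib.sha256()
with open("model.safetensors", "rb") as f:            # path assumes a local download
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
assert h.hexdigest() == expected, "checkpoint does not match the LFS pointer"
```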
preprocessor_config.json ADDED
@@ -0,0 +1,35 @@
{
  "do_normalize": false,
  "do_rescale": false,
  "do_resize": true,
  "image_mean": null,
  "image_std": null,
  "resample": 2,
  "rescale_factor": null,
  "size": {
    "height": 224,
    "width": 224
  },
  "image_processor_type": "ViTImageProcessor",
  "crop_pct": null,
  "do_center_crop": false,
  "processor_class": "ViTImageProcessor",
  "preprocessing_source": "cerebras_modelzoo",
  "preprocessing_url": "https://github.com/Cerebras/modelzoo/blob/5e81c965c68fd0a7ed9154bd0a7ed9154bd0ae26381f01218cd/src/cerebras/modelzoo/data/vision/transforms.py#L360",
  "custom_preprocessing": {
    "expand_to_square": true,
    "background_color": [
      128,
      128,
      128
    ],
    "interpolation": "bilinear",
    "antialias": true,
    "transforms_sequence": [
      "expand_to_square",
      "resize",
      "to_tensor"
    ],
    "note": "This model requires Cerebras ModelZoo preprocessing pipeline. Standard HuggingFace ViTImageProcessor will NOT work correctly without the expand_to_square preprocessing step."
  }
}
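
A minimal sketch of consuming the `custom_preprocessing` block above, following its `transforms_sequence`; it assumes the `expand_to_square` function from the README is in scope and that the config file has been downloaded locally:

```python
import json
from torchvision import transforms
from PIL import Image

with open("preprocessor_config.json") as f:  # hypothetical local path
    cfg = json.load(f)

custom = cfg["custom_preprocessing"]
image = Image.open("physics_simulation.png").convert("RGB")
image = expand_to_square(image, tuple(custom["background_color"]))  # from the README
image = image.resize((cfg["size"]["width"], cfg["size"]["height"]), Image.BILINEAR)
pixel_values = transforms.ToTensor()(image).unsqueeze(0)  # [1, 3, 224, 224]
```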