Upload Physics ViT model

Files changed (4)

README.md +183 -0
config.json +24 -0
model.safetensors +3 -0
preprocessor_config.json +35 -0

README.md ADDED

	@@ -0,0 +1,183 @@

+# Physics Foundation Vision Transformer (PhysicsViT-StandardVersion)
+A Vision Transformer trained on multi-physics simulation data for scientific computing applications. It produces embeddings of rendered simulation frames for downstream tasks such as domain classification, temporal forecasting, and clustering across multiple physics domains.
+**Model Version:** Standard Version - Trained for ~1.2 epochs (78,372 steps)
+## Model Details
+### Model Description
+- **Developed by:** PhysicsAlchemists Research Team
+- **Model type:** Vision Transformer (ViT-Huge)
+- **Language(s):** N/A (Computer Vision)
+- **License:** Apache 2.0
+- **Finetuned from model:** None (trained from scratch on physics simulation data)
+- **Training Steps:** 78,372 steps
+### Model Architecture
+- **Architecture:** ViT-Huge (Feature Extraction)
+- **Hidden size:** 1280
+- **Number of layers:** 32
+- **Number of attention heads:** 16
+- **Intermediate size:** 5120
+- **Image size:** 224×224
+- **Patch size:** 16×16
+- **Embedding dimension:** 1280
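As a quick check on the architecture values above, the token sequence length follows from the image and patch sizes. This is a minimal sketch that assumes the usual ViT layout of one CLS token followed by non-overlapping square patches; the function name `vit_sequence_length` is illustrative and not part of any library.

```python
def vit_sequence_length(image_size: int, patch_size: int, cls_tokens: int = 1) -> tuple[int, int]:
    """Return (num_patches, sequence_length) for a square image cut into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # Assumption: a single CLS token is prepended to the patch tokens.
    return num_patches, num_patches + cls_tokens


num_patches, seq_len = vit_sequence_length(image_size=224, patch_size=16)
print(num_patches, seq_len)  # 196 197
```

With a hidden size of 1280, `last_hidden_state` for one image would then have shape `[1, 197, 1280]`, which matches the 196 patch embeddings plus one CLS embedding used in the usage example.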
+## Training Details
+### Training Data
+The model was trained on a comprehensive dataset of physics simulations including:
+- Acoustic scattering (inclusions, discontinuous, maze)
+- Active matter simulations
+- Euler equations (multi-quadrants with open/periodic BC)
+- Gray-Scott reaction-diffusion
+- Helmholtz staircase
+- Planetary shallow water equations
+- Rayleigh-Bénard convection (standard and uniform)
+- Shear flow dynamics
+- Turbulent radiative layer (2D)
+- Viscoelastic instability
+### Training Configuration
+- **Training regime:** ~1.2 epochs (78,372 steps)
+- **Batch size:** 1,470
+- **Learning rate:** 0.0005 (with warmup and cosine decay)
+- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003)
+- **Mixed precision:** bfloat16
+- **Hardware:** Cerebras CS-X systems
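The learning-rate entry above mentions warmup and cosine decay without giving the schedule details. The sketch below shows one common form of such a schedule, linear warmup followed by cosine decay. The peak rate and total step count come from the list above; `warmup_steps` and `min_lr` are assumed values chosen only for illustration, and the actual training run may have used different ones.

```python
import math


def lr_at_step(step, peak_lr=5e-4, total_steps=78_372, warmup_steps=1_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    warmup_steps and min_lr are hypothetical; they are not stated in the model card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 500, 1_000, 39_686, 78_372):
    print(step, lr_at_step(step))
```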
+### Data Augmentation
+- Random colormap application (viridis, plasma, inferno, coolwarm)
+- Grayscale conversion (30% probability)
+- Temporal trajectory preservation during training
+## Usage
+⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models.
+### Basic Usage
+```python
+import torch
+from PIL import Image
+from torchvision import transforms
+from transformers import AutoModel, AutoImageProcessor
+# Load model and processor
+model = AutoModel.from_pretrained("your-username/physics-vit-standard")
+model.eval()  # Ensure inference mode (dropout disabled)
+# The processor is loaded for completeness only; preprocessing is done manually below.
+processor = AutoImageProcessor.from_pretrained("your-username/physics-vit-standard")
+# Load your physics image
+image = Image.open("physics_simulation.png").convert("RGB")
+# ⚠️ CRITICAL: Apply custom preprocessing.
+# expand_to_square is defined under "Required Preprocessing Function" below;
+# define it before running this snippet.
+image = expand_to_square(image, background_color=(128, 128, 128))
+image = image.resize((224, 224), Image.BILINEAR)
+# Convert to a float tensor in [0, 1] and add a batch dimension
+tensor = transforms.ToTensor()(image).unsqueeze(0)  # Shape: [1, 3, 224, 224]
+# Extract physics-aware embeddings
+with torch.no_grad():
+    outputs = model(pixel_values=tensor)
+    # CLS token embedding (best for classification tasks)
+    cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]
+    # Average pooled embedding over all tokens, CLS included (good for trajectory prediction)
+    pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]
+    # Patch embeddings (for spatial analysis)
+    patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]
+print(f"CLS embedding shape: {cls_embedding.shape}")
+```
+### Required Preprocessing Function
+```python
+from PIL import Image
+def expand_to_square(pil_img, background_color):
+    """
+    Pad image to square with background color, keeping image centered.
+    REQUIRED for Physics ViT - this preprocessing was used during training.
+    """
+    background_color = tuple(background_color)
+    width, height = pil_img.size
+    if width == height:
+        return pil_img
+    elif width > height:
+        result = Image.new(pil_img.mode, (width, width), background_color)
+        result.paste(pil_img, (0, (width - height) // 2))
+        return result
+    else:
+        result = Image.new(pil_img.mode, (height, height), background_color)
+        result.paste(pil_img, ((height - width) // 2, 0))
+        return result
+```
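To make the padding geometry of `expand_to_square` easier to reason about, here is a dependency-free sketch that computes only the canvas size and paste offset for a given input size. The helper name `square_padding` is hypothetical; it mirrors the arithmetic in the function above and does not touch any image data.

```python
def square_padding(width: int, height: int) -> tuple[int, tuple[int, int]]:
    """Return (side, (left, top)): the square canvas side and the paste offset
    that expand_to_square uses for an image of the given size."""
    side = max(width, height)
    left = (side - width) // 2   # non-zero only for portrait images
    top = (side - height) // 2   # non-zero only for landscape images
    return side, (left, top)


print(square_padding(512, 128))  # (512, (0, 192))
print(square_padding(100, 301))  # (301, (100, 0))
print(square_padding(224, 224))  # (224, (0, 0))
```

When the size difference is odd, integer division puts the extra pixel of padding on the bottom or right edge, as in the second example.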
+### Downstream Tasks
+This model produces 1280-dimensional embeddings intended for:
+- **Physics Domain Classification:** Use CLS token embeddings
+- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction
+- **Clustering & Similarity:** Use CLS or pooled embeddings
+- **Spatial Analysis:** Use patch embeddings
+- **Transfer Learning:** Fine-tune the model, or train a head on its embeddings, for new physics domains
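For the clustering and similarity use case, a common choice is to compare embeddings by cosine similarity. The sketch below is a plain-Python illustration with toy 4-dimensional vectors standing in for the 1280-dimensional CLS or pooled embeddings; the vectors and names are made up for the example, and in practice a tensor library routine would normally be used instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings of three simulation frames (hypothetical values).
frame_a = [0.9, 0.1, 0.0, 0.2]
frame_b = [0.8, 0.2, 0.1, 0.2]
frame_c = [0.0, 0.1, 0.9, 0.7]

print(cosine_similarity(frame_a, frame_b) > cosine_similarity(frame_a, frame_c))  # True
```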
+## Performance
+The model has been evaluated against DINOv2 and CLIP on physics-specific tasks. Relative to those baselines:
+- **Classification:** Superior performance on physics domain classification
+- **Temporal Forecasting:** Better prediction of physics evolution
+- **Clustering:** Clearer separation of physics simulation types
+- **Transfer Learning:** Robust features for new physics applications
+*Detailed benchmarks are available in the original research.*
+## Model Versions
+- **Standard Version:** 78,372 training steps (~1.2 epochs) - Good balance of performance and training efficiency
+- **Extended Version:** 195,930 training steps (3 full epochs) - Maximum performance, longer training
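The step and epoch figures for the two versions can be cross-checked with a little arithmetic. The sketch below assumes that both versions use the same number of steps per epoch and that the batch size of 1,470 listed under Training Configuration is the global batch size; neither assumption is stated explicitly in the model card.

```python
EXTENDED_STEPS = 195_930
EXTENDED_EPOCHS = 3
STANDARD_STEPS = 78_372
BATCH_SIZE = 1_470  # assumed to be the global batch size

steps_per_epoch = EXTENDED_STEPS // EXTENDED_EPOCHS
standard_epochs = STANDARD_STEPS / steps_per_epoch
samples_per_epoch = steps_per_epoch * BATCH_SIZE

print(steps_per_epoch)            # 65310
print(round(standard_epochs, 3))  # 1.2
print(samples_per_epoch)          # 96005700
```

Under these assumptions the "~1.2 epochs" figure for the Standard Version is consistent with the Extended Version's 3 epochs, and one epoch corresponds to roughly 96 million training samples.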
+## Installation
+```bash
+pip install transformers torch torchvision pillow
+```
+## Limitations
+- **Domain Specific:** Optimized for physics simulations; may not generalize to natural images
+- **Preprocessing Required:** Inputs must go through the `expand_to_square` preprocessing to give correct results
+- **Resolution:** Optimized for 224×224 input images
+- **Physics Domains:** Trained on specific simulation types listed above
+## Citation
+```bibtex
+@misc{physics-vit-2024,
+  title={Physics Foundation Vision Transformer for Scientific Computing},
+  author={PhysicsAlchemists Research Team},
+  year={2024},
+  howpublished={HuggingFace Model Hub},
+  url={https://huggingface.co/your-username/physics-vit-standard}
+}
+```
+## Acknowledgments
+- Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo)
+- Trained on Cerebras CS-X systems
+- Based on Vision Transformer architecture

config.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "architectures": [
+    "ViTModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "encoder_stride": 16,
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1280,
+  "image_size": 224,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_norm_eps": 1e-06,
+  "model_type": "vit",
+  "num_attention_heads": 16,
+  "num_channels": 3,
+  "num_hidden_layers": 32,
+  "patch_size": 16,
+  "qkv_bias": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.36.0",
+  "use_cache": true,
+  "_name_or_path": "physics-vit",
+  "problem_type": "single_label_classification"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c27213c7240ecd9edf4bd9b218cad8dcc3dbc365f38982f9e4d4a57da310cf81
+size 2523797432
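The LFS pointer's size gives a rough estimate of the parameter count. This sketch assumes every weight is stored as float32 (4 bytes), as `torch_dtype` in config.json suggests, and ignores the small safetensors header, so the result is a slight overestimate.

```python
FILE_SIZE_BYTES = 2_523_797_432  # "size" field of the LFS pointer
BYTES_PER_PARAM = 4              # float32, assumed from torch_dtype in config.json

approx_params = FILE_SIZE_BYTES // BYTES_PER_PARAM
print(f"{approx_params:,} parameters (~{approx_params / 1e6:.0f}M)")
# 630,949,358 parameters (~631M)
```

An estimate of roughly 630M parameters is in the range usually quoted for ViT-Huge, which agrees with the architecture described in the README.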

preprocessor_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "do_normalize": false,
+  "do_rescale": false,
+  "do_resize": true,
+  "image_mean": null,
+  "image_std": null,
+  "resample": 2,
+  "rescale_factor": null,
+  "size": {
+    "height": 224,
+    "width": 224
+  },
+  "image_processor_type": "ViTImageProcessor",
+  "crop_pct": null,
+  "do_center_crop": false,
+  "processor_class": "ViTImageProcessor",
+  "preprocessing_source": "cerebras_modelzoo",
+  "preprocessing_url": "https://github.com/Cerebras/modelzoo/blob/5e81c965c68fd0a7ed9154bd0a7ed9154bd0ae26381f01218cd/src/cerebras/modelzoo/data/vision/transforms.py#L360",
+  "custom_preprocessing": {
+    "expand_to_square": true,
+    "background_color": [
+      128,
+      128,
+      128
+    ],
+    "interpolation": "bilinear",
+    "antialias": true,
+    "transforms_sequence": [
+      "expand_to_square",
+      "resize",
+      "to_tensor"
+    ],
+    "note": "This model requires Cerebras ModelZoo preprocessing pipeline. Standard HuggingFace ViTImageProcessor will NOT work correctly without the expand_to_square preprocessing step."
+  }
+}