---
license: mit
pipeline_tag: image-feature-extraction
---
## RICE-ViT-L Model Card
[[Github]](https://github.com/deepglint/MVT) [[Paper]](https://arxiv.org/abs/2507.20025)

## Installation
```shell
pip install torch transformers
git clone https://github.com/deepglint/unicom
cd unicom/mlcd
```
## Usage
```python
from vit_rope2d_hf import MLCDVisionModel  # provided in unicom/mlcd (see Installation above)
from transformers import CLIPImageProcessor
from PIL import Image
import requests
import torch

# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
processor = CLIPImageProcessor.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")

# Process a single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Extract visual features
with torch.no_grad():
    outputs = model(**inputs)

features = outputs.last_hidden_state
print(f"Extracted features shape: {features.shape}")
```
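
The processor also accepts a list of PIL images, so several images can be encoded in a single forward pass. The sketch below continues from the snippet above and is illustrative only: mean pooling and L2 normalization are assumed choices for turning the token-level features into one embedding per image, not part of the official API.
```python
import torch.nn.functional as F

# Batch several images in one call (illustrative; replace the duplicate URL with your own images)
urls = [
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "http://images.cocodataset.org/val2017/000000039769.jpg",
]
images = [Image.open(requests.get(u, stream=True).raw) for u in urls]
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token features (assumed pooling strategy) and L2-normalize
# so that dot products give cosine similarities
embeddings = F.normalize(outputs.last_hidden_state.mean(dim=1), dim=-1)
print(f"Embeddings shape: {embeddings.shape}")  # (num_images, hidden_size)
print(f"Pairwise similarity:\n{embeddings @ embeddings.T}")
```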