---
license: mit
pipeline_tag: image-feature-extraction
---
## RICE-ViT-L Model Card

[[Github]](https://github.com/deepglint/MVT) [[Paper]](https://arxiv.org/abs/2507.20025)

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6478679d7b370854241b2ad8/Yy09pOusaZ47LofJ27xox.jpeg)

## Installation

```shell
pip install torch transformers
git clone https://github.com/deepglint/unicom
cd unicom/mlcd
```

## Usage

```python
from vit_rope2d_hf import MLCDVisionModel  # provided by the cloned repo (run from unicom/mlcd)
from transformers import CLIPImageProcessor
from PIL import Image
import requests
import torch

# Load model and processor
model = MLCDVisionModel.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")
processor = CLIPImageProcessor.from_pretrained("DeepGlint-AI/rice-vit-large-patch14-560")

# Process single image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

# Get visual features
with torch.no_grad():
    outputs = model(**inputs)
features = outputs.last_hidden_state

print(f"Extracted features shape: {features.shape}")
```
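For downstream retrieval or similarity tasks, the per-token features above are often reduced to a single image embedding. A common recipe is to mean-pool over the token dimension and L2-normalize, so embeddings can be compared with cosine similarity. This is a minimal sketch, not part of the official API; the random tensor stands in for `outputs.last_hidden_state`, and the token count shown is hypothetical (the real value depends on the model's patch grid):

```python
import torch
import torch.nn.functional as F

# Stand-in for outputs.last_hidden_state with shape (batch, num_tokens, hidden_dim).
# 1601 tokens and hidden size 1024 are illustrative assumptions for a ViT-L backbone.
features = torch.randn(1, 1601, 1024)

# Mean-pool across tokens to get one vector per image,
# then L2-normalize so dot products equal cosine similarities.
embedding = F.normalize(features.mean(dim=1), p=2, dim=-1)

print(embedding.shape)        # one 1024-d vector per image
print(embedding.norm(dim=-1)) # unit norm after normalization
```

Alternatively, some pipelines keep only the class token (`features[:, 0]`) instead of mean pooling; which works better is task-dependent.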