Improve model card: Update pipeline tag, add library name, and usage example
#1
opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,11 +1,12 @@
 ---
-license: apache-2.0
-pipeline_tag: mask-generation
 base_model:
+- OpenGVLab/InternVL2.5-4B
+- facebook/sam2.1-hiera-large
+license: apache-2.0
+pipeline_tag: image-segmentation
 tags:
+- SeC
+library_name: transformers
 ---
 
 # SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
@@ -31,6 +32,41 @@ tags:
 | **SeC (Ours)** | **82.7** | **81.7** | **86.5** | **75.3** | **91.3** | **88.6** | **70.0** |
 
 ---
+
+## Usage
+
+You can load the SeC model and processor using the `transformers` library with `trust_remote_code=True`. For comprehensive video object segmentation and detailed usage instructions, please refer to the project's [GitHub repository](https://github.com/OpenIXCLab/SeC), particularly `demo.ipynb` for single video inference and `INFERENCE.md` for full inference and evaluation.
+
+```python
+import torch
+from transformers import AutoModel, AutoProcessor
+from PIL import Image
+
+# Load model and processor
+model_name = "OpenIXCLab/SeC-4B"
+# Ensure your environment has the necessary PyTorch and transformers versions as specified in the GitHub repo.
+model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
+processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+
+# Example: assuming you have an image (e.g., a frame from a video) and a text query.
+# For full video processing, refer to the project's GitHub repository.
+# image = Image.open("path/to/your/image.jpg").convert("RGB")
+# text_query = "segment the main object"
+
+# Prepare inputs
+# inputs = processor(images=image, text=text_query, return_tensors="pt").to(model.device)
+
+# Perform inference
+# with torch.no_grad():
+#     outputs = model(**inputs)
+
+# The output format will vary depending on the model's implementation.
+# Typically, for segmentation tasks, outputs might include logits or predicted masks.
+# You will need to process these outputs further to visualize the segmentation.
+print("Model loaded successfully. For actual inference with video data, please refer to the project's GitHub repository and demo.ipynb.")
+```
+
 ## Citation
 
 If you find this project useful in your research, please consider citing:
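As a side note on the metadata edits in the first hunk: `pipeline_tag`, `library_name`, and `base_model` are what the Hub uses to file the model under the right task and library. The snippet below is a minimal sketch, not part of this PR, showing how that front matter can be read back programmatically; it assumes the `huggingface_hub` library (not mentioned in the card) and that the change has been merged.

```python
# Minimal sketch (assumes `pip install huggingface_hub`); not part of the model card itself.
from huggingface_hub import ModelCard

# Load the repo's README.md and parse its YAML front matter.
card = ModelCard.load("OpenIXCLab/SeC-4B")
metadata = card.data.to_dict()

# After this PR is merged, these fields should reflect the diff above.
print(metadata.get("pipeline_tag"))  # "image-segmentation"
print(metadata.get("library_name"))  # "transformers"
print(metadata.get("base_model"))    # ["OpenGVLab/InternVL2.5-4B", "facebook/sam2.1-hiera-large"]
print(metadata.get("tags"))          # ["SeC"]
```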