Improve model card: Update pipeline tag, add library name, and usage example
This PR enhances the model card for the SeC model by:
- Updating the `pipeline_tag` from `mask-generation` to `image-segmentation` for more precise categorization of the Video Object Segmentation task. This will improve discoverability on the Hugging Face Hub.
- Adding `library_name: transformers` to correctly reflect the model's compatibility and usage with the Hugging Face Transformers library, enabling the "Use in Transformers" widget.
- Including a basic Python usage example to demonstrate how to load and interact with the model, making it easier for users to get started.
These changes will help users better understand the model's capabilities and how to use it within the Hugging Face ecosystem.
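For quick reference, the YAML metadata block at the top of `README.md` after these changes (as shown in the diff below) becomes:

```yaml
---
base_model:
- OpenGVLab/InternVL2.5-4B
- facebook/sam2.1-hiera-large
license: apache-2.0
pipeline_tag: image-segmentation
tags:
- SeC
library_name: transformers
---
```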
README.md (CHANGED)

````diff
@@ -1,11 +1,12 @@
 ---
-license: apache-2.0
-pipeline_tag: mask-generation
 base_model:
+- OpenGVLab/InternVL2.5-4B
+- facebook/sam2.1-hiera-large
+license: apache-2.0
+pipeline_tag: image-segmentation
 tags:
+- SeC
+library_name: transformers
 ---
 
 # SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
@@ -31,6 +32,41 @@ tags:
 | **SeC (Ours)** | **82.7** | **81.7** | **86.5** | **75.3** | **91.3** | **88.6** | **70.0** |
 
 ---
+
+## Usage
+
+You can load the SeC model and processor using the `transformers` library with `trust_remote_code=True`. For comprehensive video object segmentation and detailed usage instructions, please refer to the project's [GitHub repository](https://github.com/OpenIXCLab/SeC), particularly `demo.ipynb` for single-video inference and `INFERENCE.md` for full inference and evaluation.
+
+```python
+import torch
+from transformers import AutoModel, AutoProcessor
+from PIL import Image
+
+# Load model and processor
+model_name = "OpenIXCLab/SeC-4B"
+# Ensure your environment has the PyTorch and transformers versions specified in the GitHub repo.
+model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
+processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+
+# Example: assuming you have an image (e.g., a frame from a video) and a text query.
+# For full video processing, refer to the project's GitHub repository.
+# image = Image.open("path/to/your/image.jpg").convert("RGB")  # placeholder for an actual image path
+# text_query = "segment the main object"
+
+# Prepare inputs
+# inputs = processor(images=image, text=text_query, return_tensors="pt").to(model.device)
+
+# Perform inference
+# with torch.no_grad():
+#     outputs = model(**inputs)
+
+# The output format will vary depending on the model's implementation.
+# For segmentation tasks, outputs typically include logits or predicted masks,
+# which you will need to post-process further to visualize the segmentation.
+print("Model loaded successfully. For actual inference with video data, please refer to the project's GitHub repository and demo.ipynb.")
+```
+
 ## Citation
 
 If you find this project useful in your research, please consider citing:
````