---
base_model:
  - OpenGVLab/InternVL2.5-4B
  - facebook/sam2.1-hiera-large
license: apache-2.0
pipeline_tag: image-segmentation
tags:
  - SeC
library_name: transformers
---

# SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

[📂 GitHub] [📦 Benchmark] [🌐 Homepage] [📄 Paper]

## Highlights

- 🔥 We introduce Segment Concept (SeC), a concept-driven video object segmentation framework that integrates Large Vision-Language Models (LVLMs) to build robust, object-centric representations.
- 🔥 SeC dynamically balances semantic reasoning with feature matching, adaptively adjusting computational effort to scene complexity for optimal segmentation performance (see the illustrative sketch after this list).
- 🔥 We propose the Semantic Complex Scenarios Video Object Segmentation (SeCVOS) benchmark, designed to evaluate segmentation in challenging scenarios.
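
The second highlight describes routing between a cheap feature-matching path and a heavier LVLM reasoning path depending on scene complexity. The Python sketch below is a minimal illustration of what such routing could look like; the function names (`matcher`, `concept_model`), the cosine-similarity trigger, and the threshold are all assumptions for illustration, not the released implementation.

```python
import torch

def segment_frame(frame_feats, memory_feats, matcher, concept_model,
                  scene_change_threshold=0.5):
    """Toy complexity-aware routing (illustrative only, NOT the SeC code).

    Visually stable frames take the cheap feature-matching path; frames that
    diverge from the memory features trigger LVLM-based concept reasoning.
    """
    # Cosine similarity between current-frame and memory features as a crude
    # proxy for scene change / complexity.
    sim = torch.cosine_similarity(frame_feats.flatten(),
                                  memory_feats.flatten(), dim=0)
    if 1.0 - sim.item() < scene_change_threshold:
        return matcher(frame_feats, memory_feats)        # cheap path
    return concept_model(frame_feats, memory_feats)      # semantic path
```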

## SeC Performance

| Model | SA-V val | SA-V test | LVOS v2 val | MOSE val | DAVIS 2017 val | YTVOS 2019 val | SeCVOS |
|---|---|---|---|---|---|---|---|
| SAM 2.1 | 78.6 | 79.6 | 84.1 | 74.5 | 90.6 | 88.7 | 58.2 |
| SAMURAI | 79.8 | 80.0 | 84.2 | 72.6 | 89.9 | 88.3 | 62.2 |
| SAM2.1Long | 81.1 | 81.2 | 85.9 | 75.2 | 91.4 | 88.7 | 62.3 |
| SeC (Ours) | 82.7 | 81.7 | 86.5 | 75.3 | 91.3 | 88.6 | 70.0 |

## Usage

You can load the SeC model and processor with the transformers library by passing trust_remote_code=True. For full video object segmentation and detailed usage instructions, please refer to the project's GitHub repository: demo.ipynb covers single-video inference, and INFERENCE.md covers full inference and evaluation.

```python
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image

# Load model and processor
model_name = "OpenIXCLab/SeC-4B"
# Ensure your environment has the PyTorch and transformers versions specified in the GitHub repo.
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

# Example: assuming you have an image (e.g., a frame from a video) and a text query.
# For full video processing, refer to the project's GitHub repository.
# image = Image.open("path/to/your/image.jpg").convert("RGB")
# text_query = "segment the main object"

# Prepare inputs
# inputs = processor(images=image, text=text_query, return_tensors="pt").to(model.device)

# Perform inference
# with torch.no_grad():
#     outputs = model(**inputs)

# The output format depends on the remote-code implementation; for segmentation
# tasks it typically includes logits or predicted masks, which require further
# post-processing for visualization.
print("Model loaded successfully. For actual inference with video data, please refer to the project's GitHub repository and demo.ipynb.")
```

## Citation

If you find this project useful in your research, please consider citing:

```bibtex
@article{zhang2025sec,
  title     = {SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction},
  author    = {Zhixiong Zhang and Shuangrui Ding and Xiaoyi Dong and Songxin He and Jianfan Lin and Junsong Tang and Yuhang Zang and Yuhang Cao and Dahua Lin and Jiaqi Wang},
  journal   = {arXiv preprint arXiv:2507.15852},
  year      = {2025}
}
```