Duino committed (verified) · Commit 78551ee · Parent: efb81cc

Create README.md

Files changed (1): README.md (+56 −0)
# V-Do

This repository hosts a Vision-Language Model (VLM) trained using the nanoVLM library. The model is designed to understand and process both visual and textual information, making it capable of performing tasks such as Visual Question Answering (VQA).

## Model Overview

V-Do is built upon the nanoVLM architecture, which integrates three key components for multimodal understanding:

* **Vision Encoder (ViT):** Processes input images to extract visual features.
* **Language Model (LM):** Handles textual input and generates text outputs.
* **Multimodal Projector (MP):** Bridges the gap between the visual and textual modalities, allowing the LM to incorporate visual context.

The model weights are provided in the efficient [Safetensors](https://huggingface.co/docs/safetensors/index) format.
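The way these three components compose can be sketched abstractly: the projector maps ViT features into the LM's embedding space, and the resulting visual tokens are prepended to the text embeddings before the LM consumes the joint sequence. The stand-in functions, names, and dimensions below are illustrative only, not the actual nanoVLM API:

```python
# Toy stand-ins for the three nanoVLM components (illustrative, not the real API).

def vision_encoder(patches):
    """ViT stand-in: turn image patches into visual features."""
    return [[0.5 * x for x in patch] for patch in patches]

def multimodal_projector(features):
    """MP stand-in: map visual features into the LM embedding dimension (3 here)."""
    return [feat + [0.0] for feat in features]

def language_model(sequence):
    """LM stand-in: consume the joint visual + text sequence."""
    return f"saw {len(sequence)} embeddings"

image_patches = [[1.0, 2.0], [3.0, 4.0]]  # 2 patches, dim 2
text_embeddings = [[0.1, 0.2, 0.3]]       # 1 text token, dim 3

visual_tokens = multimodal_projector(vision_encoder(image_patches))
joint = visual_tokens + text_embeddings   # visual context prepended to the text
print(language_model(joint))              # prints "saw 3 embeddings"
```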

## Repository Structure

This repository is expected to contain the files needed to load and use the VLM, including:

* Model weights (in Safetensors format)
* Configuration files
* Any other files required by the nanoVLM library

## How to Use V-Do

To use the V-Do model for inference, follow these steps:

### 1. Install Dependencies and Clone nanoVLM

First, install the required Python packages and clone the nanoVLM library (which provides the model and processor classes) in your local machine or Colab environment:

```shell
pip install torch datasets tqdm transformers accelerate -q
git clone https://github.com/huggingface/nanoVLM.git
cd nanoVLM
```

### 2. Load the Model and Run Inference

The snippet below loads V-Do from the Hub and runs a simple VQA query. The import paths and the `generate` arguments follow the nanoVLM repository layout; check the version you cloned if they differ.

```python
import torch
from PIL import Image

# These modules come from the nanoVLM repository cloned above;
# run this script from inside the nanoVLM directory (or add it to PYTHONPATH).
from models.config import VLMConfig
from models.vision_language_model import VisionLanguageModel
from data.processors import get_tokenizer, get_image_processor

# Instantiate the VLMConfig, using the repo ID for loading.
# The from_pretrained method will automatically handle loading from the Hub.
vlm_cfg = VLMConfig(vlm_checkpoint_path="Duino/V-Do")

# Load the model directly from the Hugging Face Hub
model = None
try:
    model = VisionLanguageModel.from_pretrained(vlm_cfg.vlm_checkpoint_path)
    print(f"Successfully loaded model from {vlm_cfg.vlm_checkpoint_path}")
except Exception as e:
    print(f"Error loading model: {e}")

# Load the tokenizer and image processor
tokenizer = get_tokenizer(vlm_cfg.lm_tokenizer)
image_processor = get_image_processor(vlm_cfg.vit_img_size)

# Move model to the device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if model is not None:
    model.to(device)
    model.eval()

print("\nModel, tokenizer, and image processor loaded successfully.")

# Run a VQA query. The generate call mirrors nanoVLM's generate.py;
# adjust the arguments if your nanoVLM version differs.
prompt = "Question: What is in this image? Answer:"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)

image = Image.open("example.jpg").convert("RGB")  # path to your test image
pixel_values = image_processor(image).unsqueeze(0).to(device)

generated = model.generate(input_ids, pixel_values, max_new_tokens=20)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```