# Inference with RZNV-1.5-3B-Instruct (PEFT Adapter)

This repository contains only the Parameter-Efficient Fine-Tuning (PEFT) adapter weights for the Qwen2.5-VL-3B-Instruct model. Shipping just the adapter keeps the repository small, portable, and lightweight to share!

## Important Note: Adapter Loading Required

During development, we found that merging the adapter into the base weights with the standard `merge_and_unload()` function caused the model to revert to the base model's original behavior, losing the fine-tuned performance.

Therefore, to get the fine-tuned behavior, you MUST load the original base model first and then explicitly attach these adapter weights using the `peft` library, as demonstrated in the setup steps below.
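
For contrast, here is what the two loading patterns look like (a sketch; `base_model` stands for the already-loaded Qwen2.5-VL model, and `ADAPTER_REPO_ID` is defined in the setup steps below):

```python
from peft import PeftModel

# Discouraged for this adapter: merging reverted the model
# to the base model's behavior in our tests.
# model = PeftModel.from_pretrained(base_model, ADAPTER_REPO_ID).merge_and_unload()

# Required: keep the adapter attached as a PeftModel wrapper at inference time.
model = PeftModel.from_pretrained(base_model, ADAPTER_REPO_ID)
```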


## Model and Adapter Details

| Detail | Value |
| --- | --- |
| Base Model ID | `Qwen/Qwen2.5-VL-3B-Instruct` |
| Adapter Type | PEFT (e.g., LoRA) |
| Adapter Repository ID | `phronetic-ai/RZNV-1.5-3B-Instruct` |
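
If you want to confirm which base model this adapter expects before downloading the full base weights, you can inspect the adapter config alone (a small sketch using the standard `PeftConfig` API from `peft`):

```python
from peft import PeftConfig

# Reads only the small adapter_config.json from the Hub, not the base weights
cfg = PeftConfig.from_pretrained("phronetic-ai/RZNV-1.5-3B-Instruct")
print(cfg.base_model_name_or_path)  # expected: Qwen/Qwen2.5-VL-3B-Instruct
print(cfg.peft_type)                # e.g., PeftType.LORA
```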

## Running Inference

### Step 1: Installation

Ensure you have the necessary libraries installed, including `peft` and `transformers`:

```bash
pip install transformers peft accelerate torch
# The Qwen-VL-specific utilities are also needed for multi-modal preprocessing
pip install qwen-vl-utils
```
### Step 2: Load the Base Model and Attach the Adapter

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info  # required for Qwen-VL multi-modal processing

# --- Define Paths ---
BASE_MODEL_ID = "Qwen/Qwen2.5-VL-3B-Instruct"
ADAPTER_REPO_ID = "phronetic-ai/RZNV-1.5-3B-Instruct" 

# 1. Load the base model (Ensure you use the same precision/device_map as during training)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    BASE_MODEL_ID, 
    torch_dtype="auto", 
    device_map="auto"
)

# Optional: Enable flash_attention_2 if your hardware supports it for better speed/memory
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     BASE_MODEL_ID,
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# 2. Load the processor (Tokenizer + Feature Extractor) from the base model
processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)

# 3. Load and attach the PEFT adapter weights. This is the most important step:
# PeftModel.from_pretrained wraps the base model and injects the fine-tuned weights.
model = PeftModel.from_pretrained(model, ADAPTER_REPO_ID)
```
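
A quick sanity check can confirm the adapter is actually attached rather than merged away (a sketch; `peft_config` is the attribute `PeftModel` exposes for its loaded adapters):

```python
from peft import PeftModel

# If merge_and_unload() had been called, `model` would no longer be a PeftModel.
assert isinstance(model, PeftModel)
print(model.peft_config)  # shows the loaded adapter configuration(s)
```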

### Step 3: Run Generation

```python
# Example multi-modal input with an image and a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages) # Qwen-VL specific
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device) # Move inputs to the model's device

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print(output_text)
```
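
The same pipeline also handles video inputs, since `process_vision_info` returns both image and video tensors. A minimal sketch (the local file path is hypothetical):

```python
# Video input follows the same message schema; swap the "image" entry for "video".
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/video.mp4"},  # hypothetical path
            {"type": "text", "text": "Describe this video."},
        ],
    }
]
# Then re-run the same preparation and model.generate(...) calls as above.
```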