SG Aerial Scene Analyser (LoRA)
QLoRA adapter for Qwen2.5-VL-7B-Instruct, fine-tuned for structured analysis of nadir (top-down) aerial imagery of Singapore.
Demo: kaihon/sg-aerial-scene-analyser
Usage
```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

# Load the base model in bf16, then attach the LoRA adapter on top of it.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, "kaihon/sg-aerial-scene-analyser-lora")
model.eval()

# min_pixels/max_pixels bound the resized image area: 256 to 1280 visual tokens (one per 28x28 region).
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", min_pixels=256*28*28, max_pixels=1280*28*28
)

# The system prompt defines the expected JSON schema; the user turn carries the image.
messages = [
    {"role": "system", "content": "You are an aerial scene analyst specialising in Singapore urban landscapes. Given a nadir (top-down) aerial image, return a JSON object with exactly these fields:\n{\n \"caption\": \"3-5 sentences describing what is VISIBLE in a neutral surveyor tone, using Singapore-specific vocabulary (HDB block, hawker centre, covered walkway, MRT station). Name types not instances (MRT station not Bishan MRT). Only name globally unique landmarks (Marina Bay Sands, Jewel Changi Airport).\",\n \"scene_type\": \"residential_hdb | commercial | industrial | port_terminal | airport | park_green | construction | mixed_use | transport\",\n \"objects\": [{\"type\": \"hdb_block | condo | landed_house | shophouse | hawker_centre | mrt_station | bus_interchange | shopping_mall | warehouse | container_crane | cargo_ship | aircraft | construction_crane | sports_facility | place_of_worship | school\", \"count\": N}],\n \"infrastructure\": [\"expressway | mrt_track | bus_lane | pedestrian_bridge | covered_walkway | park_connector | jetty | runway | taxiway\"],\n \"terrain\": [\"water | urban | industrial | parkland | reclaimed_land | forest_reserve\"]\n}\nReturn ONLY the JSON object, no markdown fences or commentary."},
    {"role": "user", "content": [
        {"type": "image", "image": "path/to/aerial_image.jpg"},
        {"type": "text", "text": "Analyse this nadir aerial image of Singapore."},
    ]},
]

# Render the chat template and load the image(s) referenced in `messages`.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
img_inputs, _ = process_vision_info(messages)  # second return value holds video inputs (unused here)
inputs = processor(text=[text], images=img_inputs, padding=True, return_tensors="pt").to(model.device)

# Greedy decoding, then drop the prompt tokens so only the generated answer is decoded.
with torch.no_grad():
    gen = model.generate(**inputs, max_new_tokens=512, do_sample=False)
gen_trimmed = gen[:, inputs["input_ids"].shape[1]:]
output = processor.batch_decode(gen_trimmed, skip_special_tokens=True)[0]
print(output)
```
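The decoded `output` is expected to be a bare JSON string. A minimal sketch for turning it into a Python dict with only the standard library follows; `parse_analysis` is a hypothetical helper, not part of this repository, and the fence-unwrapping branch is a defensive assumption, since the system prompt already tells the model not to emit markdown fences.

```python
import json
import re

FENCE = "`" * 3  # a literal markdown code fence, built this way so it cannot break the surrounding markdown


def parse_analysis(raw: str) -> dict:
    """Parse the model's decoded text into a dict, tolerating an optional code fence."""
    text = raw.strip()
    # Defensive (assumption): unwrap a fenced block like FENCE + "json\n{...}\n" + FENCE if one appears.
    match = re.fullmatch(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, text, flags=re.DOTALL)
    if match:
        text = match.group(1)
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Usage with the script above would be `analysis = parse_analysis(output)`, after which `analysis["scene_type"]` holds the predicted category.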
Output Format
The model returns a JSON object with 5 fields:
| Field | Type | Description |
|---|---|---|
| caption | string | 3-5 sentence description using Singapore-specific vocabulary |
| scene_type | string | One of 9 categories: residential_hdb, commercial, industrial, port_terminal, airport, park_green, construction, mixed_use, transport |
| objects | array | Detected objects with type and count |
| infrastructure | array | Infrastructure elements (e.g. expressway, mrt_track, covered_walkway) |
| terrain | array | Terrain types (e.g. urban, water, parkland) |
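The field list above can be checked mechanically. The sketch below is a hypothetical validator (`validate_analysis` is not part of this repository, and it is not necessarily how the Schema Compliance metric was computed). The allowed values are copied from the system prompt in the usage example; requiring `count` to be a non-negative integer is an assumption based on the `"count": N` placeholder.

```python
from typing import Any

# Allowed values, copied from the system prompt in the usage example.
SCENE_TYPES = {"residential_hdb", "commercial", "industrial", "port_terminal", "airport",
               "park_green", "construction", "mixed_use", "transport"}
OBJECT_TYPES = {"hdb_block", "condo", "landed_house", "shophouse", "hawker_centre", "mrt_station",
                "bus_interchange", "shopping_mall", "warehouse", "container_crane", "cargo_ship",
                "aircraft", "construction_crane", "sports_facility", "place_of_worship", "school"}
INFRASTRUCTURE = {"expressway", "mrt_track", "bus_lane", "pedestrian_bridge", "covered_walkway",
                  "park_connector", "jetty", "runway", "taxiway"}
TERRAIN = {"water", "urban", "industrial", "parkland", "reclaimed_land", "forest_reserve"}
REQUIRED_FIELDS = ("caption", "scene_type", "objects", "infrastructure", "terrain")


def validate_analysis(obj: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the object conforms."""
    errors: list[str] = []
    # The prompt asks for "exactly these fields", so flag both missing and extra keys.
    missing = [f for f in REQUIRED_FIELDS if f not in obj]
    extra = [f for f in obj if f not in REQUIRED_FIELDS]
    if missing:
        errors.append(f"missing fields: {missing}")
    if extra:
        errors.append(f"unexpected fields: {extra}")
    if not isinstance(obj.get("caption"), str):
        errors.append("caption must be a string")
    scene = obj.get("scene_type")
    if not isinstance(scene, str) or scene not in SCENE_TYPES:
        errors.append(f"unknown scene_type: {scene!r}")
    objects = obj.get("objects")
    if not isinstance(objects, list):
        errors.append("objects must be a list")
    else:
        for item in objects:
            kind = item.get("type") if isinstance(item, dict) else None
            if not isinstance(kind, str) or kind not in OBJECT_TYPES:
                errors.append(f"bad object entry: {item!r}")
                continue
            count = item.get("count")
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(count, int) or isinstance(count, bool) or count < 0:
                errors.append(f"bad count in: {item!r}")
    for field, allowed in (("infrastructure", INFRASTRUCTURE), ("terrain", TERRAIN)):
        values = obj.get(field)
        if not isinstance(values, list):
            errors.append(f"{field} must be a list")
        else:
            errors.extend(f"unknown {field} value: {v!r}" for v in values
                          if not isinstance(v, str) or v not in allowed)
    return errors
```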
Training
| Parameter | Value |
|---|---|
| Base model | Qwen2.5-VL-7B-Instruct |
| Method | QLoRA (rank 8, alpha 16, dropout 0.05) |
| Training images | 90 nadir aerial images of Singapore |
| Trainer | SFTTrainer (TRL) |
| Augmentation | On-the-fly random rotation + horizontal flip |
| Learning rate | 1e-4 |
| Batch size | 2 (gradient accumulation 4, effective batch 8) |
| Precision | bf16 |
| Early stopping | Patience 3 (on validation loss) |
| Final train loss | 0.721 |
| Hardware | NVIDIA L4 (23.7 GB), ~94 minutes |
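The card only states "random rotation + horizontal flip". Below is a minimal sketch of what such on-the-fly augmentation might look like. It assumes rotations in 90-degree steps and a flip probability of 0.5, neither of which is specified in the card, and `augment_nadir` is a hypothetical helper. Because a nadir image has no canonical "up" direction, these transforms leave the scene type, object counts, infrastructure and terrain labels unchanged, although caption wording that refers to position within the frame could become inaccurate.

```python
import numpy as np


def augment_nadir(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly rotate an H x W x C image by a multiple of 90 degrees, then flip it horizontally with p=0.5.

    The 90-degree steps and p=0.5 are assumptions; the card does not give the rotation range or flip rate.
    """
    k = int(rng.integers(0, 4))              # number of 90-degree turns: 0, 1, 2 or 3
    out = np.rot90(image, k=k, axes=(0, 1))  # rotate in the height/width plane, channels untouched
    if rng.random() < 0.5:
        out = out[:, ::-1]                   # mirror left-right along the width axis
    return np.ascontiguousarray(out)         # materialise the view as a regular array
```

Calling it once per sample inside the data collator (for example with `rng = np.random.default_rng(seed)`) would give a fresh transform every epoch, which is what "on-the-fly" suggests.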
Evaluation (17-sample held-out test set)
| Metric | Baseline (Qwen2.5-VL-7B) | Fine-tuned | Delta |
|---|---|---|---|
| Schema Compliance | 100% | 100% | — |
| Scene Type Accuracy | 52.9% | 70.6% | +17.7 pp |
| ROUGE-1 F1 | 0.325 | 0.536 | +0.211 |
| ROUGE-2 F1 | 0.040 | 0.268 | +0.228 |
| ROUGE-L F1 | 0.193 | 0.402 | +0.209 |
| BERTScore F1 | 0.875 | 0.917 | +0.042 |
| Object Mention F1 | 0.309 | 0.471 | +0.161 |
Fine-tuning raises scene-type accuracy by 17.7 percentage points (52.9% → 70.6%), roughly doubles caption ROUGE-L F1 (0.193 → 0.402), and improves object-mention F1 by 0.16 (0.309 → 0.471). These results come from a 70/15/15 train/val/test split; the deployed adapter was retrained on the full dataset (90 train / 16 val images) for maximum coverage.
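The card does not spell out how these metrics are computed. As a hedged illustration only: scene-type accuracy is presumably exact-match accuracy over the test images, and with 17 images each one is worth about 5.9 percentage points, so 52.9% and 70.6% are consistent with 9 and 12 correct predictions. Object Mention F1 is not defined in the card; `object_mention_f1` below is a hypothetical per-image set F1 over object types and may differ from the metric actually reported.

```python
def scene_type_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of test images whose predicted scene_type exactly matches the label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need equal-length, non-empty lists")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def object_mention_f1(predicted: set[str], reference: set[str]) -> float:
    """Set-level F1 between predicted and reference object types for one image (hypothetical definition)."""
    if not predicted and not reference:
        return 1.0  # nothing to find and nothing claimed: treat as a perfect match
    tp = len(predicted & reference)  # object types both predicted and present
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)
```

A corpus-level score under this definition would be the mean of `object_mention_f1` over the 17 test images.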
Framework Versions
- PEFT 0.18.1
- Transformers >= 4.49
- TRL (SFTTrainer)
- PyTorch (bf16)