---
language:
- en
base_model:
- Salesforce/blip-image-captioning-base
pipeline_tag: image-to-text
tags:
- blip
- icon-description
- image-captioning
license: mit
library_name: transformers
---

# 🧠 BLIP: UI Elements Captioning

This model is a fine-tuned version of [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base), adapted for **captioning UI elements** cropped from macOS application screenshots. It is part of the **Screen2AX** research project, which focuses on improving accessibility using vision-based deep learning.

---

## 🎯 Use Case

The model takes an image of a **UI icon or element** and generates a **natural language description** (e.g., `"Settings icon"`, `"Play button"`, `"Search field"`).

This helps build assistive technologies such as screen readers by providing textual labels for unlabeled visual components.

---

## 🏗 Model Architecture

- Base model: [`Salesforce/blip-image-captioning-base`](https://huggingface.co/Salesforce/blip-image-captioning-base)
- Architecture: **BLIP** (Bootstrapping Language-Image Pre-training)
- Task: `image-to-text`

---

## 🖼 Example

```python
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

# Load the fine-tuned processor and model from the Hugging Face Hub
processor = BlipProcessor.from_pretrained("MacPaw/blip-icon-captioning")
model = BlipForConditionalGeneration.from_pretrained("MacPaw/blip-icon-captioning")

# Open a cropped UI element and normalize it to RGB (icon PNGs are often RGBA)
image = Image.open("path/to/ui_icon.png").convert("RGB")

# Preprocess the image, generate caption token IDs, and decode them to text
inputs = processor(images=image, return_tensors="pt")
output = model.generate(**inputs)
caption = processor.decode(output[0], skip_special_tokens=True)
print(caption)  # Example: "Settings icon"
```

---

## 📜 License

This model is released under the **MIT License**.
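
---

## 🧩 Filling Missing Labels (Sketch)

The use case above, supplying text for unlabeled visual components, could be wired up roughly as follows. This is a minimal sketch under stated assumptions, not the Screen2AX implementation: the `UIElement` record and the `fill_missing_labels` helper are hypothetical names introduced here for illustration, and the captioner is passed in as a plain function so the sketch runs without downloading the model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class UIElement:
    """Hypothetical record for one on-screen element (not a Screen2AX type)."""
    image_path: str               # path to the cropped element image
    label: Optional[str] = None   # existing accessibility label, if any


def fill_missing_labels(
    elements: Iterable[UIElement],
    caption_fn: Callable[[str], str],
) -> List[UIElement]:
    """Return copies of `elements` with blank labels replaced by captions.

    `caption_fn` maps an image path to a caption string. Elements that
    already carry a non-blank label are copied unchanged, so the captioner
    is only called where a label is actually missing.
    """
    labeled = []
    for element in elements:
        if element.label and element.label.strip():
            labeled.append(UIElement(element.image_path, element.label))
        else:
            caption = caption_fn(element.image_path).strip()
            # Keep the label empty (None) if the captioner returns nothing
            labeled.append(UIElement(element.image_path, caption or None))
    return labeled


if __name__ == "__main__":
    # Stand-in captioner (assumption) used instead of the real model
    def fake_caption(path: str) -> str:
        return "Settings icon"

    elements = [UIElement("close.png", "Close"), UIElement("gear.png")]
    for element in fill_missing_labels(elements, fake_caption):
        print(element.image_path, "->", element.label)
```

With the real model, `caption_fn` would wrap the `processor(...)`, `model.generate(...)`, and `processor.decode(...)` calls from the Example section in a function that takes an image path. Passing the captioner in as an argument keeps the labeling logic testable on its own.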

---

## 🔗 Related Projects

- [Screen2AX Project](https://github.com/MacPaw/Screen2AX)
- [Screen2AX HuggingFace Collection](https://huggingface.co/collections/MacPaw/screen2ax-687dfe564d50f163020378b8)

---

## ✍️ Citation

If you use this model in your research, please cite the Screen2AX paper:

```bibtex
@misc{muryn2025screen2axvisionbasedapproachautomatic,
      title={Screen2AX: Vision-Based Approach for Automatic macOS Accessibility Generation},
      author={Viktor Muryn and Marta Sumyk and Mariya Hirna and Sofiya Garkot and Maksym Shamrai},
      year={2025},
      eprint={2507.16704},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.16704},
}
```

---

## 🌐 MacPaw Research

Learn more at [https://research.macpaw.com](https://research.macpaw.com)