Gemma-3n-E2B-it Android Control LoRA Fine-tuned Model

Model Overview

This model is a fine-tuned version of Google's gemma-3n-E2B-it base model with LoRA adaptation for Android UI control tasks.

Training Data

  • Dataset: OfficerChul/Android-Control-84k
  • Data Format: Mobile UI screenshots paired with user instructions; the target output is the action to perform (click, scroll, text input, etc.)

Training Data Format Example

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that can identify what action to perform on mobile UI Screenshot given the user instruction."
    },
    {
      "role": "user",
      "content": "<image>Click on the Recording 2"
    },
    {
      "role": "assistant",
      "content": "{\"action_type\": \"click\", \"x\": 561, \"y\": 535}"
    }
  ],
  "images": ["and_ctrl/out_episode_18557_step_001.png"]
}
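
As a minimal sketch of how a record in this format could be assembled, the helper below builds the three-message structure and serializes the action as a JSON string. The function name `make_sample` is hypothetical and not part of the dataset tooling; the system prompt, instruction, action, and image path are copied from the example above.

```python
import json

# System prompt copied verbatim from the training example above.
SYSTEM_PROMPT = (
    "You are a helpful assistant that can identify what action to perform "
    "on mobile UI Screenshot given the user instruction."
)

def make_sample(instruction, action, image_path):
    """Build one training record (hypothetical helper, for illustration)."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            # "<image>" marks where the screenshot goes in the user turn.
            {"role": "user", "content": "<image>" + instruction},
            # The target action is stored as a JSON object serialized to a string.
            {"role": "assistant", "content": json.dumps(action)},
        ],
        "images": [image_path],
    }

sample = make_sample(
    "Click on the Recording 2",
    {"action_type": "click", "x": 561, "y": 535},
    "and_ctrl/out_episode_18557_step_001.png",
)
```

Serializing the action with `json.dumps` keeps the assistant turn machine-parseable, which matters later when the model output has to be decoded back into an action.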

Training Method

LoRA fine-tuning was performed with the LLaMA-Factory framework.

1. Training Configuration (gemma3n-e2b-it.yaml)

  • Base Model: google/gemma-3n-E2B-it
  • Training Method: LoRA (Low-Rank Adaptation)
  • LoRA Configuration:
    • Rank: 32
    • Target modules: q_proj, k_proj, v_proj, o_proj
  • Training Parameters:
    • Batch size: 4 (gradient accumulation: 48)
    • Learning rate: 2e-5
    • Epochs: 5
    • LR scheduler: Cosine
    • Optimizer: AdamW (fused)
    • Precision: bf16
  • Additional Settings:
    • Gradient checkpointing enabled
    • Vision tower, multi-modal projector, and language model all trainable
    • DeepSpeed ZeRO-2 utilized
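
To make the training parameters concrete, the sketch below works out how many samples feed each optimizer update and how a cosine schedule decays the learning rate. It assumes the listed batch size is per device, and the device count is a free parameter because the card does not state it. The schedule is a plain cosine decay to zero without warmup, which is also an assumption; the actual run may differ.

```python
import math

BATCH_SIZE = 4      # "Batch size" from the configuration above
GRAD_ACCUM = 48     # gradient accumulation steps from the configuration
PEAK_LR = 2e-5      # learning rate from the configuration

def effective_batch_size(num_devices):
    """Samples per optimizer update; num_devices is not given in the card."""
    return BATCH_SIZE * GRAD_ACCUM * num_devices

def cosine_lr(step, total_steps, peak_lr=PEAK_LR):
    """Cosine decay from peak_lr to 0 with no warmup (assumed schedule shape)."""
    progress = step / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For example, `effective_batch_size(1)` gives 192 samples per update on a single device, and `cosine_lr(50, 100)` is half the peak rate at the midpoint of training.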

2. Model Merging (gemma3n-e2b-it_lora_sft_merge.yaml)

The trained LoRA adapter was merged into the base model:

  • Base Model: google/gemma-3n-E2B-it
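
Merging folds the low-rank update into the base weights so that no adapter is needed at inference time. The sketch below illustrates the usual LoRA merge rule, W' = W + (alpha / r) · B · A, on a toy matrix in plain Python. It is not the LLaMA-Factory export code; the matrices are made up, and `alpha` is a hypothetical value because the card only states the rank (32).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A for one weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Toy 2x2 weight with a rank-1 adapter; all values are illustrative only.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]        # shape (rank, in_features)
B = [[0.5], [-0.5]]     # shape (out_features, rank)
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
```

In the real model the same update would apply to each targeted projection (q_proj, k_proj, v_proj, o_proj), with the merged matrix replacing the original weight.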

Supported Action Types

  • click: Click on specific coordinates
  • long_press: Long press action
  • scroll: Scroll (up/down)
  • input_text: Text input
  • navigate_back: Navigate back
  • navigate_home: Navigate to home screen
  • open_app: Open application
  • wait: Wait action
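
Because the model emits actions as JSON strings, downstream code typically needs to parse and check them before acting. The following is a minimal sketch of such a check. The function name `parse_action` is hypothetical, and only the click schema (`x`, `y`) is validated because it is the only one shown in the training example; the fields of the other action types are not documented here and are left unchecked.

```python
import json

# Action types listed above.
ACTION_TYPES = {
    "click", "long_press", "scroll", "input_text",
    "navigate_back", "navigate_home", "open_app", "wait",
}

def parse_action(raw):
    """Parse a model output string; return the action dict, or None if unusable."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed JSON
    if not isinstance(action, dict):
        return None
    action_type = action.get("action_type")
    if not isinstance(action_type, str) or action_type not in ACTION_TYPES:
        return None  # missing or unsupported action type
    if action_type == "click":
        # Only the click schema (x, y) appears in the training example.
        if not all(isinstance(action.get(k), (int, float)) for k in ("x", "y")):
            return None
    return action
```

Returning `None` rather than raising lets a caller count unusable outputs and fall back to a safe default such as skipping the step.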

Usage

The merged model can be loaded directly with the Hugging Face Transformers library.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "OfficerChul/gemma-3n-E2B-it-Android-Control-84k"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True
)

Evaluation Results

| Model | Action Type Accuracy | Click L2 Distance | Input Text Match | Scroll Direction Match | Avg. Episode Accuracy | Malformed JSON | Execution Time (s) | Inference Time (s) |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3-VL-30B-A3B-Instruct | 0.9090 | 705.72 (n=812) | 0.8264 (n=121) | 0.3226 (n=248) | 0.9063 | 110 (5.5%) | 1101.43 | 485.12 |
| Qwen/Qwen2.5-VL-7B-Instruct | 0.6125 | 59.89 (n=544) | 0.8197 (n=61) | 0.3243 (n=111) | 0.6163 | 499 (24.9%) | 720.88 | 580.92 |
| Qwen/Qwen2.5-VL-3B-Instruct | 0.6645 | 88.21 (n=165) | 0.7889 (n=90) | 0.3519 (n=108) | 0.6615 | 440 (22.0%) | 676.76 | 536.27 |
| OfficerChul/Qwen2.5-VL-7B-Instruct-Android-Control-5a | 0.9970 | 427.30 (n=1466) | 0.9434 (n=159) | 0.9775 (n=267) | 0.9974 | 0 (0.0%) | 1086.97 | 581.82 |
| OfficerChul/Qwen2.5-VL-3B-Instruct-Android-Control-5a | 0.9965 | 446.54 (n=1467) | 0.9363 (n=157) | 0.9738 (n=267) | 0.9976 | 1 (0.1%) | 672.88 | 530.95 |
| OfficerChul/InfiGUI-G1-7B-Android-Control-5a | 0.9970 | 466.24 (n=1466) | 0.9434 (n=159) | 0.9775 (n=267) | 0.9968 | 1 (0.1%) | 897.58 | 552.23 |
| OfficerChul/InfiGUI-G1-3B-Android-Control-5a | 0.9980 | 449.73 (n=1467) | 0.9625 (n=160) | 0.9625 (n=267) | 0.9983 | 0 (0.0%) | 722.63 | 529.57 |
| InfiX-ai/InfiGUI-G1-7B | 0.6715 | 82.21 (n=821) | 0.8000 (n=70) | 0.2268 (n=194) | 0.6763 | 457 (22.9%) | 698.77 | 557.50 |
| InfiX-ai/InfiGUI-G1-3B | 0.8745 | 102.39 (n=1020) | 0.7700 (n=100) | 0.2299 (n=174) | 0.8910 | 78 (3.9%) | 702.93 | 559.65 |
| OfficerChul/gemma-3n-E2B-it-Android-Control-84k | 0.5819 | 985.82 (n=123) | 0.8596 (n=114) | 0.2159 (n=88) | 0.5781 | 0 (0.0%) | 322.95 | 159.23 |
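
The card does not define how these metrics are computed, so the following is only a sketch of one plausible reading of three of them: action type accuracy, mean click L2 distance, and the malformed JSON count. It assumes that malformed outputs count as misses for accuracy and that click distance is averaged only over samples where both the prediction and the reference are clicks, which would be consistent with the per-metric `n=` counts in the table. The function name `score` and the returned keys are hypothetical.

```python
import json
import math

def score(predictions, references):
    """Compare raw model output strings against reference action dicts."""
    malformed = 0
    type_hits = 0
    click_dists = []
    for raw, ref in zip(predictions, references):
        try:
            pred = json.loads(raw)
        except json.JSONDecodeError:
            malformed += 1  # counted as a miss for accuracy (assumption)
            continue
        if not isinstance(pred, dict):
            malformed += 1
            continue
        if pred.get("action_type") == ref["action_type"]:
            type_hits += 1
            if ref["action_type"] == "click" and "x" in pred and "y" in pred:
                # Euclidean distance between predicted and reference click points.
                click_dists.append(
                    math.hypot(pred["x"] - ref["x"], pred["y"] - ref["y"])
                )
    return {
        "action_type_accuracy": type_hits / len(references),
        "click_l2": sum(click_dists) / len(click_dists) if click_dists else None,
        "click_n": len(click_dists),
        "malformed": malformed,
    }

result = score(
    ['{"action_type": "click", "x": 3, "y": 4}', '{"action_type": "wait"}', "oops"],
    [
        {"action_type": "click", "x": 0, "y": 0},
        {"action_type": "navigate_back"},
        {"action_type": "wait"},
    ],
)
```

Under this reading, a model can show a low malformed count and still score poorly on action type accuracy, since the two are measured independently.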

License

This model follows the license terms of the Google Gemma model.

Notes

  • This model was developed for research on mobile UI automation and accessibility enhancement.
  • Validate the model thoroughly before using it in production environments.