# Gemma-3n-E2B-it Android Control LoRA Fine-tuned Model
## Model Overview

This model is a fine-tuned version of Google's `gemma-3n-E2B-it` base model, adapted with LoRA for Android UI control tasks.
## Training Data

- Dataset: `OfficerChul/Android-Control-84k`
- Data Format: mobile UI screenshots paired with user instructions, labeled with the appropriate action to perform (click, scroll, input, etc.)

### Training Data Format Example
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that can identify what action to perform on mobile UI Screenshot given the user instruction."
    },
    {
      "role": "user",
      "content": "<image>Click on the Recording 2"
    },
    {
      "role": "assistant",
      "content": "{\"action_type\": \"click\", \"x\": 561, \"y\": 535}"
    }
  ],
  "images": ["and_ctrl/out_episode_18557_step_001.png"]
}
```
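Since the assistant's turn is a JSON string embedded inside the chat message, downstream code has to parse it back into a structured action. A minimal sketch using only the standard library (the field names come from the example above; the helper name is illustrative):

```python
import json

def parse_action(assistant_content: str) -> dict:
    """Parse the assistant's JSON action string into a dict.

    Raises json.JSONDecodeError on malformed output; such failures
    are what the "Malformed JSON" column in the evaluation table counts.
    """
    action = json.loads(assistant_content)
    if "action_type" not in action:
        raise ValueError("missing 'action_type' field")
    return action

# Round-trip the assistant content from the training record above:
print(parse_action('{"action_type": "click", "x": 561, "y": 535}'))
```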
## Training Method

LoRA fine-tuning was performed with the LLaMA-Factory framework.

### 1. Training Configuration (`gemma3n-e2b-it.yaml`)

- Base Model: `google/gemma-3n-E2B-it`
- Training Method: LoRA (Low-Rank Adaptation)
- LoRA Configuration:
  - Rank: 32
  - Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Training Parameters:
  - Batch size: 4 (gradient accumulation steps: 48)
  - Learning rate: 2e-5
  - Epochs: 5
  - LR scheduler: cosine
  - Optimizer: AdamW (fused)
  - Precision: bf16
- Additional Settings:
  - Gradient checkpointing enabled
  - Vision tower, multi-modal projector, and language model all trainable
  - DeepSpeed ZeRO-2
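The settings above can be expressed as a LLaMA-Factory-style YAML sketch. This is a reconstruction, not the actual `gemma3n-e2b-it.yaml`: key names follow LLaMA-Factory conventions, and any value not listed above (e.g. the DeepSpeed config path) is an assumption.

```yaml
### model (sketch; not the original config file)
model_name_or_path: google/gemma-3n-E2B-it
trust_remote_code: true

### method
stage: sft
finetuning_type: lora
lora_rank: 32
lora_target: q_proj,k_proj,v_proj,o_proj

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 48
learning_rate: 2.0e-5
num_train_epochs: 5
lr_scheduler_type: cosine
optim: adamw_torch_fused
bf16: true
gradient_checkpointing: true
deepspeed: examples/deepspeed/ds_z2_config.json  # assumed path
```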
### 2. Model Merging (`gemma3n-e2b-it_lora_sft_merge.yaml`)

The trained LoRA adapter was merged into the base model:

- Base Model: `google/gemma-3n-E2B-it`
## Supported Action Types

- `click`: Click at specific coordinates
- `long_press`: Long press at specific coordinates
- `scroll`: Scroll (up/down)
- `input_text`: Enter text
- `navigate_back`: Navigate back
- `navigate_home`: Navigate to the home screen
- `open_app`: Open an application
- `wait`: Wait
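A practical consumer of this model needs to check that a generated action is one of the supported types and carries the fields that type needs. The sketch below is illustrative: the per-type required fields are assumptions inferred from the training example, not a published schema.

```python
import json

# Assumed required fields per action type, beyond "action_type".
REQUIRED_FIELDS = {
    "click": {"x", "y"},
    "long_press": {"x", "y"},
    "scroll": {"direction"},
    "input_text": {"text"},
    "navigate_back": set(),
    "navigate_home": set(),
    "open_app": {"app_name"},
    "wait": set(),
}

def is_valid_action(raw: str) -> bool:
    """Return True if `raw` is well-formed JSON for a supported action."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        return False
    kind = action.get("action_type")
    if kind not in REQUIRED_FIELDS:
        return False
    # Every required field for this action type must be present.
    return REQUIRED_FIELDS[kind] <= action.keys()

print(is_valid_action('{"action_type": "click", "x": 561, "y": 535}'))  # → True
```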
## Usage

The merged model can be loaded directly with the Hugging Face Transformers library. Because Gemma 3n is a multimodal model, load the matching processor (`AutoProcessor.from_pretrained`) as well when preparing image+text inputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "OfficerChul/gemma-3n-E2B-it-Android-Control-84k"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
)
```
## Evaluation Results
| Model | Action Type Accuracy | Click L2 Distance | Input Text Match | Scroll Direction Match | Avg. Episode Accuracy | Malformed JSON | Execution Time (s) | Inference Time (s) |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3-VL-30B-A3B-Instruct | 0.9090 | 705.72 (n=812) | 0.8264 (n=121) | 0.3226 (n=248) | 0.9063 | 110 (5.5%) | 1101.43 | 485.12 |
| Qwen/Qwen2.5-VL-7B-Instruct | 0.6125 | 59.89 (n=544) | 0.8197 (n=61) | 0.3243 (n=111) | 0.6163 | 499 (24.9%) | 720.88 | 580.92 |
| Qwen/Qwen2.5-VL-3B-Instruct | 0.6645 | 88.21 (n=165) | 0.7889 (n=90) | 0.3519 (n=108) | 0.6615 | 440 (22.0%) | 676.76 | 536.27 |
| OfficerChul/Qwen2.5-VL-7B-Instruct-Android-Control-5a | 0.9970 | 427.30 (n=1466) | 0.9434 (n=159) | 0.9775 (n=267) | 0.9974 | 0 (0.0%) | 1086.97 | 581.82 |
| OfficerChul/Qwen2.5-VL-3B-Instruct-Android-Control-5a | 0.9965 | 446.54 (n=1467) | 0.9363 (n=157) | 0.9738 (n=267) | 0.9976 | 1 (0.1%) | 672.88 | 530.95 |
| OfficerChul/InfiGUI-G1-7B-Android-Control-5a | 0.9970 | 466.24 (n=1466) | 0.9434 (n=159) | 0.9775 (n=267) | 0.9968 | 1 (0.1%) | 897.58 | 552.23 |
| OfficerChul/InfiGUI-G1-3B-Android-Control-5a | 0.9980 | 449.73 (n=1467) | 0.9625 (n=160) | 0.9625 (n=267) | 0.9983 | 0 (0.0%) | 722.63 | 529.57 |
| InfiX-ai/InfiGUI-G1-7B | 0.6715 | 82.21 (n=821) | 0.8000 (n=70) | 0.2268 (n=194) | 0.6763 | 457 (22.9%) | 698.77 | 557.50 |
| InfiX-ai/InfiGUI-G1-3B | 0.8745 | 102.39 (n=1020) | 0.7700 (n=100) | 0.2299 (n=174) | 0.8910 | 78 (3.9%) | 702.93 | 559.65 |
| OfficerChul/gemma-3n-E2B-it-Android-Control-84k | 0.5819 | 985.82 (n=123) | 0.8596 (n=114) | 0.2159 (n=88) | 0.5781 | 0 (0.0%) | 322.95 | 159.23 |
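The Click L2 Distance column reports Euclidean distance in pixels between predicted and reference click coordinates. As a sketch of how such a metric is computed (function names and the simple averaging are assumptions, not the actual evaluation script):

```python
import math

def click_l2(pred: tuple, gold: tuple) -> float:
    """Euclidean (L2) distance between a predicted and reference click."""
    return math.hypot(pred[0] - gold[0], pred[1] - gold[1])

def mean_click_l2(pairs: list) -> float:
    """Average L2 distance over all evaluated (pred, gold) click pairs."""
    return sum(click_l2(p, g) for p, g in pairs) / len(pairs)

# A prediction 30 px right and 40 px below the reference is 50 px off:
print(click_l2((591, 575), (561, 535)))  # → 50.0
```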
## License

This model follows the license terms of the Google Gemma models.
## Notes

- This model was developed for research on mobile UI automation and accessibility enhancement.
- Validate outputs carefully before using this model in production environments.