Instructions for using osunlp/Dreamer-7B-Reddit with libraries, inference providers, and local apps.
How to use osunlp/Dreamer-7B-Reddit with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="osunlp/Dreamer-7B-Reddit")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("osunlp/Dreamer-7B-Reddit")
model = AutoModelForImageTextToText.from_pretrained("osunlp/Dreamer-7B-Reddit")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
How to use osunlp/Dreamer-7B-Reddit with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "osunlp/Dreamer-7B-Reddit"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osunlp/Dreamer-7B-Reddit",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/osunlp/Dreamer-7B-Reddit
```
How to use osunlp/Dreamer-7B-Reddit with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "osunlp/Dreamer-7B-Reddit" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osunlp/Dreamer-7B-Reddit",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "osunlp/Dreamer-7B-Reddit" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API), same request as above:
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osunlp/Dreamer-7B-Reddit",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
How to use osunlp/Dreamer-7B-Reddit with Docker Model Runner:
```shell
docker model run hf.co/osunlp/Dreamer-7B-Reddit
```
WebDreamer: Model-Based Planning for Web Agents
WebDreamer is a planning framework that enables efficient and effective planning for real-world web agent tasks. Check our paper for more details. This work is a collaboration between OSUNLP and Orby AI.
- Repository: https://github.com/OSU-NLP-Group/WebDreamer
- Paper: https://arxiv.org/abs/2411.06559
- Point of Contact: Kai Zhang
Models
Data:
```
root
 |-- prompt: string
 |-- image: binary
 |-- response: string
 |-- action: string
```
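As a purely hypothetical illustration of this schema (field names come from the tree above; all values below are invented placeholders, not actual dataset entries), one record could be built and validated like so:

```python
# Hypothetical example record following the schema above.
# All values are invented placeholders; "image" holds raw binary screenshot data.
record = {
    "prompt": "Please describe what you would see after a click on the search bar.",
    "image": b"\x89PNG\r\n\x1a\n",  # placeholder binary data
    "response": "The search bar is focused and a keyboard cursor appears.",
    "action": "click [search bar]",
}

# Sanity-check the record against the declared field types.
expected_types = {"prompt": str, "image": bytes, "response": str, "action": str}
for field, ftype in expected_types.items():
    assert isinstance(record[field], ftype), field
```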
Results
Strong performance on VisualWebArena, Online-Mind2Web, and Mind2Web-live
| Benchmark | Method | Success Rate |
|---|---|---|
| VisualWebArena | GPT-4o + Reactive | 17.6% |
| VisualWebArena | GPT-4o + Tree Search | 26.2% |
| VisualWebArena | GPT-4o + WebDreamer | 23.6% (↑34.1%) |
| Online-Mind2Web | GPT-4o + Reactive | 26.0% |
| Online-Mind2Web | GPT-4o + WebDreamer | 37.0% (↑42.3%) |
| Mind2Web-live | GPT-4o + Reactive | 20.2% |
| Mind2Web-live | GPT-4o + WebDreamer | 25.0% (↑23.8%) |
Compared to the reactive baselines, WebDreamer significantly improves performance by 34.1%, 42.3%, and 23.8% on VisualWebArena, Online-Mind2Web, and Mind2Web-live, respectively.
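The relative gains quoted above follow directly from the absolute success rates in the table; as a quick arithmetic check (plain Python, using only the numbers reported above):

```python
# Relative improvement of WebDreamer over the reactive baseline,
# computed from the absolute success rates in the results table.
def relative_gain(baseline: float, improved: float) -> float:
    return round((improved - baseline) / baseline * 100, 1)

print(relative_gain(17.6, 23.6))  # VisualWebArena -> 34.1
print(relative_gain(26.0, 37.0))  # Online-Mind2Web -> 42.3
print(relative_gain(20.2, 25.0))  # Mind2Web-live -> 23.8
```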
Better efficiency than tree search with true interactions
WebDreamer effectively explores the search space through simulations, which largely reduces the reliance on real-world interactions while maintaining robust performance.
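The idea of simulation-based action selection can be sketched roughly as follows. This is only a schematic illustration, not the actual WebDreamer implementation: `simulate` stands in for the LLM world model that imagines the next page state, and `score` for the LLM-based evaluator; both are placeholder functions invented for this sketch.

```python
# Schematic sketch of model-based planning: instead of executing every
# candidate action in the real browser (as tree search does), each action's
# outcome is *simulated* and scored, and only the best action is executed.
# `simulate` and `score` are stand-ins for LLM calls, not real APIs.

def simulate(state: str, action: str) -> str:
    # Placeholder world model: imagine the page after taking `action`.
    return f"{state} -> {action}"

def score(imagined_state: str, goal: str) -> float:
    # Placeholder evaluator: how promising is the imagined state for `goal`?
    return float(goal in imagined_state)

def plan_next_action(state: str, candidates: list[str], goal: str) -> str:
    # Pick the candidate whose simulated outcome scores highest.
    return max(candidates, key=lambda a: score(simulate(state, a), goal))

best = plan_next_action("homepage", ["click [Login]", "click [Search]"], "Search")
print(best)  # -> click [Search]
```

Only `best` is then executed in the real environment, which is why simulation reduces the number of true interactions compared to exploring every branch.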
Inference
vLLM server

```shell
vllm serve osunlp/Dreamer-7B --api-key token-abc123 --dtype float16
```

or

```shell
python -m vllm.entrypoints.openai.api_server --served-model-name osunlp/Dreamer-7B --model osunlp/Dreamer-7B --dtype float16
```
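The served model speaks the OpenAI chat-completions protocol, so any OpenAI-compatible client can call it. As a minimal stdlib-only sketch (the URL is vLLM's default `http://localhost:8000/v1` endpoint and the key matches the serve command above; the request is only constructed here, since sending it requires a running server):

```python
import json
import urllib.request

# Sketch of a chat-completions request to the vLLM server started above.
# Endpoint and API key mirror the `vllm serve` command; not sent here.
payload = {
    "model": "osunlp/Dreamer-7B",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 1.0,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer token-abc123",
    },
    method="POST",
)
print(req.full_url)  # -> http://localhost:8000/v1/chat/completions
```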
You can find more instructions about training and inference in Qwen2-VL's official repo.
Prompt
Our model is quite robust to the textual prompt, so feel free to try prompt variants that we did not heavily explore.
```python
def format_openai_template(description: str, base64_image):
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
                {
                    "type": "text",
                    # `description` describes the candidate action, e.g. "click on the search bar".
                    "text": f"""
Below is current screenshot. Please describe what you would see after a {description}""",
                },
            ],
        },
    ]


# `client` is an async OpenAI-compatible client pointed at the vLLM server,
# e.g. openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123").
messages = format_openai_template(description, base64_image)
completion = await client.chat.completions.create(
    model=args.model_path,
    messages=messages,
    temperature=1.0,
)
```
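The screenshot has to be base64-encoded before it goes into the data URL of the message. A minimal, self-contained sketch (the image bytes below are an invented placeholder; in practice you would read the actual screenshot file, e.g. `open("screenshot.jpg", "rb").read()`):

```python
import base64

# Encode screenshot bytes as base64 for the data URL in the message.
image_bytes = b"\xff\xd8\xff\xe0placeholder-jpeg-bytes"  # invented placeholder
base64_image = base64.b64encode(image_bytes).decode("utf-8")

data_url = f"data:image/jpeg;base64,{base64_image}"
print(data_url.startswith("data:image/jpeg;base64,"))  # -> True
```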
Citation Information
If you find this work useful, please consider citing our paper:
```bibtex
@article{Gu2024WebDreamer,
  author     = {Yu Gu and Kai Zhang and Yuting Ning and Boyuan Zheng and Boyu Gou and Tianci Xue and Cheng Chang and Sanjari Srivastava and Yanan Xie and Peng Qi and Huan Sun and Yu Su},
  title      = {Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents},
  journal    = {CoRR},
  volume     = {abs/2411.06559},
  year       = {2024},
  url        = {https://arxiv.org/abs/2411.06559},
  eprinttype = {arXiv},
  eprint     = {2411.06559},
}
```