# Math2Visual: Visual Language Generation Model

This is the official model for generating **structured visual language** representations from math word problems, as proposed in:

**[ACL 2025 Paper: Math2Visual](https://arxiv.org/abs/2506.03735)**

**[Project Video](https://youtu.be/jdPYVoHEPtk)**

---

## ✨ Model Summary

This model takes a math word problem (MWP) and its equation (formula) as input and outputs a **visual language** string that is used to generate pedagogically meaningful visuals. The output follows a fixed, teacher-informed structure that describes the key mathematical relationships between entities, containers, and operations.

It was built by fine-tuning `meta-llama/Llama-3.1-8B` with LoRA using [PEFT](https://github.com/huggingface/peft), and it uses 4-bit quantization (bitsandbytes) to reduce memory use. The code for generating visuals from the visual language is available in our **[GitHub repository](https://github.com/eth-lre/math2visual/tree/main)**.

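
For readers new to LoRA: the adapter leaves each original weight matrix `W0` frozen and adds a trainable low-rank update, so a layer computes `h = W0·x + (alpha / r)·B·A·x`, where only the small factors `A` and `B` are trained. The pure-Python sketch below illustrates that computation on toy matrices. It is not PEFT's implementation, and the rank `r` and `alpha` values are hypothetical; the values used for this adapter are not stated in this README.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W0, A, B, x, alpha, r):
    """Return W0 x + (alpha / r) * B (A x).

    W0 (d_out x d_in) stays frozen; A (r x d_in) and B (d_out x r) are the
    small trainable LoRA factors.
    """
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Number of trainable parameters LoRA adds to one d_out x d_in layer."""
    return r * (d_in + d_out)


# Toy example: d_in = 3, d_out = 2, rank r = 1 (hypothetical values)
W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
B = [[0.5], [0.0]]
print(lora_forward(W0, A, B, [1.0, 2.0, 3.0], alpha=2, r=1))  # [7.0, 2.0]

# A 4096 x 4096 projection (Llama-3.1-8B's hidden size is 4096) with a hypothetical r = 16:
# full matrix size vs. trainable LoRA parameters
print(4096 * 4096, lora_param_count(4096, 4096, 16))  # 16777216 131072
```

With PEFT's default initialization, `B` starts at zero, so the adapter initially leaves the base model's outputs unchanged and training only has to learn the low-rank correction.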

---

## Example Use

### Install dependencies

```bash
pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 \
    bitsandbytes==0.45.0 inflect==7.3.1 lxml==5.3.0 ipython==8.25.0 python-dotenv==1.0.1 \
    git+https://github.com/huggingface/transformers.git@5fa35344755d8d9c29610b57d175efd03776ae9e \
    git+https://github.com/huggingface/peft.git@aa3f41f7529ed078e9225b2fc1edbb8c71f58f99
```

💡 If pip cannot find the CUDA wheels, add `-f https://download.pytorch.org/whl/torch_stable.html` to the command.
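
To confirm that the pinned versions actually ended up in your environment, you can compare them against the installed packages with the standard library's `importlib.metadata`. The sketch below is our own helper, not part of the Math2Visual code. It leaves out `transformers` and `peft`, because installs from a git commit report a development version number rather than the commit hash.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above (git-installed packages omitted)
PINNED = {
    "torch": "2.5.1+cu121",
    "torchvision": "0.20.1+cu121",
    "torchaudio": "2.5.1+cu121",
    "bitsandbytes": "0.45.0",
    "inflect": "7.3.1",
    "lxml": "5.3.0",
    "ipython": "8.25.0",
    "python-dotenv": "1.0.1",
}


def check_pins(pins):
    """Return {package: problem} for packages that are missing or at another version."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != expected:
            problems[name] = f"found {installed}, expected {expected}"
    return problems


if __name__ == "__main__":
    for name, problem in check_pins(PINNED).items():
        print(f"{name}: {problem}")
```

Run it after installation; no output means every listed package matches its pin.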

---

### Run Inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4 precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "meta-llama/Llama-3.1-8B"
adapter_dir = "junling24/Math2Visual-Visual_Language_Generation"

base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the Math2Visual LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
model.config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token

# device_map="auto" has already placed the weights; 4-bit bitsandbytes models
# should not be moved with .to(), so reuse the device the model is on.
device = base.device

# Build the prompt from the problem text and its equation
def create_prompt(mwp, formula=None):
    return (
        '''You are an expert at converting math story problem into a structured 'visual language'...'''
        f"Question: {mwp}\n"
        f"Formula: {formula}\n"
        "Answer in visual language:"
    )

mwp = "Janet has nine oranges, and Sharon has seven oranges. How many oranges do Janet and Sharon have together?"
formula = "9 + 7 = 16"
prompt = create_prompt(mwp, formula)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048, padding="max_length").to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.15,
    )

# Decode only the newly generated tokens, i.e. everything after the padded prompt
prompt_length = inputs["input_ids"].shape[1]
visual_language = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print("Generated Visual Language:\n", visual_language)
```
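
The tokenizer above pads on the left and reuses the EOS token as padding. A decoder-only model appends new tokens on the right, so left padding keeps each prompt's last real token directly before the generated text, and the generated part always begins at the padded prompt width. The toy sketch below shows both ideas on plain lists; the token IDs and the pad ID `0` are made up for illustration, and the helper names are hypothetical.

```python
def left_pad(batch, pad_id):
    """Left-pad lists of token ids to a common width and build attention masks."""
    width = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        pad = width - len(ids)
        input_ids.append([pad_id] * pad + list(ids))
        # 0 marks padding the model should ignore, 1 marks real tokens
        attention_mask.append([0] * pad + [1] * len(ids))
    return input_ids, attention_mask


def generated_part(output_ids, prompt_width):
    """Drop the (padded) prompt and keep only the newly generated token ids."""
    return output_ids[prompt_width:]


# Hypothetical token ids; 0 stands in for the pad (EOS) id
ids, mask = left_pad([[11, 12, 13], [21]], pad_id=0)
print(ids)   # [[11, 12, 13], [0, 0, 21]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]

# generate() returns prompt + continuation, so slicing at the prompt width isolates the answer
fake_output = ids[1] + [501, 502]
print(generated_part(fake_output, len(ids[1])))  # [501, 502]
```

Slicing by token position, rather than by the character length of the prompt string, does not depend on the decoded prompt matching the original text exactly.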
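
The generation call samples with `temperature=0.7` and `repetition_penalty=1.15`. The sketch below applies, to a toy list of logits, the rule used by Hugging Face's `RepetitionPenaltyLogitsProcessor` (for tokens already in the sequence, positive logits are divided by the penalty and negative ones multiplied by it), followed by temperature scaling and softmax sampling. It is a simplification: `generate()` may also apply top-k/top-p filtering from the model's generation config, which is left out here, and the function names are our own.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """Penalize every token id already in the sequence (prompt included).

    Positive logits are divided by the penalty, negative logits multiplied by it.
    """
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; temperatures below 1 sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, previous_ids, temperature=0.7, penalty=1.15, rng=random):
    """Penalize repeats, apply temperature, then draw one token id."""
    penalized = apply_repetition_penalty(logits, previous_ids, penalty)
    probs = softmax_with_temperature(penalized, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy 4-token vocabulary; ids 0 and 3 already appear in the sequence
toy_logits = [2.0, 1.0, 0.5, -1.0]
print(apply_repetition_penalty(toy_logits, [0, 3], 1.15))
print(sample_next_token(toy_logits, [0, 3], rng=random.Random(0)))
```

A penalty above 1 always lowers a repeated token's logit, whatever its sign, which discourages the model from looping on the same entities or phrases.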

---

## Related Resources

- Math2Visual Dataset on Hugging Face: **[link](https://huggingface.co/datasets/junling24/Math2Visual-Generating_Pedagogically_Meaningful_Visuals_for_Math_Word_Problems)**
- Codebase on GitHub: **[link](https://github.com/eth-lre/math2visual/tree/main)**

---

## Citation

```bibtex
@inproceedings{wang2025math2visual,
  title={Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models},
  author={Wang, Junling and Rutkiewicz, Anna and Wang, April Yi and Sachan, Mrinmaya},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
  year={2025},
  url={https://arxiv.org/abs/2506.03735}
}
```

---

## License & Contact

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

For commercial or research inquiries, please contact:

📧 Junling Wang: wangjun [at] ethz [dot] ch