Commit 7cfc106 by junling24 (verified; 1 parent: 930338a)

Update README.md

Files changed (1): README.md (+126, -3)
The previous README contained only a three-line YAML front matter block (`license: cc-by-nc-4.0`). This commit replaces it with the following model card:

# 📐 Math2Visual: Visual Language Generation Model

This is the official model for generating **structured visual language** representations from math word problems, as proposed in:

📄 **[ACL 2025 Paper: Math2Visual](https://arxiv.org/abs/2506.03735)**

🎥 **[Project Video](https://youtu.be/jdPYVoHEPtk)**

---

## ✨ Model Summary

This model takes a math word problem (MWP) and its equation (formula) as input and outputs a **visual language** string, which is then used to generate pedagogically meaningful visuals. The output follows a fixed structure, based on a teacher-informed design, that describes the key mathematical relationships between entities, containers, and operations.

It was built by fine-tuning `meta-llama/Llama-3.1-8B` with LoRA using [PEFT](https://github.com/huggingface/peft) and 4-bit quantization (BitsAndBytes). The code for generating visuals from the visual language is available in our **[GitHub repository](https://github.com/eth-lre/math2visual/tree/main)**.

---

## 🧠 Example Use

### 🔧 Install dependencies

```bash
pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 \
  bitsandbytes==0.45.0 inflect==7.3.1 lxml==5.3.0 ipython==8.25.0 python-dotenv==1.0.1 \
  git+https://github.com/huggingface/transformers.git@5fa35344755d8d9c29610b57d175efd03776ae9e \
  git+https://github.com/huggingface/peft.git@aa3f41f7529ed078e9225b2fc1edbb8c71f58f99
```

💡 The `+cu121` wheels are not hosted on PyPI; if pip cannot find them, add `-f https://download.pytorch.org/whl/torch_stable.html` to the command.
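Because the command pins exact versions (including CUDA-specific `+cu121` builds), it can help to confirm that the environment actually matches before loading the model. The snippet below is a minimal sketch that uses only the standard library (`importlib.metadata`); the helper name `find_mismatches` is hypothetical, and the two packages installed from Git commits are left out because their installed version strings do not record the commit hash.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above. The Git-installed transformers and peft
# are omitted: their installed version string does not include the commit hash.
PINNED = {
    "torch": "2.5.1+cu121",
    "torchvision": "0.20.1+cu121",
    "torchaudio": "2.5.1+cu121",
    "bitsandbytes": "0.45.0",
    "inflect": "7.3.1",
    "lxml": "5.3.0",
    "ipython": "8.25.0",
    "python-dotenv": "1.0.1",
}


def find_mismatches(pins, lookup=version):
    """Return {package: (expected, installed)} for each unmet pin; installed is None if missing."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for package, (expected, installed) in sorted(problems.items()):
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

Run it as a script after installation; it prints each package whose installed version differs from its pin, or reports that everything matches.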

---

### 🚀 Run Inference

The base model `meta-llama/Llama-3.1-8B` is gated on Hugging Face: accept its license on the model page and authenticate (for example with `huggingface-cli login`) before running the code below.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model_id = "meta-llama/Llama-3.1-8B"
adapter_dir = "junling24/Math2Visual-Visual_Language_Generation"

base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",  # dispatches the quantized weights to the available GPU(s); no model.to(...) needed
    trust_remote_code=True,
)
# Attach the Math2Visual LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base, adapter_dir)
model.eval()
model.config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token

# Prompt (the instruction text is abbreviated with '...' in this README)
def create_prompt(mwp, formula=None):
    return (
        '''You are an expert at converting math story problem into a structured 'visual language'...'''
        f"Question: {mwp}\n"
        f"Formula: {formula}\n"
        "Answer in visual language:"
    )

mwp = "Janet has nine oranges, and Sharon has seven oranges. How many oranges do Janet and Sharon have together?"
formula = "9 + 7 = 16"
prompt = create_prompt(mwp, formula)

# A single prompt needs no padding; the inputs must be on the same device as the model
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.7,
        repetition_penalty=1.15,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens (everything after the prompt)
prompt_length = inputs["input_ids"].shape[1]
visual_language = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print("Generated Visual Language:\n", visual_language)
```
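The tokenizer above pads on the left and reuses the EOS token for padding. With left padding, every prompt in a batch ends at the same position, so each row's generated tokens start at the padded input length; rows that finish early are filled with `pad_token_id` by `generate`. The pure-Python sketch below illustrates that bookkeeping with plain lists and made-up token ids. `left_pad` and `split_continuations` are hypothetical helpers for illustration only, not transformers APIs; in practice the tokenizer and `generate` do this on tensors.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token-id lists to a common length; return (input_ids, attention_mask)."""
    width = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = width - len(ids)
        input_ids.append([pad_id] * n_pad + ids)
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


def split_continuations(input_ids: List[List[int]], outputs: List[List[int]], pad_id: int) -> List[List[int]]:
    """Cut the (padded) prompt from each output row and trim trailing padding.

    Because pad == EOS in this setup, a final EOS emitted by the model is trimmed too.
    """
    start = len(input_ids[0])  # with left padding, every prompt ends at this index
    continuations = []
    for row in outputs:
        new_tokens = row[start:]
        while new_tokens and new_tokens[-1] == pad_id:
            new_tokens = new_tokens[:-1]
        continuations.append(new_tokens)
    return continuations
```

For example, `left_pad([[11, 12, 13], [21]], pad_id=0)` returns `([[11, 12, 13], [0, 0, 21]], [[1, 1, 1], [0, 0, 1]])`, and splitting the outputs `[[11, 12, 13, 5, 6, 0], [0, 0, 21, 7, 8, 9]]` against those inputs yields `[[5, 6], [7, 8, 9]]`.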

---

## 🗂️ Related Resources

- 📘 Math2Visual Dataset on Hugging Face: **[link](https://huggingface.co/datasets/junling24/Math2Visual-Generating_Pedagogically_Meaningful_Visuals_for_Math_Word_Problems)**
- 💻 Codebase on GitHub: **[link](https://github.com/eth-lre/math2visual/tree/main)**

---

## 📄 Citation

```bibtex
@inproceedings{wang2025math2visual,
  title={Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models},
  author={Wang, Junling and Rutkiewicz, Anna and Wang, April Yi and Sachan, Mrinmaya},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
  year={2025},
  url={https://arxiv.org/abs/2506.03735}
}
```

---

## 📬 License & Contact

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

For commercial or research inquiries, please contact:
📧 Junling Wang: wangjun [at] ethz [dot] ch