junling24
/

Math2Visual-Visual_Language_Generation

junling24 committed on Jul 18

Commit

7cfc106

1 Parent(s): 930338a

Update README.md

README.md +126 -3

@@ -1,3 +1,126 @@
----
-license: cc-by-nc-4.0
----

+# 📐 Math2Visual: Visual Language Generation Model
+This is the official model for generating **structured visual language** representations from math word problems, as proposed in:
+📄 **[ACL 2025 Paper — Math2Visual](https://arxiv.org/abs/2506.03735)**
+🎥 **[Project Video](https://youtu.be/jdPYVoHEPtk)**
+---
+## ✨ Model Summary
+This model takes a math word problem (MWP) and its equation (formula) as input and outputs a **visual language** string that is used to generate pedagogically meaningful visuals. The output follows a fixed, teacher-informed structure that describes the key mathematical relationships between entities, containers, and operations.
+It is built by fine-tuning `meta-llama/Llama-3.1-8B` with LoRA using [PEFT](https://github.com/huggingface/peft) and 4-bit quantization (BitsAndBytes). The code that generates visuals from visual language is available in our **[GitHub repository](https://github.com/eth-lre/math2visual/tree/main)**.
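Because the model conditions on both the word problem and its formula, an arithmetically wrong formula goes straight into the prompt. The following is a minimal sketch of a pre-check, assuming formulas use the plain `expression = number` style (for example `9 + 7 = 16`). The helper name `formula_is_consistent` is hypothetical and is not part of the Math2Visual codebase.

```python
import ast
import operator

# Hypothetical helper, not part of the Math2Visual codebase.
# Assumption: formulas look like "<arithmetic expression> = <number>".
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed expression built only from numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def formula_is_consistent(formula, tolerance=1e-9):
    """Return True if the left side of 'lhs = rhs' evaluates to the right side."""
    if formula.count("=") != 1:
        return False
    lhs, rhs = formula.split("=")
    try:
        left = _evaluate(ast.parse(lhs.strip(), mode="eval").body)
        right = _evaluate(ast.parse(rhs.strip(), mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return abs(left - right) <= tolerance


print(formula_is_consistent("9 + 7 = 16"))  # True
print(formula_is_consistent("9 + 7 = 15"))  # False
```

The check only parses numbers and the four basic operators, so formulas written in any other notation are reported as inconsistent rather than evaluated.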
+---
+## 🧠 Example Use
+### 🔧 Install dependencies
+```bash
+pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 \
+  bitsandbytes==0.45.0 inflect==7.3.1 lxml==5.3.0 ipython==8.25.0 python-dotenv==1.0.1 \
+  git+https://github.com/huggingface/transformers.git@5fa35344755d8d9c29610b57d175efd03776ae9e \
+  git+https://github.com/huggingface/peft.git@aa3f41f7529ed078e9225b2fc1edbb8c71f58f99
+```
+💡 Use `-f https://download.pytorch.org/whl/torch_stable.html` for CUDA wheels if needed.
+
+---
+### 🚀 Run Inference
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from peft import PeftModel
+# Load model
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16
+)
+base_model_id = "meta-llama/Llama-3.1-8B"
+adapter_dir = "junling24/Math2Visual-Visual_Language_Generation"
+base = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True
+)
+model = PeftModel.from_pretrained(base, adapter_dir)
+model.eval()
+model.config.use_cache = True
+tokenizer = AutoTokenizer.from_pretrained(
+    base_model_id,
+    padding_side="left",
+    add_eos_token=True,
+    add_bos_token=True,
+    trust_remote_code=True
+)
+tokenizer.pad_token = tokenizer.eos_token
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+# Prompt
+def create_prompt(mwp, formula=None):
+    return (
+        '''You are an expert at converting math story problem into a structured 'visual language'...'''
+        f"Question: {mwp}\n"
+        f"Formula: {formula}\n"
+        "Answer in visual language:"
+    )
+mwp = "Janet has nine oranges, and Sharon has seven oranges. How many oranges do Janet and Sharon have together?"
+formula = "9 + 7 = 16"
+prompt = create_prompt(mwp, formula)
+inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048, padding="max_length").to(device)
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=2048,
+        do_sample=True,
+        temperature=0.7,
+        repetition_penalty=1.15
+    )
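+# Decode only the newly generated tokens; slicing by token count avoids relying on the
+# decoded text reproducing the prompt character for character.
+prompt_length = inputs["input_ids"].shape[1]
+visual_language = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
+print("Generated Visual Language:\n", visual_language)
+```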
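The prompt assembly above can be reused for several problems at once. The following is a minimal sketch, assuming the same three-part layout (instruction, `Question:`, `Formula:`) as `create_prompt`. `INSTRUCTION` is a stand-in for the full instruction text, which this README elides, and `build_prompt` and `build_prompts` are hypothetical names. Rejecting a missing formula is a choice made in this sketch only; `create_prompt` itself accepts `formula=None`.

```python
# Placeholder: the README elides the full instruction text, so it stays elided
# here as well. Substitute the complete instruction before real use.
INSTRUCTION = (
    "You are an expert at converting math story problem "
    "into a structured 'visual language'..."
)


def build_prompt(mwp, formula):
    """Assemble one prompt in the same layout as create_prompt."""
    return (
        INSTRUCTION
        + f"Question: {mwp}\n"
        + f"Formula: {formula}\n"
        + "Answer in visual language:"
    )


def build_prompts(problems):
    """Build prompts for (mwp, formula) pairs, rejecting missing fields."""
    prompts = []
    for mwp, formula in problems:
        if not mwp or not formula:
            raise ValueError("each problem needs both a question and a formula")
        prompts.append(build_prompt(mwp.strip(), formula.strip()))
    return prompts


problems = [
    # Example taken from this README.
    ("Janet has nine oranges, and Sharon has seven oranges. "
     "How many oranges do Janet and Sharon have together?", "9 + 7 = 16"),
    # Made-up problem, for illustration only.
    ("Tom has five apples and eats two. How many apples are left?", "5 - 2 = 3"),
]
prompts = build_prompts(problems)
print(len(prompts))                    # 2
print(prompts[1].splitlines()[-1])     # Answer in visual language:
```

The resulting list can be passed to the tokenizer in place of a single prompt string; each row of the generated output then needs to be decoded separately.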
+
+---
+## 🗂️ Related Resources
+- 📘 Math2Visual dataset on Hugging Face: **[link](https://huggingface.co/datasets/junling24/Math2Visual-Generating_Pedagogically_Meaningful_Visuals_for_Math_Word_Problems)**
+- 💻 Codebase on GitHub: **[link](https://github.com/eth-lre/math2visual/tree/main)**
+
+---
+## 📄 Citation
+```bibtex
+@inproceedings{wang2025math2visual,
+  title={Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models},
+  author={Wang, Junling and Rutkiewicz, Anna and Wang, April Yi and Sachan, Mrinmaya},
+  booktitle={Findings of the Association for Computational Linguistics: ACL 2025},
+  year={2025},
+  url={https://arxiv.org/abs/2506.03735}
+}
+```
+---
+## 📬 License & Contact
+This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
+For commercial or research inquiries, please contact:
+📧 Junling Wang — wangjun [at] ethz [dot] ch