arXiv:2505.19949

Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

Published on May 26
Submitted by karrykkk on May 27
AI-generated summary

Influence functions are used to attribute LLMs' reasoning in math and coding to individual training elements, revealing cross-domain effects and enabling a reweighting strategy that improves model accuracy.

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capabilities in math and coding, often bolstered by post-training on chains of thought (CoTs) generated by stronger models. However, existing strategies for curating such training data predominantly rely on heuristics, limiting generalizability and failing to capture subtleties underlying the data. To address these limitations, we leverage influence functions to systematically attribute LLMs' reasoning ability in math and coding to individual training examples, sequences, and tokens, enabling deeper insight into effective data characteristics. Our Influence-based Reasoning Attribution (Infra) uncovers nontrivial cross-domain effects across math and coding tasks: high-difficulty math examples improve both math and code reasoning, while low-difficulty code tasks most effectively benefit code reasoning. Based on these findings, we introduce a simple yet effective dataset reweighting strategy that flips task difficulty, doubling AIME24 accuracy from 10% to 20% and boosting LiveCodeBench accuracy from 33.8% to 35.3% for Qwen2.5-7B-Instruct. Moreover, our fine-grained attribution reveals that sequence-level exploratory behaviors enhance reasoning performance in both math and code, and that token-level influence patterns differ between math and code reasoning: the former favors natural language logic connectors, while the latter emphasizes structural syntax.
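To make the reweighting strategy above concrete, here is a minimal sketch of difficulty-flipped sampling, assuming each SFT example carries `domain` and `difficulty` labels; the field names and weight values are hypothetical illustrations, not the paper's exact scheme.

```python
import random

# Hypothetical weights implementing the "flip task difficulty" idea from the
# abstract: upweight high-difficulty math and low-difficulty code examples.
def sample_weight(example: dict) -> float:
    if example["domain"] == "math":
        return 2.0 if example["difficulty"] == "high" else 1.0
    if example["domain"] == "code":
        return 2.0 if example["difficulty"] == "low" else 1.0
    return 1.0

def reweighted_sample(dataset: list[dict], k: int) -> list[dict]:
    """Draw k SFT examples with difficulty-flipped sampling weights."""
    weights = [sample_weight(ex) for ex in dataset]
    return random.choices(dataset, weights=weights, k=k)
```

The only design choice here is biasing the SFT sampling distribution toward hard math and easy code, mirroring the cross-domain finding reported in the abstract.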

Community

Paper author and submitter

In this paper, we propose a fine-grained influence function framework to trace how training data in the SFT phase shapes LLM reasoning on math and code tasks.
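As a rough illustration of what influence tracing over SFT data can look like, here is a first-order, TracIn-style gradient-dot-product sketch; the paper's actual estimator may differ (for example, classical influence functions involve an inverse-Hessian-vector product), and `model`, `loss_fn`, and the example inputs are hypothetical placeholders.

```python
import torch

def flat_grad(model: torch.nn.Module, loss: torch.Tensor) -> torch.Tensor:
    """Flatten the gradients of a scalar loss w.r.t. all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_score(model, loss_fn, train_example, test_example, lr: float = 1e-5) -> float:
    """First-order influence estimate: lr * grad(test loss) . grad(train loss).

    A positive score suggests the training example pushes the model toward
    lower loss on the test example (i.e., it is helpful for that task).
    """
    g_test = flat_grad(model, loss_fn(model, test_example))
    g_train = flat_grad(model, loss_fn(model, train_example))
    return lr * torch.dot(g_test, g_train).item()
```

Sequence- and token-level attribution, as described in the abstract, would restrict the training-side loss to individual sentences or tokens before taking gradients.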

