GraphLocator: Graph-guided Causal Reasoning for Issue Localization
Abstract
The issue localization task aims to identify the locations in a software repository that requires modification given a natural language issue description. This task is fundamental yet challenging in automated software engineering due to the semantic gap between issue description and source code implementation. This gap manifests as two mismatches:(1) symptom-to-cause mismatches, where descriptions do not explicitly reveal underlying root causes; (2) one-to-many mismatches, where a single issue corresponds to multiple interdependent code entities. To address these two mismatches, we propose GraphLocator, an approach that mitigates symptom-to-cause mismatches through causal structure discovering and resolves one-to-many mismatches via dynamic issue disentangling. The key artifact is the causal issue graph (CIG), in which vertices represent discovered sub-issues along with their associated code entities, and edges encode the causal dependencies between them. The workflow of GraphLocator consists of two phases: symptom vertices locating and dynamic CIG discovering; it first identifies symptom locations on the repository graph, then dynamically expands the CIG by iteratively reasoning over neighboring vertices. Experiments on three real-world datasets demonstrates the effectiveness of GraphLocator: (1) Compared with baselines, GraphLocator achieves more accurate localization with average improvements of +19.49% in function-level recall and +11.89% in precision. (2) GraphLocator outperforms baselines on both symptom-to-cause and one-to-many mismatch scenarios, achieving recall improvement of +16.44% and +19.18%, precision improvement of +7.78% and +13.23%, respectively. (3) The CIG generated by GraphLocator yields the highest relative improvement, resulting in a 28.74% increase in performance on downstream resolving task.
Community
Issues describe symptoms—the real buggy code is often hidden behind multi-hop dependencies. GraphLocator moves beyond relevance retrieval by explicitly modelling causal structure: decomposing sub-problems, capturing causal dependencies, and performing constrained causal reasoning over code dependency graphs. Across multiple Python and Java datasets, GraphLocator significantly improves function-level recall and precision, and the inferred causal structures further boost downstream issue resolution by +28.74%.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- InfCode-C++: Intent-Guided Semantic Retrieval and AST-Structured Search for C++ Issue Resolution (2025)
- Completion by Comprehension: Guiding Code Generation with Multi-Granularity Understanding (2025)
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization (2025)
- Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications (2025)
- GraphMind: Theorem Selection and Conclusion Generation Framework with Dynamic GNN for LLM Reasoning (2025)
- Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures (2025)
- Co-Evolution of Types and Dependencies: Towards Repository-Level Type Inference for Python Code (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper