Machine Text Detectors are Membership Inference Attacks
Abstract
A theoretical and empirical investigation shows strong transferability between membership inference attacks and machine-generated text detection, highlighting the need for cross-task collaboration and introducing MINT, a unified evaluation suite.
Although membership inference attacks (MIAs) and machine-generated text detection pursue different goals, identifying training samples and synthetic texts, respectively, their methods often exploit similar signals derived from a language model's probability distribution. Despite this shared methodological foundation, the two tasks have been studied independently, which may lead to conclusions that overlook stronger methods and valuable insights developed for the other task. In this work, we theoretically and empirically investigate the transferability between MIAs and machine text detection, i.e., how well a method originally developed for one task performs on the other. For our theoretical contribution, we prove that the metric achieving the asymptotically highest performance is the same for both tasks. We unify a large portion of the existing literature in the context of this optimal metric and hypothesize that how accurately a given method approximates this metric is directly correlated with its transferability. Our large-scale empirical experiments, covering 7 state-of-the-art MIA methods and 5 state-of-the-art machine text detectors across 13 domains and 10 generators, demonstrate a very strong rank correlation (ρ > 0.6) in cross-task performance. Notably, we find that Binoculars, originally designed for machine text detection, achieves state-of-the-art performance on MIA benchmarks as well, demonstrating the practical impact of this transferability. Our findings highlight the need for greater cross-task awareness and collaboration between the two research communities. To facilitate cross-task development and fair evaluation, we introduce MINT, a unified evaluation suite for MIAs and machine-generated text detection, with implementations of 15 recent methods from both tasks.
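To make the shared signal concrete, below is a minimal sketch of a Binoculars-style score, which contrasts an observer model's perplexity on a text with its cross-perplexity against a second, performer model. The GPT-2 checkpoints, the `binoculars_score` helper, and the exact normalization are illustrative assumptions for exposition, not the configuration used in the paper; reference implementations of this and 14 other methods are provided in MINT.

```python
# A minimal sketch of a Binoculars-style score (assumption: GPT-2 family
# checkpoints stand in for the observer/performer pair; the paper does not
# prescribe this pairing here).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
observer = AutoModelForCausalLM.from_pretrained("gpt2").eval()
performer = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()

@torch.no_grad()
def binoculars_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[:, :-1, :]    # observer next-token logits
    perf_logits = performer(ids).logits[:, :-1, :]  # performer next-token logits
    targets = ids[:, 1:]

    obs_logp = F.log_softmax(obs_logits, dim=-1)
    # Log-perplexity: mean negative log-likelihood of the text under the observer.
    nll = -obs_logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1).mean()
    # Cross-perplexity: mean observer surprisal expected under the performer's
    # next-token distribution.
    x_nll = -(F.softmax(perf_logits, dim=-1) * obs_logp).sum(dim=-1).mean()
    return (nll / x_nll).item()

# The same scalar can be thresholded for either task: low scores suggest
# machine-generated text and, per the paper's finding, likely training members.
print(binoculars_score("The quick brown fox jumps over the lazy dog."))
```

Using one scalar for both tasks is exactly the transferability the abstract describes: a single probability-based statistic, thresholded differently, serves as both a detector and a membership inference attack.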
Community
We show that machine-generated text detectors can function as state-of-the-art membership inference attacks and prove that the optimal formulation for both tasks is the same.
Librarian Bot (automated): the following similar papers were recommended by the Semantic Scholar API.
- Diversity Boosts AI-Generated Text Detection (2025)
- Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis (2025)
- (Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs (2025)
- Membership Inference Attacks on Tokenizers of Large Language Models (2025)
- Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection (2025)
- Black-box Detection of LLM-generated Text Using Generalized Jensen-Shannon Divergence (2025)
- How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study (2025)