Papers
arxiv:2502.02759

ClarAVy: A Tool for Scalable and Accurate Malware Family Labeling

Published on Feb 4
Authors:
,
,
,

Abstract

ClarAVy uses a Variational Bayesian approach to accurately and scalably label malware families by aggregating antivirus detections.

AI-generated summary

Determining the family to which a malicious file belongs is an essential component of cyberattack investigation, attribution, and remediation. Performing this task manually is time consuming and requires expert knowledge. Automated tools using that label malware using antivirus detections lack accuracy and/or scalability, making them insufficient for real-world applications. Three pervasive shortcomings in these tools are responsible: (1) incorrect parsing of antivirus detections, (2) errors during family alias resolution, and (3) an inappropriate antivirus aggregation strategy. To address each of these, we created our own malware family labeling tool called ClarAVy. ClarAVy utilizes a Variational Bayesian approach to aggregate detections from a collection of antivirus products into accurate family labels. Our tool scales to enormous malware datasets, and we evaluated it by labeling approx40 million malicious files. ClarAVy has 8 and 12 percentage points higher accuracy than the prior leading tool in labeling the MOTIF and MalPedia datasets, respectively.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2502.02759 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2502.02759 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2502.02759 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.