An empirical study of metric-based methods to detect obfuscated code

IRIS

Protecting data and applications from malware and other forms of malicious code has assumed a great relevance in the current era of pervasive web-based applications. Attackers often use code obfuscation to hide harmful programs from automatic detection. Several researchers have proposed methods to classify an unknown program as malicious or benign; however, little work has been done to identify obfuscated code. A promising approach to detect obfuscated code consists of using a set of metrics, collected by static analysis, to classify a program. In this paper we present an empirical evaluation of three text-based metrics to identify obfuscated code. Our experiment shows that the effectiveness of these metrics depends on the obfuscators: there are cases in which the metrics allow the proliferation of false positives (i.e., misclassification of clear code as obfuscated code), which is bothering but not dangerous, and cases where false negatives (i.e. misclassification of obfuscated as clear code) proliferate, which is definitely more dangerous. Based on our experiment, we propose a combination of these three metrics and show how this combination outperforms the individual metrics.

An empirical study of metric-based methods to detect obfuscated code

Visaggio C;Pagin G A;Canfora G

2013-01-01

Abstract

Protecting data and applications from malware and other forms of malicious code has assumed a great relevance in the current era of pervasive web-based applications. Attackers often use code obfuscation to hide harmful programs from automatic detection. Several researchers have proposed methods to classify an unknown program as malicious or benign; however, little work has been done to identify obfuscated code. A promising approach to detect obfuscated code consists of using a set of metrics, collected by static analysis, to classify a program. In this paper we present an empirical evaluation of three text-based metrics to identify obfuscated code. Our experiment shows that the effectiveness of these metrics depends on the obfuscators: there are cases in which the metrics allow the proliferation of false positives (i.e., misclassification of clear code as obfuscated code), which is bothering but not dangerous, and cases where false negatives (i.e. misclassification of obfuscated as clear code) proliferate, which is definitely more dangerous. Based on our experiment, we propose a combination of these three metrics and show how this combination outperforms the individual metrics.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2013

Appare nelle tipologie:

1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
5.pdf non disponibili Licenza: Non specificato Dimensione 427.68 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	427.68 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12070/848

Citazioni

ND

11

ND

social impact