Detection of Software Failures through Event Logs: an Experimental Study
PECCHIA, ANTONIO
2012-01-01
Abstract
Software faults are recognized as one of the main causes of system failures in many application domains. Event logs play a key role in supporting the analysis of failures that occur under real workload conditions. Nevertheless, field experience suggests that event logs may report software failures inaccurately, or may fail to provide adequate support for understanding their causes. This paper analyzes the factors that determine accurate detection of software failures through event logs. The study is based on a data set of 17,387 experiments in which failures were induced by means of software fault injection into three systems. The analysis reveals that the reporting ability of the logs collected during the experiments is not influenced by the type of fault activated at runtime. More importantly, the analysis demonstrates that, although the considered systems adopt very similar detection mechanisms, the ability of logs to report a given type of failure varies significantly across the systems. A closer inspection of the collected logs reveals that characteristics such as system architecture, placement of the logging instructions, and specific support provided by the execution environment significantly increase the accuracy of logs at runtime.