Characterizing Direct Monitoring Techniques in Software Systems

Cinque, Marcello; Cotroneo, Domenico; Corte, Raffaele Della; Pecchia, Antonio

doi:10.1109/TR.2016.2570564

Monitoring is a consolidated practice to characterize the dependability behavior of a software system. A variety of techniques, such as event logging and operating system probes, are currently used to generate monitoring data for troubleshooting and failure analysis. In spite of the importance of monitoring, whose role can be essential in critical software systems, there is a lack of studies addressing the assessment and the comparison of the techniques aiming to monitor the occurrence of failures during operations. This paper proposes a method to characterize the monitoring techniques implemented in a software system. The method is based on a fault injection approach and allows measuring 1) precision and recall of a monitoring technique and 2) the dissimilarity of the data it generates upon failures. The method has been used in two critical software systems implementing event logging, assertion checking, and source code instrumentation techniques. We analyzed a total of 3 844 failures. With respect to our data, we observed that the effectiveness of a technique is strongly affected by the system and type of failure, and that the combination of different techniques is potentially beneficial to increase the overall failure reporting ability. More important, our analysis revealed a number of practical implications to be taken into account when developing a monitoring technique.