The analysis of monitoring data is extremely valuable for critical computer systems. It allows to gain insights into the failure behavior of a given system under real workload conditions, which is crucial to assure service continuity and downtime reduction.This paper proposes an experimental evaluation of different direct monitoring techniques, namely event logs, assertions, and source code instrumentation, that are widely used in the context of critical industrial systems. We inject 12,733 software faults in a real-world air traffic control (ATC) middleware system with the aim of analyzing the ability of mentioned techniques to produce information in case of failures. Experimental results indicate that each technique is able to cover a limited number of failure manifestations. Moreover, we observe that the quality of collected data to support failure diagnosis tasks strongly varies across the techniques considered in this study.

Assessing Direct Monitoring Techniques to Analyze Failures of Critical Industrial Systems

PECCHIA, ANTONIO
2014-01-01

Abstract

The analysis of monitoring data is extremely valuable for critical computer systems. It allows to gain insights into the failure behavior of a given system under real workload conditions, which is crucial to assure service continuity and downtime reduction.This paper proposes an experimental evaluation of different direct monitoring techniques, namely event logs, assertions, and source code instrumentation, that are widely used in the context of critical industrial systems. We inject 12,733 software faults in a real-world air traffic control (ATC) middleware system with the aim of analyzing the ability of mentioned techniques to produce information in case of failures. Experimental results indicate that each technique is able to cover a limited number of failure manifestations. Moreover, we observe that the quality of collected data to support failure diagnosis tasks strongly varies across the techniques considered in this study.
2014
9781479960323
Monitoring system; software fault tolerance
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12070/43981
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 21
  • ???jsp.display-item.citation.isi??? 13
social impact