Generalized residuals and outlier detection for ordinal data with challenging data structures

Anna Clara Monti
2023-01-01

Abstract

Motivated by the analysis of rating data on perceived health status, a crucial variable in biomedical, economic and life insurance models, the paper deals with diagnostic procedures for identifying anomalous and/or influential observations in ordinal response models with challenging data structures. Deviations due to some respondents’ atypical behavior, outlying covariates and gross errors may affect the reliability of likelihood-based inference, especially when non-robust link functions are adopted. The paper investigates and exploits the properties of the generalized residuals. These residuals appear in the estimating equations of the regression coefficients and have the remarkable property of interacting with the covariates in the same fashion as linear regression residuals. Statistical units incoherent with the model can be identified by analyzing the residuals produced by maximum likelihood or robust M-estimation, while inspecting the weights generated by M-estimation makes it possible to identify influential data. Simple guidelines are proposed to this end, which disclose information on the data structure. The purpose is twofold: recognizing statistical units that deserve specific attention for their peculiar features, and assessing the sensitivity of the fitted model to small changes in the sample. In the analysis of self-perceived health status, extreme design points associated with incoherent responses produce highly influential observations. The diagnostic procedures identify the outliers and assess their influence.
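As an illustration of the generalized residuals mentioned in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes a cumulative logit model written as P(Y ≤ j | x) = F(α_j − x'β), responses coded 1, ..., J, and the textbook definition of the generalized residual as the factor multiplying x_i in the score for β. The function names, the logistic link and the sign convention are assumptions made for this example; the paper may use a different link or parameterization.

```python
import math


def logistic_cdf(z):
    """Logistic distribution function, evaluated in a numerically stable way."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_pdf(z):
    """Logistic density; it vanishes at plus or minus infinity."""
    if math.isinf(z):
        return 0.0
    p = logistic_cdf(z)
    return p * (1.0 - p)


def generalized_residuals(y, X, beta, cutpoints):
    """Generalized residuals for a cumulative logit model (illustrative).

    y         : responses coded 1..J
    X         : list of covariate rows (no intercept column)
    beta      : regression coefficients
    cutpoints : J-1 increasing thresholds alpha_1 < ... < alpha_{J-1}

    With l_i = log[F(alpha_{y_i} - eta_i) - F(alpha_{y_i - 1} - eta_i)],
    the score for beta is sum_i x_i * e_i, where e_i is the value returned
    here, so e_i plays the role of a linear regression residual.
    """
    alpha = [-math.inf] + list(cutpoints) + [math.inf]
    residuals = []
    for y_i, x_i in zip(y, X):
        eta = sum(b * x for b, x in zip(beta, x_i))
        upper = alpha[y_i] - eta
        lower = alpha[y_i - 1] - eta
        prob = logistic_cdf(upper) - logistic_cdf(lower)
        residuals.append(-(logistic_pdf(upper) - logistic_pdf(lower)) / prob)
    return residuals
```

Under this convention a response in the lowest category yields a negative residual and a response in the highest category a positive one, and the residual has zero expectation under the assumed model. Residuals that are large in absolute value flag units whose response is hard to reconcile with their covariates.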
Keywords: Anomalous data, Diagnostics, Generalized residuals, Influential data, Robust estimation, Ordinal response models

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12070/57680