COMPARISONS BETWEEN ROBUST REGRESSION APPROACHES IN THE PRESENCE OF OUTLIERS AND HIGH LEVERAGE POINTS

  • Anwar Fitrianto Department of Statistics, IPB University
  • Sim Hui Xin Department of Mathematics, Universiti Putra Malaysia
Keywords: breakdown points, influential, leverage, outlier, robust regression

Abstract

The study aimed to compare several robust regression approaches for linear regression in the presence of outliers and high leverage points. Ordinary least squares (OLS) estimation is the most basic and most widely practiced approach to parameter estimation in regression analysis. However, some fundamental assumptions must be fulfilled for OLS to provide good parameter estimates: the error terms in the regression model must be independently and identically distributed, following a Normal distribution. Failure to fulfill these assumptions results in poor parameter estimation. The assumptions may be violated by the presence of unusual observations, known as outliers or high leverage points. Even a single extreme value in the data set will affect the OLS results, and the parameter estimates may become biased and unreliable when the data contain an outlier or a high leverage point. To reduce the effect of unusual observations on the estimation results, robust regression is suggested. Four types of robust regression estimators are practiced in this paper: M estimation, LTS estimation, S estimation, and MM estimation. Comparisons among the different robust estimators and the classical least squares estimator have been carried out. M estimation works well when the data are contaminated only in the response variable, but it cannot perform well in the presence of high leverage points.
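As a minimal illustration of this contrast (not part of the paper; the simulated data, variable names, and the choice of the Huber function are our own assumptions), the Python sketch below fits OLS and an M estimator with statsmodels on data contaminated with one vertical outlier and one high leverage point. The LTS, S, and MM estimators compared in the paper typically require dedicated routines such as R's robustbase::lmrob and are not sketched here.

# Minimal sketch (not the authors' code): OLS vs. Huber M estimation on
# simulated data with one outlier in y and one high leverage point in x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Clean simple linear regression data: y = 2 + 3x + error
n = 50
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1, n)

# Contaminate: a vertical outlier at (5, 80) and a bad leverage point at (30, 20)
x_cont = np.append(x, [5.0, 30.0])
y_cont = np.append(y, [80.0, 20.0])

X = sm.add_constant(x_cont)  # design matrix with intercept

ols_fit = sm.OLS(y_cont, X).fit()                            # classical least squares
m_fit = sm.RLM(y_cont, X, M=sm.robust.norms.HuberT()).fit()  # Huber M estimation

print("true coefficients:      [2, 3]")
print("OLS estimates:         ", np.round(ols_fit.params, 3))
print("M-estimation estimates:", np.round(m_fit.params, 3))
# M estimation downweights the vertical outlier, but, like OLS, it can still be
# pulled by the high leverage point; LTS, S, and MM estimators are designed to
# resist such points as well.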

Published
2022-03-21
How to Cite
[1]
A. Fitrianto and S. Xin, “COMPARISONS BETWEEN ROBUST REGRESSION APPROACHES IN THE PRESENCE OF OUTLIERS AND HIGH LEVERAGE POINTS”, BAREKENG: J. Math. & App., vol. 16, no. 1, pp. 243-252, Mar. 2022.