COMPARISONS BETWEEN ROBUST REGRESSION APPROACHES IN THE PRESENCE OF OUTLIERS AND HIGH LEVERAGE POINTS
Abstract
The study aimed to compare several robust approaches to linear regression in the presence of outliers and high leverage points. Ordinary least squares (OLS) estimation is the most basic approach and is practiced widely in regression analysis. However, some fundamental assumptions must be fulfilled for OLS to provide good parameter estimates: the error terms in the regression model must be independently and identically distributed, following a normal distribution. Failure to fulfill these assumptions results in poor parameter estimation. The violation of assumptions may occur due to the presence of unusual observations, known as outliers or high leverage points. Even a single extreme value in the data set can affect the OLS results, and the parameter estimates may become biased and unreliable. To mitigate the consequences of unusual observations, robust regression is suggested, as it reduces their influence on the estimates. Four types of robust regression estimation are practiced in this paper: M estimation, LTS estimation, S estimation, and MM estimation. The results of the different robust estimators are compared with one another and with the classical least squares estimator. M estimation works well when the data are contaminated only in the response variable, but in the presence of high leverage points, M estimation does not perform well.
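To illustrate the abstract's central claim, the sketch below contrasts OLS with a Huber-type M-estimator fitted by iteratively reweighted least squares (IRLS) on data containing a single response outlier. This is a minimal, self-contained illustration in NumPy, not the paper's own implementation; the tuning constant c = 1.345 and the MAD-based scale are standard choices assumed here.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares estimate."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def huber_m(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimation via iteratively reweighted least squares.

    Large residuals get weight c/|u| < 1, shrinking the influence of
    outliers in the response; small residuals keep full weight 1.
    """
    beta = ols(X, y)  # start from the OLS fit
    for _ in range(max_iter):
        r = y - X @ beta
        # robust scale estimate: median absolute deviation / 0.6745
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / (s if s > 0 else 1.0)
        # Huber weights: 1 for |u| <= c, c/|u| otherwise
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Simulated example: true model y = 2 + 3x, one gross outlier in y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, 50)
y[0] = 100.0  # contaminate a single response value
X = np.column_stack([np.ones_like(x), x])

b_ols = ols(X, y)
b_m = huber_m(X, y)
```

The M-estimate of the slope stays close to the true value 3, while the OLS slope is pulled toward the outlier. Note that this downweighting acts on residuals in the response; as the abstract states, it does not protect against high leverage points in the predictors, which is why LTS, S, and MM estimation are also considered.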