RIDGE LEAST ABSOLUTE DEVIATION PERFORMANCE IN ADDRESSING MULTICOLLINEARITY AND DIFFERENT LEVELS OF OUTLIERS SIMULTANEOUSLY

If multicollinearity and outliers are both present in the data, inference about the parameter estimates of the LS method will be distorted, because the method estimates inefficiently under these conditions. Both problems can be addressed simultaneously with robust regression, one variant of which is the ridge least absolute deviation (RLAD) method. This study aims to evaluate the performance of the RLAD method in overcoming multicollinearity across diverse sample sizes and outlier percentages using simulated data. The Monte Carlo study was designed around a multiple regression model with multicollinearity (ρ = 0.99) between the variables x₁ and x₂ and outliers of 10%, 20%, and 30% in the response variable, with different sample sizes (n = 25, 50, 75, 100, 200; β₀ = 0, and β = 1 otherwise). Multicollinearity in the data was confirmed by calculating the correlations between the independent variables and the VIF values, and outliers were detected using boxplots. Parameter estimation was carried out with both the RLAD and LS methods, and the MSE values of the two methods were then compared to determine which method better overcomes multicollinearity and outliers. The results show that RLAD has a lower MSE than LS, meaning that RLAD estimates the regression coefficients more precisely at every sample size and outlier level studied.


INTRODUCTION
As is known, in the multiple linear regression model the presence of multicollinearity and outliers in the data can affect the conclusions if parameter estimation is carried out with the OLS method. The resulting deviation in the conclusions is caused by the inefficiency of this method when the assumption of no multicollinearity among the independent variables is violated [1][2][3][4]. Robust regression can be used to cope with the correlation between independent variables. There are several types of robust methods, one of which is robust ridge regression. The advantage of the robust ridge regression method is that the standard error can be reduced and a more accurate estimate of the regression coefficients can be obtained [5][6][7][8][9][10].
In addition, deviations in conclusions can also occur when the data contain observations that lie far from the average, called outliers. Outliers can lead to violation of the assumptions of normality and homogeneity of the errors [11]. A method is therefore needed that can solve the problems of outliers and multicollinearity simultaneously.
Ridge Least Absolute Deviation (RLAD) regression is the alternative method chosen here to handle multicollinearity and outliers at the same time, among the many robust methods available. This method combines the robust ridge regression method with the least absolute deviation method, and the resulting RLAD estimator is stable and resistant to outliers [12]. However, there has been no comprehensive research evaluating how this method performs against various levels of outliers at various sample sizes. Therefore, in this study, the performance of ridge least absolute deviation in a multiple regression model with data containing multicollinearity and various levels of outliers at various sample sizes was analyzed and compared with the least squares method using simulated data.

RESEARCH METHODS
Least Absolute Deviation (LAD) is a robust regression parameter estimation method that is resistant to the presence of outliers because it minimizes the total absolute value of the residuals. The LAD estimator can be defined as

$\hat{\boldsymbol{\beta}}_{LAD} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left| y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} \right| \qquad (1)$

This is in contrast to the least squares method, which minimizes the sum of the squared residuals. Because LAD minimizes absolute rather than squared residuals, the effect of outliers is reduced and the method produces more accurate regression coefficient estimates in their presence [6].
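As a small sketch of the contrast between the two criteria (not taken from the paper: the sample size, seed, and outlier magnitude are illustrative choices), the LAD objective can be minimized numerically and compared with LS on data whose responses contain outliers:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.0, 1.0])
y = X @ beta_true + rng.normal(size=n)
y[:5] += 10.0  # contaminate 10% of the responses with large outliers

def lad_loss(beta):
    # LAD criterion: sum of absolute residuals
    return np.abs(y - X @ beta).sum()

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]           # minimizes squared residuals
beta_lad = minimize(lad_loss, beta_ls, method="Nelder-Mead").x

# The outliers pull the LS intercept upward; LAD stays closer to the truth
print("LS: ", beta_ls)
print("LAD:", beta_lad)
```

The LS intercept absorbs roughly the mean shift caused by the contamination, while the LAD fit, being median-like, is barely moved.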
Because the LAD method does not have an analytical solution for the parameter estimates, an iterative approach is needed; the weighted least squares procedure can be used, iterating until the estimates converge. The LAD parameter estimator at each iteration can be written as

$\hat{\boldsymbol{\beta}}_{LAD} = (\mathbf{X}^{\top}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{W}\mathbf{y} \qquad (2)$

where $\mathbf{W}$ is a diagonal matrix with diagonal elements $w_{ii} = 1/|e_i|$, the reciprocals of the absolute residuals from the previous iteration.

One of the methods commonly used to solve multicollinearity problems by constraining the coefficient estimates is ridge regression, which modifies the LS parameter estimator. In this way, ridge regression reduces the variance of the estimator, although it introduces bias. The relatively small bias constant $k$ of the ridge method is added to the main diagonal of the matrix $\mathbf{X}^{\top}\mathbf{X}$ obtained by the LS method to form the new matrix $(\mathbf{X}^{\top}\mathbf{X} + k\mathbf{I})$ [13]. The ridge regression model can be written as

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_R + \boldsymbol{\varepsilon} \qquad (3)$

where $\boldsymbol{\beta}_R$ is the ridge regression parameter to be estimated.
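The weighted least squares iteration for LAD and the ridge estimator described above can be sketched as follows (a minimal illustration; the convergence tolerance and the residual floor `eps` that guards against division by zero are implementation choices, not from the paper):

```python
import numpy as np

def lad_irls(X, y, tol=1e-8, max_iter=200, eps=1e-6):
    """LAD via iteratively reweighted least squares: each step solves a
    weighted LS problem with weights w_ii = 1/|e_i| taken from the
    previous residuals, iterating until the coefficients converge."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from the LS fit
    for _ in range(max_iter):
        e = y - X @ beta
        w = 1.0 / np.maximum(np.abs(e), eps)         # diagonal of W
        XtW = X.T * w                                # X'W without forming W
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.abs(beta_new - beta).max() < tol:
            return beta_new
        beta = beta_new
    return beta

def ridge_estimator(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for standardized data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
```

With k = 0 the ridge estimator reduces to ordinary LS, and on noise-free data both routines recover the true coefficients exactly.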
The ridge regression estimator requires that $\boldsymbol{\beta}_R^{\top}\boldsymbol{\beta}_R = c$ be satisfied to obtain the minimum sum of squares. This can be done using the Lagrange multiplier method, giving

$F = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_R)^{\top}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}_R) + k(\boldsymbol{\beta}_R^{\top}\boldsymbol{\beta}_R - c) \qquad (4)$

If Eq. (4) is differentiated with respect to $\boldsymbol{\beta}_R$ and set equal to zero, the following equation is obtained:

$\hat{\boldsymbol{\beta}}_R = (\mathbf{X}^{\top}\mathbf{X} + k\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{y} \qquad (5)$

with $\mathbf{I}$ the identity matrix, $0 \le k \le 1$, and $\hat{\boldsymbol{\beta}}_R$ the ridge parameter vector. The bias $k$ is added to the diagonal of the matrix $\mathbf{X}^{\top}\mathbf{X}$, and the independent variables $\mathbf{X}$ and the dependent variable $\mathbf{y}$ should first be transformed into standardized variables [14].

The combination of LAD and ridge regression produces the ridge least absolute deviation (RLAD) method. With the ability of LAD to resist outliers and of ridge regression to deal with multicollinearity, the RLAD method can surmount multicollinearity and outliers in the data simultaneously [12]. The RLAD parameter estimator can be written as

$\hat{\boldsymbol{\beta}}_{RLAD} = (\mathbf{X}^{\top}\mathbf{X} + k^{*}\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{X}\,\hat{\boldsymbol{\beta}}_{LAD} \qquad (6)$

where $\hat{\boldsymbol{\beta}}_{LAD}$ is the LAD regression estimator and $0 \le k^{*} \le 1$.
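A sketch of the combination step, assuming the shrink-the-LAD-fit form described above (the stand-in `beta_lad` vector is illustrative; in practice it would come from any LAD solver):

```python
import numpy as np

def rlad(X, beta_lad, k_star):
    """RLAD estimator: ridge-shrink an already-computed LAD fit,
    beta_RLAD = (X'X + k* I)^{-1} X'X beta_LAD."""
    XtX = X.T @ X
    p = X.shape[1]
    return np.linalg.solve(XtX + k_star * np.eye(p), XtX @ beta_lad)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
beta_lad = np.array([1.2, 0.8])    # stand-in for a LAD fit

print(rlad(X, beta_lad, 0.0))      # k* = 0 reproduces the LAD coefficients
print(rlad(X, beta_lad, 0.5))      # k* > 0 shrinks them toward zero
```

Since $(\mathbf{X}^{\top}\mathbf{X} + k^{*}\mathbf{I})^{-1}\mathbf{X}^{\top}\mathbf{X}$ has eigenvalues $\lambda/(\lambda + k^{*}) < 1$ for $k^{*} > 0$, the shrinkage always reduces the norm of the coefficient vector, which is what trades a little bias for a large variance reduction under multicollinearity.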
There are several ways to select the value of $k^{*}$; one option is the method introduced in [5,7,12]. Mean Square Error (MSE) is one of the most popular and easiest-to-use error measures. The MSE value measures the accuracy of the estimated regression coefficients, expressed as the mean of the squared estimation errors; in general, the smaller the MSE, the more accurate the model. In this case, the best method is defined as the one that fixes the multicollinearity and outlier problems in unison. The MSE formula used to determine the best parameter estimates of $\hat{\beta}$ is

$\mathrm{MSE}(\hat{\beta}) = \frac{1}{r}\sum_{j=1}^{r}(\hat{\beta}_j - \beta)^2 \qquad (7)$

where $\hat{\beta}_j$ is the estimated regression coefficient in the $j$-th repetition, $\beta$ is the regression coefficient to be estimated, and $r$ is the number of repetitions.
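The Monte Carlo MSE can be computed as follows (a minimal sketch; summing the squared errors across coefficients before averaging over repetitions is an assumption about how the per-coefficient errors are aggregated):

```python
import numpy as np

def mse(beta_hats, beta_true):
    """Mean squared estimation error over r Monte Carlo repetitions:
    the average of ||beta_hat_j - beta||^2 for j = 1, ..., r."""
    beta_hats = np.asarray(beta_hats, dtype=float)
    return float(np.mean(np.sum((beta_hats - beta_true) ** 2, axis=1)))

# two repetitions that each miss one coefficient by 1 -> MSE = 1.0
print(mse([[2.0, 1.0], [1.0, 2.0]], np.array([1.0, 1.0])))
```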

RESULTS AND DISCUSSION
The analysis to confirm the existence of multicollinearity in the data was carried out by calculating the correlations between the independent variables and the corresponding VIF values; the results are presented in Table 1. As shown in Table 1, the VIF values for the variables x₁ and x₂ are greater than 10, which indicates the presence of multicollinearity. We then checked for outliers in the data using boxplots; the boxplot for n = 20 with 10% outliers is displayed in Figure 1. Figure 1 shows that outliers were detected with the boxplot, indicated by data points far from the average value. The same method was used to detect the presence of outliers for n = 20, 40, 60, 100, 200 and outlier levels of 10%, 20%, and 30%. Together, Table 1 and Figure 1 show both multicollinearity, indicated by the VIF values, and outliers, indicated by the data points lying far from the average in the boxplot, so a robust method is needed to address these problems. To begin with, we reduce the correlation between the variables and suppress the outliers using RLAD. After the RLAD analysis, the VIF values of the data were checked again to ensure that there was no longer any correlation between the variables; the VIF values produced with RLAD are presented in Table 2.
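The two diagnostic checks can be sketched as follows (illustrative: the correlation ρ = 0.99 matches the simulation design, but the sample size, seed, and outlier shift are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 100, 0.99
z = rng.normal(size=n)
x1 = z
x2 = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.99
X = np.column_stack([x1, x2])

def vif(X):
    """VIF_j = 1/(1 - R_j^2), from regressing column j on the others."""
    vals = []
    for j in range(X.shape[1]):
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        r2 = 1.0 - resid.var() / X[:, j].var()
        vals.append(1.0 / (1.0 - r2))
    return np.array(vals)

def boxplot_outliers(y):
    """Boxplot rule: flag points beyond 1.5 IQR outside the quartiles."""
    q1, q3 = np.percentile(y, [25, 75])
    fence = 1.5 * (q3 - q1)
    return (y < q1 - fence) | (y > q3 + fence)

y = X @ np.array([1.0, 1.0]) + rng.normal(size=n)
y[:10] += 15.0                      # plant 10% response outliers
print(vif(X))                       # both VIFs far above the cutoff of 10
print(boxplot_outliers(y).sum())    # the planted outliers are flagged
```

At ρ = 0.99 the theoretical VIF is about 1/(1 − 0.99²) ≈ 50, which is why both values sit far above the usual cutoff of 10.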
It can be seen in Table 2 that the correlation between the independent variables is reduced, as indicated by the VIF values: the VIF of every independent variable becomes less than 10 for n = 20, 40, 60, 100, 200 with 10%, 20%, and 30% outliers, so there is no longer a harmful correlation between the independent variables. In addition, the influence of the outliers was eliminated. We then computed the MSE of RLAD and LS; the results are provided in Table 3 and Figure 2. Table 3 and Figure 2 show that RLAD has a smaller MSE than LS for n = 20, 40, 60, 100, and 200 at every outlier level in the data. The sample size also affects the MSE of both methods, as does the number of outliers: the MSE decreases with increasing sample size for both methods, while it increases as the number of outliers increases. Figure 3 presents the MSE of RLAD separately from the MSE of LS to give a clearer picture. From this figure, it is clear that the MSE of RLAD is higher at small sample sizes and decreases as the number of samples increases. The MSE is also influenced by the number of outliers: at small sample sizes the MSE is high, but increasing the sample size overcomes this. Figure 3 also shows that the larger the sample size, the less effect the outliers have on the MSE.
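A compact version of the whole comparison can be sketched as below (hedged: the outlier magnitude, the fixed ridge constant k* = 0.3, the number of repetitions, and the IRLS settings are illustrative choices; the paper selects k* with a data-driven formula and a larger design):

```python
import numpy as np

rng = np.random.default_rng(4)

def lad_irls(X, y, tol=1e-8, max_iter=100, eps=1e-6):
    # LAD via iteratively reweighted least squares, weights w_ii = 1/|e_i|
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        if np.abs(beta_new - beta).max() < tol:
            return beta_new
        beta = beta_new
    return beta

n, rho, r, k_star = 40, 0.99, 200, 0.3
beta_true = np.array([1.0, 1.0])
err_ls, err_rlad = [], []
for _ in range(r):
    z = rng.normal(size=n)
    X = np.column_stack([z, rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)])
    y = X @ beta_true + rng.normal(size=n)
    y[: n // 10] += 20.0                           # 10% response outliers
    b_ls = np.linalg.lstsq(X, y, rcond=None)[0]    # least squares fit
    b_lad = lad_irls(X, y)                         # robust LAD fit
    XtX = X.T @ X                                  # ridge-shrink the LAD fit
    b_rlad = np.linalg.solve(XtX + k_star * np.eye(2), XtX @ b_lad)
    err_ls.append(np.sum((b_ls - beta_true) ** 2))
    err_rlad.append(np.sum((b_rlad - beta_true) ** 2))

mse_ls, mse_rlad = np.mean(err_ls), np.mean(err_rlad)
print(f"MSE LS   = {mse_ls:.3f}")
print(f"MSE RLAD = {mse_rlad:.3f}")  # expected to be the smaller of the two
```

Under this kind of heavy response contamination the LS error variance is inflated by the outliers while the LAD step largely ignores them, so the RLAD MSE comes out far below the LS MSE, mirroring the direction of the paper's Table 3.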
The results of this study indicate that the RLAD method overcomes multicollinearity and outliers simultaneously better than the LS method. Judged by the MSE values, the parameter estimates produced with the RLAD method are also more precise than those of the LS method. Likewise, for both small and large sample sizes, the RLAD method reduces the magnitude of the estimation error relative to LS. These results are in line with previous findings that robust methods in general, and the RLAD method in particular, can overcome multicollinearity and outliers simultaneously [8][9][10][11][12].

CONCLUSIONS
Based on the results of the study, the sample size affects the MSE values of both RLAD and LS: the larger the sample size, the smaller the MSE of both methods, even as the proportion of outliers increases. Overall, it can be concluded that RLAD performs better than LS in overcoming multicollinearity and various outlier levels because it has a smaller MSE at every sample size and outlier level studied.