CLUSTER FAST DOUBLE BOOTSTRAP APPROACH WITH RANDOM EFFECT SPATIAL MODELING

Panel data combine cross-sectional and time series data. Spatial panel analysis draws information from observations that are affected by space or location effects, and these location effects enter the analysis through a weighting matrix. Using panel data in spatial regression offers a number of advantages. However, the spatial dependence tests and parameter estimators from spatial panel regression become inaccurate when applied to areas with a small number of spatial units. One way to overcome the problem of small spatial unit sizes is the bootstrap method. This study applied the fast double bootstrap (FDB) method to model the poverty rate in the Flores Islands. The data were sourced from the BPS NTT Province website. The Hausman test indicates that the appropriate model is the random effect model. The spatial dependence test concludes that spatial dependence is present and that poverty modeling in the Flores Islands favors the SAR model. The SAR random effect model has an R² of 77.38 percent and does not meet the normality assumption. The spatial autoregressive random effect model with the fast double bootstrap approach explains 99.83 percent of the variation in the poverty rate in the Flores Islands and fulfills the assumption of residual normality. The FDB approach on the spatial panel therefore gives better results than the common spatial panel.


INTRODUCTION
Panel data combine cross-sectional and time series data. Time series data are observations of one or more variables over a period of time. Cross-sectional data are observations of one or more variables taken from sample units or subjects in the same period. In panel data, the same individual units are observed repeatedly over time [1]. Spatial regression analysis was first introduced by Jean Paelinck in 1970 and was later developed by Anselin [2]. Spatial panel analysis is an analysis affected by space or location effects. The location effect is represented in the analysis by a weighting matrix. The measure of proximity depends on information about the size and shape of the observation units, as illustrated on the map [3].
Spatial dependency is a prerequisite for spatial regression. Methods for testing spatial dependency include Moran's I test, the Lagrange Multiplier (LM) test, the Likelihood Ratio (LR) test, and Rao's Score test. Moran's I test is the most commonly used because it does not assume a specific alternative hypothesis, yet it can detect both spatial lag dependence and spatial error dependence [4].
In 1979, Efron introduced the computer-based bootstrap method as an alternative approach to empirical problem solving. This method is more accurate than asymptotic methods when samples are small and the parameter distribution is unknown [5]. In 1988, Beran developed the double bootstrap method, which performs better than the ordinary bootstrap. Its basic principle is that each of the B1 bootstrap data sets from the first stage is itself resampled B2 times in the second stage [6]. The weakness of the double bootstrap is its long computation time: obtaining the test statistic requires all of the first-stage resamples plus B2 second-stage resamples for each of them. If the test statistics of the first-stage and second-stage bootstrap data sets are instead treated as independent, each first-stage bootstrap data set needs to be resampled only once in the second stage. This method produces the same level of accuracy as the double bootstrap but requires much less processing time. It is called the fast double bootstrap (FDB) [7].
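The one-replication-per-first-stage-sample idea described above can be sketched in a few lines. The following is a minimal illustration on a toy problem (testing whether a small sample has mean zero), not the article's spatial procedure. The function names, the choice of the |t| statistic, and the centring used to impose the null hypothesis are all assumptions made for this example. The p-value combination follows the commonly cited FDB form: the first-stage p-value picks a quantile of the second-stage statistics, and the first-stage statistics are then compared with that quantile.

```python
import numpy as np

def abs_t_stat(x):
    """|t| statistic for H0: mean = 0 (illustrative test statistic)."""
    x = np.asarray(x, dtype=float)
    return abs(x.mean()) / (x.std(ddof=1) / np.sqrt(x.size))

def resample_under_null(x, rng):
    """Bootstrap data-generating process that imposes H0 by centring first."""
    centred = x - x.mean()
    return rng.choice(centred, size=x.size, replace=True)

def fdb_p_value(x, statistic, resample, B=999, seed=0):
    """Return (ordinary bootstrap p-value, fast double bootstrap p-value)."""
    rng = np.random.default_rng(seed)
    tau_hat = statistic(x)
    tau1 = np.empty(B)  # first-stage statistics
    tau2 = np.empty(B)  # second-stage statistics, one per first-stage sample
    for j in range(B):
        first = resample(x, rng)
        tau1[j] = statistic(first)
        second = resample(first, rng)  # only ONE second-stage draw
        tau2[j] = statistic(second)
    p_star = np.mean(tau1 > tau_hat)         # ordinary bootstrap p-value
    q = np.quantile(tau2, 1.0 - p_star)      # second-stage critical value
    p_fdb = np.mean(tau1 > q)                # FDB p-value
    return p_star, p_fdb

# Toy data: 12 observations whose true mean is far from zero.
sample = np.random.default_rng(1).normal(loc=3.0, scale=1.0, size=12)
p_star, p_fdb = fdb_p_value(sample, abs_t_stat, resample_under_null, B=399)
print(p_star, p_fdb)
```

The cost is 2B statistic evaluations, compared with B1 + B1·B2 for the full double bootstrap.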
A spatial bootstrap test based on the Moran's I statistic of Ordinary Least Squares (OLS) residuals has been used to test the spatial correlation of a model. In that work, which focused on the bootstrap Moran's I, the fast double bootstrap method improved both the Moran's I test results and the test of the asymptotic assumptions [4]. Another study used the bootstrap method for LM tests (including LM-lag and LM-error tests) of spatial dependence in panel data models with fixed effects. It proved the consistency of the LM tests and their bootstrap versions and found some asymptotic improvements for the bootstrap LM tests [8].
An advantage of the bootstrap method is that it requires no assumptions about the data distribution and does not need independent, normally distributed error terms [4]. With spatial panel data, the fast double bootstrap cannot be applied directly to the residuals; a cluster approach is needed. In panel data, clusters can be formed by time or by series. The cluster bootstrap is often used in panel data research and works very well in practice [9]. Different bootstrap methods have been developed for different types of regression, such as the residual bootstrap, the block bootstrap [5], the wild bootstrap, and the wild cluster bootstrap [10]. The block bootstrap is used in time series models. The wild bootstrap is used for panel data models and heteroscedasticity [11]. The pairs bootstrap is used for dynamic models or heteroscedastic models in which the distribution of the error term is unknown. The sub-cluster wild bootstrap is a family of new methods that includes the ordinary wild bootstrap as a limiting case. It can perform very effectively in pure treatment models, where all observations within a cluster are either treated or not.
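As a rough illustration of what clustering means when resampling a panel, the sketch below draws whole cross-sectional units with replacement and keeps all time periods of each drawn unit together. Clustering by unit, the function name, and the 8-by-3 panel layout are assumptions for this example; the same function clusters by time if period labels are passed instead of unit labels.

```python
import numpy as np

def cluster_bootstrap_indices(cluster_ids, rng):
    """Row indices of a panel resampled by whole clusters.

    cluster_ids: 1-D array giving the cluster label of every stacked row.
    Clusters are drawn with replacement; all rows of a drawn cluster are kept.
    """
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids)
    drawn = rng.choice(clusters, size=clusters.size, replace=True)
    return np.concatenate([np.flatnonzero(cluster_ids == c) for c in drawn])

# Hypothetical balanced panel: 8 units observed over 3 periods.
unit_ids = np.tile(np.arange(8), 3)          # stacked period by period
rng = np.random.default_rng(42)
idx = cluster_bootstrap_indices(unit_ids, rng)
residuals = rng.normal(size=unit_ids.size)   # placeholder residual vector
resampled_residuals = residuals[idx]         # residuals resampled by cluster
```

Because rows are only ever taken in complete clusters, any dependence within a cluster is carried over into the bootstrap sample.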
Research on spatial panel data has so far focused only on the bootstrap Moran's I test and the bootstrap LM test [4] [12]. This study uses the fast double bootstrap method, which is faster because each first-stage bootstrap data set needs only one replication in the second stage.
This study applied a spatial autoregressive panel data model with the fast double bootstrap (FDB) approach to poverty rate data in the Flores Islands. Poverty modeling in the Flores Islands involves only a small number of sample units. In addition, the panel covers only 2018-2020, so the time dimension does not yield a large number of observations either. This causes problems in testing spatial dependence, because the sample is small and the residuals are not normally distributed. In general, statistical inference relies on an asymptotic normal distribution justified by the Law of Large Numbers and the Central Limit Theorem. In small samples, estimators from the maximum likelihood estimation (MLE) method and the ordinary least squares (OLS) method are not accurate [13]. If the sample size is not large, the asymptotic behavior of the statistics approximates their actual behavior poorly. To overcome this problem, the fast double bootstrap (FDB) method is used: under some regularity conditions, it can yield a more accurate distribution of the estimator than the usual asymptotic distribution [12].
According to Hounkannounon, if the sample size is not large enough, the asymptotic behavior of such statistics approximates the real ones poorly. Using the bootstrap method, under some regularity conditions, it is possible to obtain a more accurate distribution of the estimator than the usual asymptotic distribution [14]. The fast double bootstrap is a procedure for calculating bootstrap p-values that is much more computationally efficient than the double bootstrap itself. In many cases, it can provide more accurate results than the ordinary bootstrap approach [15].

RESEARCH METHODS
The data used in this study came from the Central Bureau of Statistics (BPS) of East Nusa Tenggara Province for the 2020-2022 period. The areas studied were all districts in the Flores Islands. The dependent variable was the poverty rate, and the independent variables were expected years of schooling (HLS), GRDP, life expectancy (AHH), district minimum wage (UMK), and the unemployment rate (TPT). The data were analyzed using a spatial autoregressive (SAR) model with a fast double bootstrap (FDB) approach. The analysis proceeded as follows. First, the spatial weighting matrix was built from Rook contiguity and row-normalized to obtain the matrix W. Then, the Hausman test was used to choose between the fixed effect and random effect models.
Next, spatial dependency tests were conducted, namely the Moran's I and LM tests, using both the initial data and the residuals of the pooled model [4].
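The first and third steps above, row-normalizing a contiguity matrix and computing Moran's I, can be sketched as follows. The four-region adjacency matrix and the data vector are made up for illustration and are not the Flores districts. The Moran's I formula is the standard one, I = (n / S0) · (z'Wz) / (z'z), where z holds deviations from the mean and S0 is the sum of all weights.

```python
import numpy as np

def row_normalize(C):
    """Turn a binary contiguity matrix into a row-stochastic weight matrix W."""
    C = np.asarray(C, dtype=float)
    row_sums = C.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every unit needs at least one neighbour")
    return C / row_sums

def morans_i(x, W):
    """Moran's I of the values x under the spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()          # deviations from the mean
    s0 = W.sum()              # sum of all weights
    return (x.size / s0) * (z @ W @ z) / (z @ z)

# Hypothetical example: four regions in a row, neighbours share a border.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_normalize(C)
I = morans_i([1.0, 2.0, 3.0, 4.0], W)
print(I)
```

With a row-normalized W, S0 equals the number of units, so the leading factor n / S0 is 1.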

Panel Data Autoregressive Spatial Model
Estimation of the spatial regression parameters for these panel data assumed that the W matrix was constant over time and that the data formed a balanced panel [17] [2]. The models that can be formed are the spatial autoregressive fixed effect model and the spatial autoregressive random effect model.
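For orientation, the spatial autoregressive panel model is commonly written in the form below. The notation is an assumption taken from the general spatial panel literature, not recovered from this article: ρ is the spatial lag coefficient, w_ij is an element of W, μ_i is the unit-specific effect, and ε_it is the idiosyncratic error.

```latex
y_{it} = \rho \sum_{j=1}^{N} w_{ij}\, y_{jt} + x_{it}\beta + \mu_i + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T
% Fixed effect model:  \mu_i are treated as fixed parameters to be estimated.
% Random effect model: \mu_i \sim \text{i.i.d.}(0, \sigma_\mu^2), independent of \varepsilon_{it}.
```

The two models named above differ only in how μ_i is treated, which is what the Hausman test is used to decide.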

Fast Double Bootstrap LM lag
The Lagrange Multiplier (LM) lag test can be used to test for spatial dependence between regions in the dependent variable. The LM lag test with the fast double bootstrap (FDB) approach was developed from the LM lag test statistic. The fast double bootstrap LM lag value is stated in Equation (10), where the index runs over the bootstrap replications and the reference value is the LM lag value of the original data. With ρ denoting the spatial lag coefficient, the hypotheses used were:
H0: ρ = 0 (no spatial lag dependence in the model)
H1: ρ ≠ 0 (there is spatial lag dependence in the model)
If the p-value of the LM lag test under the fast double bootstrap approach is less than the significance level α, then H0 is rejected.
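The LM lag statistic that the bootstrap is built on can be computed from pooled OLS residuals. The sketch below uses the form usually given for pooled panels in the spatial panel literature; it is an assumption that this matches the statistic used in the article. The function name, the simulated data, and the ring-shaped neighbour structure are illustrative. Rows are assumed to be stacked period by period, so that the panel weight matrix is the Kronecker product of I_T and W.

```python
import numpy as np

def lm_lag_pooled(y, X, W, T):
    """LM lag statistic for a pooled panel stacked period by period."""
    N = W.shape[0]
    W_big = np.kron(np.eye(T), W)                   # (I_T kron W)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # pooled OLS
    e = y - X @ beta
    sigma2 = e @ e / (N * T)
    WXb = W_big @ X @ beta
    # Residual of WXb after projecting on X, i.e. M (W X beta)
    M_WXb = WXb - X @ np.linalg.lstsq(X, WXb, rcond=None)[0]
    T_W = np.trace(W @ W + W.T @ W)
    J = (WXb @ M_WXb + T * T_W * sigma2) / sigma2
    return (e @ W_big @ y / sigma2) ** 2 / J

# Illustrative data: 6 units on a ring, 3 periods, one regressor plus intercept.
N, T = 6, 3
C = np.zeros((N, N))
for i in range(N):
    C[i, (i + 1) % N] = C[i, (i - 1) % N] = 1.0
W = C / C.sum(axis=1, keepdims=True)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=N * T)
stat = lm_lag_pooled(y, X, W, T)
print(stat)
```

Under H0 the statistic is asymptotically chi-square with one degree of freedom. In a bootstrap version, the same function would be evaluated on each bootstrap data set instead of relying on that asymptotic reference distribution.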

Fast Double Bootstrap SAR Model
The spatial autoregressive (SAR) model with fixed effect and random effect methods produced a set of residuals. These residuals were bootstrapped in two stages to obtain the fast double bootstrap replications. The estimate of the spatial lag autocorrelation coefficient under the fast double bootstrap (FDB) approach was obtained through an iterative process for each replication.
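One way to turn resampled residuals back into a bootstrap response vector is through the reduced form of the SAR model, y_t = (I − ρW)^(-1)(X_t β + e_t). The sketch below is a simplified illustration under stated assumptions: it resamples centred residuals independently, ignoring the cluster structure and the random effect; it treats ρ and β as already estimated; and all names and numbers are hypothetical. Each bootstrap sample generated this way would then be re-estimated to give one replication of the coefficients.

```python
import numpy as np

def sar_bootstrap_sample(X, W, rho, beta, residuals, T, rng):
    """Generate one bootstrap response vector from the SAR reduced form.

    Rows are stacked period by period (N units for t=1, then t=2, ...).
    """
    N = W.shape[0]
    centred = residuals - residuals.mean()
    e_star = rng.choice(centred, size=N * T, replace=True)
    A_inv = np.linalg.inv(np.eye(N) - rho * W)       # (I - rho W)^(-1)
    signal = (X @ beta + e_star).reshape(T, N)       # one row per period
    return (signal @ A_inv.T).reshape(-1)            # y_t = A_inv @ signal_t

# Hypothetical inputs: 4 units in a row, 2 periods.
N, T, rho = 4, 2, 0.4
C = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
W = C / C.sum(axis=1, keepdims=True)
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
beta = np.array([2.0, -1.0])
residuals = rng.normal(size=N * T)
y_star = sar_bootstrap_sample(X, W, rho, beta, residuals, T, rng)
```

For a row-normalized W and |ρ| < 1, the matrix I − ρW is invertible, so the reduced form is well defined.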

RESULTS AND DISCUSSION
Data exploration was carried out to give an overview of the data and extract useful information without drawing general conclusions. The weighting matrix used was Rook contiguity, and the Hausman test was then used to select the model. Table 1 shows that the Hausman tests for SAR and SEM gave p-values > α (0.05). H0 is therefore not rejected, and it can be concluded that the random effect model is better than the fixed effect model for modeling the poverty rate in the Flores Islands.
After the model was selected, the next step was to test for spatial dependence with Moran's I test and the Lagrange Multiplier test. Moran's I test found a spatial effect among locations, with Moran's I (0.3493) > I0 (-0.1428). This indicates spatial dependence in poverty rates between districts in the Flores Islands. The results of the Lagrange Multiplier (LM) test are reported in Table 2. The LM lag test gave a p-value of 6.725e-04, which indicates spatial dependence in the spatial lag model of the panel data. The robust LM lag test gave a p-value of 0.000423. This shows that the spatial panel model of poverty rates in the Flores Islands tends toward the spatial autoregressive (SAR) model.
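The reference value I0 in the Moran's I comparison above is the expected value of Moran's I under the null hypothesis of no spatial autocorrelation, which depends only on the number of spatial units n. The reported I0 is consistent with n = 8 units. This is an inference from the number itself, not something stated in the text.

```latex
I_0 = \mathrm{E}(I) = -\frac{1}{n-1},
\qquad n = 8 \;\Rightarrow\; I_0 = -\frac{1}{7} \approx -0.1429
```

An observed value above I0 points to positive spatial autocorrelation, meaning that neighbouring districts tend to have similar poverty rates.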
Then, the spatial autoregressive random effect model was estimated. The coefficient of determination (R²) was 0.7338, and the residual normality assumption of the random effect spatial model was not met.
The next step was spatial panel testing with the cluster fast double bootstrap (FDB) approach, starting with the spatial dependence test. The spatial autoregressive random effect model was then estimated with the FDB approach. According to Table 6, the coefficient of determination (R²) indicates that 98.93 percent of the variation in poverty rates in the Flores Islands can be explained by the five independent variables using the spatial autoregressive random effect model with the FDB approach. At the significance level α = 5%, the variables that significantly affected the dependent variable in the SAR random effect model were GRDP, life expectancy (AHH), and district minimum wage (UMK), as indicated by p-values below α = 5%. The SAR random effect model obtained with the FDB was:
$\ldots - 0.0471\,X_{2t} + 4.4302\,X_{3t} + 0.2472\,X_{4t} + 0.365\ldots$
With the FDB SAR random effect approach, the assumption of residual normality is met even at a fairly small number of observations. The cluster fast double bootstrap (FDB) approach thus gives better test results and improves the outcome of the normality assumption test.
The prediction results of the cluster fast double bootstrap approach can be seen in Figure 2. Based on Figure 2, the parameter estimates obtained by the fast double bootstrap spatial autoregressive random effect approach with 1000 replications follow a normal distribution (the limiting normal distribution).

CONCLUSIONS
The spatial autoregressive random effect model using the cluster fast double bootstrap approach yields a higher R-square value, 98.93 percent, than the initial spatial autoregressive random effect model. Spatial testing using the FDB approach gives better results and improves how well the model assumptions are met in small samples.
For future research, it is necessary to develop a statistical test of spatial dependence with the FDB approach that considers outlier data.