PANEL DATA REGRESSION MODEL FOR PREDICTING ECONOMIC GROWTH BEFORE AND DURING THE COVID-19 PANDEMIC IN EAST JAVA PROVINCE

ABSTRACT

Keywords: Economic Growth; Fixed Effect Model; GRDP; Panel Data Regression.

INTRODUCTION
Gross Regional Domestic Product (GRDP) is one of the main indicators of the economic condition of a region in a given period [1][2], measured at either current or constant prices. According to data from the Central Statistics Agency (BPS), the island of Java contributed around 59% of GDP in 2021 [1]. The basic sectors of regional economic activity can be analyzed using the Location Quotient (LQ), with GRDP as an indicator of regional growth [3]. The Covid-19 pandemic has had a significant impact on many sectors in most countries in Asia, America, Africa, Europe, and Australia [4]. Limited community activity reduced production and triggered capital outflows, which weakened the rupiah exchange rate [5].
Panel data models, both fixed and random effects models, are highly reliable for multivariable longitudinal data sets. Panel data regression models are used for business forecasting and to determine the relationship between attributes in a data set [6]. Panel data regression combines time series data and cross-section data [7][8]. The panel data regression model has been applied to determine the variables that influence economic growth in Asian countries; the results show that taxes, government debt, household consumption, and credit interest rates have a significant effect on economic growth [7]. Panel data regression was also used to model the percentage of poor people by district/city in East Kalimantan, where the attributes thought to have an influence include population growth rate, human development index, and adjusted per capita expenditure [8]. It was also used to determine the attributes that influence the level of poverty in Madura [9]. This research aims to predict East Java's economic growth before the pandemic (2019) and during the Covid-19 pandemic (2020) based on Gross Regional Domestic Product (GRDP) as the basic sector of economic activity, using panel data regression. Apart from making predictions, the model is also used to determine the influence of differences between entities (districts/cities) and between observation periods (years). The variables or attributes used in modeling come from the structure of the realization of the Regional Revenue and Expenditure Budget (RREB), which includes Locally Generated Revenue (LGR), Transfers to Regional and Village Funds (TRVF), Other Revenue (OR), Employee Expenditures (EE), Goods and Services Expenditures (GSE), Capital Expenditures (CE), Other Expenditures (OE), and Regional Financing Revenues (RFR). Panel data regression is a form of predictive analysis in time series data mining.
Panel data regression models have also been applied in several other cases: predicting the relationship between tourism and economic development in Southeast Asian countries [10]; analyzing the influence of population aging on public debt [11]; analyzing the influence of bad credit, bank loans, inflation, and CO2 emissions on the economic growth of several countries [12]; modeling consumption by considering the cross-section of each type of product category and household [13]; and analyzing the increase in infant mortality rates to obtain recommendations that support reducing infant mortality in East Java [14]. The method was also applied to model the poverty depth index in districts/cities of Papua Province, where the Fixed Effect Model (FEM) was chosen as the best model with an R-squared of 82.5% [15]. Panel data regression models were also used to analyze the impact of several e-commerce attributes on economic growth in Indonesia; the results show that the e-commerce attribute with a very significant influence is the number of computer users [16]. Panel data was also used to model income inequality based on economic growth attributes; the results show that economic growth has a significant influence on income inequality [17]. Similar research modeled the percentage of poor people based on economic growth rate, average years of schooling, dependency ratio, and life expectancy; the attributes with a significant influence are average years of schooling, dependency ratio, and life expectancy [18].
The advantages of panel data regression are a larger number of observations, reduced multicollinearity, prediction results that tend to be more effective, and more information obtained from the data [19]. The panel data model also performed very well in predicting poverty in Bengkulu Province, with the Fixed Effect Model (FEM) as the best model. The significant attribute in that model is GRDP, with a Mean Absolute Percentage Error (MAPE) of 6.59% and a Mean Square Error (MSE) of 5.48 [20].

RESEARCH METHODS
The research methodology is shown in Figure 1. The initial step is data collection and cleaning, often referred to as extract, transform, load (ETL). Next, an initial analysis is carried out on the characteristics of the attributes used as response and predictor variables. The Variance Inflation Factor (VIF) test is used to ensure that there is no multicollinearity among the predictor attributes before modeling. The main process is panel data regression modeling. Three regression models are tested to select the appropriate one: the Common Effect Model (CEM), the Fixed Effect Model (FEM), and the Random Effect Model (REM). For the selected model, a heteroscedasticity test is carried out to ensure that the variance between groups of observations is homogeneous. The model is then evaluated based on the Mean Square Error (MSE), Mean Absolute Deviation (MAD), and Mean Absolute Percentage Error (MAPE). The final stage is to predict and analyze GRDP, as a statistic that measures the economic level or the aggregate gross value added, based on the structure of the regional revenue and expenditure budget realization in each regency/city in East Java Province.
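The three evaluation measures named above can be sketched as follows. This is a minimal illustration using the usual textbook definitions; the function names and the toy numbers are assumptions made for the example and are not taken from the study's data.

```python
def mse(actual, predicted):
    """Mean Square Error: average of squared residuals."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mad(actual, predicted):
    """Mean Absolute Deviation: average of absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical GRDP values (Rp billion), for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mse(actual, predicted), mad(actual, predicted), mape(actual, predicted))
```

MAPE is scale-free, which makes it convenient when regencies/cities differ greatly in GRDP, whereas MSE and MAD remain in the units of the response.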

Dataset
The data for this research come from the Central Bureau of Statistics for East Java Province and the Directorate General of Fiscal Balance of the Ministry of Finance, namely:
a) Gross Regional Domestic Product (GRDP) data at current prices/constant prices by business field for East Java Province, 2014-2021.
b) Data on the structure of the Regional Revenue and Expenditure Budget for 2017-2021, consisting of: Locally Generated Revenue (LGR), Transfers to Regional and Village Funds (TRVF), Other Revenue (OR), Employee Expenditures (EE), Goods and Services Expenditures (GSE), Capital Expenditures (CE), Other Expenditures (OE), and Regional Financing Revenues (RFR).

Panel Data Regression
There are three panel data regression model approaches, namely the Common Effect Model (CEM), the Fixed Effect Model (FEM), and the Random Effect Model (REM). The common effect model is the same as Ordinary Least Squares (OLS): all data, both cross-section and time series, are pooled. The CEM equation is expressed in Equation (1).
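For orientation, the pooled model and its fixed effect counterpart can be written in one common textbook notation. The symbols below are assumptions chosen for this sketch: $y_{it}$ is the response of object $i$ in period $t$, $\alpha$ the common intercept, $\beta_k$ the slope of the $k$-th predictor $x_{kit}$, $\mu_i$ and $\lambda_t$ the object and period effects, and $\varepsilon_{it}$ the error term, with $i = 1, \dots, N$, $t = 1, \dots, T$, and $K$ predictors.

```latex
\begin{align*}
  y_{it} &= \alpha + \sum_{k=1}^{K} \beta_k x_{kit} + \varepsilon_{it}
  && \text{(common effect model, pooled OLS)} \\
  y_{it} &= \alpha + \mu_i + \lambda_t + \sum_{k=1}^{K} \beta_k x_{kit} + \varepsilon_{it}
  && \text{(fixed effect model with object and period effects)}
\end{align*}
```

The only difference between the two lines is the intercept: the pooled model forces a single $\alpha$ on all observations, while the fixed effect form lets it shift by object and by period.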
In the Fixed Effect Model, the underlying assumption is that effects differ between objects and periods. Differences in the characteristics of objects and periods are absorbed into the intercepts/constants, so that their values change across objects and periods. The Fixed Effect Model used in this study is expressed in Equation (2).

If H0 is accepted in the BP-LM test, the best model is CEM. If H0 is rejected, the Hausman test is needed to choose between FEM and REM. If H0 of the Hausman test is rejected, the best model is FEM; conversely, if it is accepted, the best model is REM.
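The selection rule can be sketched as a small decision function. This is an illustration under stated assumptions: the function name and the use of p-values compared against a level `alpha` are choices made for the example, and the branch taken when the Chow test rejects H0 (going straight to the Hausman test) is an assumption, since only the branch where the Chow H0 is accepted is spelled out in the text.

```python
def select_panel_model(chow_p, bp_lm_p, hausman_p, alpha=0.05):
    """Return 'CEM', 'FEM', or 'REM' from the p-values of the three tests.

    H0 is treated as accepted when the p-value is >= alpha.
    """
    chow_h0_accepted = chow_p >= alpha
    if chow_h0_accepted and bp_lm_p >= alpha:
        # Chow and BP-LM both keep H0: the pooled model is adequate.
        return "CEM"
    # Reached when BP-LM rejects H0, or (assumption) when Chow rejects H0:
    # the Hausman test decides between FEM and REM.
    return "FEM" if hausman_p < alpha else "REM"


# p-values in the pattern reported in this study:
# Chow p = 0.9389, BP-LM p < 2.2e-16, Hausman p < 2.2e-16.
print(select_panel_model(0.9389, 1e-16, 1e-16))
```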

Descriptive Statistics
Descriptive statistics on the attributes in this study are shown in Table 1. The average GRDP for 2017-2020 by regency/city, ordered from largest to smallest, is shown in Figure 3. The city of Surabaya ranks first with an average GRDP of Rp. 364,714.8 billion, almost 3 times the average GRDP of second-placed Sidoarjo, which is Rp. 134,878.12 billion. The regencies/cities with the third, fourth, and fifth largest GRDP are Pasuruan, Gresik, and Kediri City, respectively. Meanwhile, the 5 regencies/cities with the smallest average GRDP are Madiun City, Probolinggo City, Pasuruan City, Mojokerto City, and Blitar City.

In Figure 5, the average GRDP of East Java Province from 2017 to 2019 increased by more than Rp. 1,200 billion per year, or about 5%. However, due to the impact of the pandemic, the average GRDP fell in 2020 by around Rp. 700 billion, or 2.78%. Average economic growth in 2017-2019 was between 4.45% and 4.48%, and in 2020 it fell to -5.31%.

The scatterplots between GRDP and each RREB component are shown in Figure 6 and Figure 7. In general, all components have positive gradients with varying slopes. In Figure 6, ignoring the time and area variables, the share of GRDP variation explained by LGR alone is the highest, at 92.9%, while OR alone explains 56.6%. In Figure 7, again ignoring the time and area variables, GSE alone explains the most variation in GRDP, with an R-squared of 89.0%, and CE explains 73.1%. The variation explained by OE and RFR is very low, at 1.2% and 28.4%, respectively.

Multicollinearity Test
The Variance Inflation Factor (VIF) values of the ordinary least squares (OLS) model that includes all attributes of the RREB components are shown in Table 2. In the original dataset, 5 attributes have VIF values > 10, which indicates multicollinearity between attributes. For this reason, the data are transformed using the natural logarithm. In the transformed dataset, 2 attributes have a VIF value > 10. This is an improvement, and the strategy applied at the modeling stage is to use the transformed dataset without including all attributes in the model.
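The VIF check can be sketched in plain Python using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. The helper names and the toy columns are hypothetical, and the sketch is not the software used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(y, columns):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance Inflation Factor for each predictor column."""
    values = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        values.append(1.0 / (1.0 - r_squared(target, others)))
    return values


# Hypothetical predictors: x2 is almost exactly 2 * x1, so both get a large VIF.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(vif([x1, x2]))
```

Exactly collinear columns would make the normal equations singular, so the sketch is only meant for the near-collinear case that the VIF > 10 rule is designed to flag.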

Panel Data Regression: OLS Modeling
The first panel data regression model tried is the CEM or OLS model. This model ignores the cross-section (regency/city) and time series (time variable) dimensions. Several models with R-squared and adjusted R-squared above 80% are shown in Table 3. Although 4 models have a fairly high R-squared and VIF < 10, only the OLS-1 model has all attribute parameters significant at α = 5%. For the OLS-2, OLS-3, and OLS-4 models, only 1 or 2 parameters are significant. For the OLS model with dummy variables, the R-squared is 99.99% and the adjusted R-squared is 99.99%. At the significance level α = 5%, all attributes and dummy variables are significant in the model except for the LGR attribute, with p-value = 0.47294, and the Nganjuk factor (dummy variable), with p-value = 0.14669. The Chow test is used to decide between the CEM and FEM models. The Chow test on the selected OLS model (OLS-3) gives F = 0.19835 and p-value = 0.9389, so H0 is accepted. The next step is the BP-LM test.

Panel Data Regression: FEM Model
The parameter estimates of the FEM model based on the selected OLS model are shown in Table 4, with the LGR, TRVF, and OR attributes as predictor variables. In this model, the R-squared is 0.99991 and the adjusted R-squared is 0.99987. At the α = 5% level, the LGR, TRVF, and OR attributes are not significant in the model; that is, these attributes have little effect in the FEM model. For the area factor, however, all districts/cities are significant in the model. The time factor constants from 2017 to 2020 are 9.7903, 9.8385, 9.8092, and 9.8316, respectively. The BP-LM test gives chisq = 940, df = 1, p-value < 2.2e-16, so there is a panel effect in the data.
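How a fixed effect model separates area-specific intercepts from a common slope can be sketched with the within (demeaning) estimator. This is a deliberately simplified, hypothetical example: it uses one predictor and object effects only, whereas the model in this study also carries period constants and three predictors, and the data below are invented.

```python
def within_estimator(panel):
    """One-way fixed effects with a single predictor.

    panel maps each entity to a list of (x, y) pairs over the periods.
    Returns the common slope and a dict of entity-specific intercepts.
    """
    numerator = 0.0
    denominator = 0.0
    means = {}
    for entity, observations in panel.items():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        means[entity] = (x_bar, y_bar)
        # Demeaning within each entity removes its fixed intercept.
        numerator += sum((x - x_bar) * (y - y_bar) for x, y in observations)
        denominator += sum((x - x_bar) ** 2 for x, _ in observations)
    slope = numerator / denominator
    intercepts = {e: y_bar - slope * x_bar for e, (x_bar, y_bar) in means.items()}
    return slope, intercepts


# Two hypothetical areas sharing slope 2 but with very different levels.
panel = {
    "A": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "B": [(4.0, 58.0), (5.0, 60.0), (6.0, 62.0)],
}
slope, intercepts = within_estimator(panel)
print(slope, intercepts)
```

In this toy panel almost all of the variation in the response sits in the area intercepts rather than in the predictor, which is the same qualitative pattern as significant area factors alongside weak attribute effects.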

Panel Data Regression: REM Model
The parameter estimates of the REM model based on the selected OLS model are shown in Table 5, with the LGR and OR attributes as predictor variables. The TRVF attribute is omitted because the dataset covers only 4 years, which allows the Random Effect Model to estimate at most 3 parameters. For this model, the R-squared is 0.78245 and the adjusted R-squared is 0.77953. Based on individual tests at the α = 5% level, the intercept and the LGR and OR attributes are significant in the model. The Hausman test gives chisq = 549.32, df = 2, p-value < 2.2e-16. H0 is therefore rejected, and the best model is the Fixed Effect Model.

Heteroscedasticity Test
The next step is to ensure that there is no heteroscedasticity in the residuals of the FEM model by testing the homogeneity of the residual variance with the Breusch-Pagan test. As a visual check of this assumption, Figure 8 plots the GRDP predictions against the residuals. Some of the points lie outside the limits of the homogeneity requirement. The Breusch-Pagan test gives BP = 45.815, df = 40, p-value = 0.2436. The p-value is greater than α = 5%, so it can be concluded that there is no heteroscedasticity in the residual variance and the FEM model can be used.
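The reported p-values can be cross-checked from the chi-square statistics alone. The sketch below is an illustration, not the software used in the study; the function name is hypothetical, and it relies on the closed-form upper-tail probabilities that exist only for df = 1 and for even df, which happens to cover the three tests reported here.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable, for df == 1 or even df only."""
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # For even df the tail is a finite Poisson-type sum.
        half = x / 2.0
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df == 1 or even df")


alpha = 0.05
bp_p = chi2_upper_tail(45.815, 40)      # Breusch-Pagan heteroscedasticity test
hausman_p = chi2_upper_tail(549.32, 2)  # Hausman test
bp_lm_p = chi2_upper_tail(940.0, 1)     # BP-LM test

print(round(bp_p, 4), bp_p > alpha)     # p above alpha: H0 (homoscedasticity) is kept
print(hausman_p < alpha, bp_lm_p < alpha)
```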

Model Interpretation
The last stage is the interpretation of the model based on the prediction results of the best model, namely the Fixed Effect Model. Figure 9 is the scatterplot of OR against the GRDP prediction. The blue line shows the OR regression line on the GRDP prediction with a slope value of exp(-0.0071) or 0.9929, which means that for every Rp. 1 billion increase in OR, the average GRDP tends to increase by Rp. 0.9929 billion. The colored lines are straight-line fits to the OR and GRDP points of each regency/city.

Based on the prediction results, the average predicted GRDP decline is Rp. 1,256.3 billion, ranging from Rp. 118.7 billion to Rp. 12,907.8 billion. The pandemic in its first year (2020) had a significant impact on economic growth in the regencies/cities of East Java. The biggest decline was in the City of Surabaya (Rp. 12,907.8 billion), followed by Sidoarjo (Rp. 3,642.75 billion), Gresik (Rp. 2,907.35 billion), Pasuruan (Rp. 2,783.87 billion), and Kediri City (Rp. 2,469.94 billion). This ordering is in line with the GRDP of the five districts/cities, and the same holds for the other regencies/cities.

The actual and predicted values of GRDP in 2021 (the second year of the Covid-19 pandemic), which is also used as test data, are shown in Figure 12. The mean absolute difference between actual and predicted values is Rp. 857.3 billion, with a range of Rp. 1.1 billion to Rp. 12,247.5 billion. Only 4 districts/cities, or 10.5%, have a residual (error) of less than Rp. 100 billion. The smallest residual is for Banyuwangi at Rp. 1.1 billion, followed by Blitar and Jombang with Rp. 1.5 billion and Rp. 4.0 billion, respectively. The biggest residual is again for the city of Surabaya, at Rp. 12,247.5 billion. This is in line with Surabaya's GRDP, which is the highest every year.

The actual and predicted values of economic growth in 3 regencies/cities from 2018 to 2021 are shown in Figure 13. The deviation between actual and predicted economic growth is between 0.07% and 0.51% in Banyuwangi (BWI), between 0.04% and 0.80% in Gresik (GRE), and between 0.64% and 2.46% in Surabaya City (SBY). In general, economic growth fell from 2018 to 2019. In 2020 economic growth fell drastically (turning negative) due to the Covid-19 pandemic, but in 2021 it increased significantly.
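Economic growth figures of this kind are typically derived from GRDP as a year-on-year percentage change. The sketch below assumes that definition, which is the common convention rather than a formula stated in this paper; the function name and the GRDP values are hypothetical, and consecutive years are assumed.

```python
def growth_rates(grdp_by_year):
    """Year-on-year growth in percent, keyed by the later year."""
    years = sorted(grdp_by_year)
    return {
        current: 100.0 * (grdp_by_year[current] - grdp_by_year[previous])
        / grdp_by_year[previous]
        for previous, current in zip(years, years[1:])
    }


# Hypothetical actual and predicted GRDP (Rp billion) for one regency/city.
actual = {2018: 100.0, 2019: 105.0, 2020: 99.75}
predicted = {2018: 100.0, 2019: 104.0, 2020: 98.8}

actual_growth = growth_rates(actual)
predicted_growth = growth_rates(predicted)

# Absolute deviation between actual and predicted growth, in percentage points.
deviation = {y: abs(actual_growth[y] - predicted_growth[y]) for y in actual_growth}
print(actual_growth, deviation)
```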
In modeling economic growth in Asian countries [7], the best model was the Fixed Effect Model (FEM), with an R-squared of 28.43% and an adjusted R-squared of 24.57%; that is, taxes, government debt, household consumption, and credit interest rates explain 24.57% of the total variation in economic growth. In modeling the percentage of poor people in East Kalimantan [8], the best model was FEM with an R-squared of 77.55%. In [14], the adjusted R-squared was 80%. In [15], the best model was FEM with an R-squared of 82.5%, and the significant variables were the Human Development Index and average per capita expenditure. In [16] and [19], the adjusted R-squared values were 40.93% and 45.49%, respectively. In [20], the MAPE was 6.5%, the MSE 5.5%, and the RMSE 2.3%. In general, panel data regression models in previous studies obtained R-squared values below 90%. In this research, FEM is also the best model, with a very high R-squared of 99.991%, an adjusted R-squared of 99.987%, and a MAPE of 1.07%. Interestingly, LGR, TRVF, and OR have effects that are not significant, whereas the district/city factor has a very significant influence (α = 0.01%).

CONCLUSIONS
The results of this study show that the Fixed Effect Model (FEM) is an appropriate model for predicting economic growth in the regencies/cities of East Java. The MSE, MAD, and MAPE values of FEM are smaller than those of the OLS model with dummy variables. Cross-section and period are significant for the GRDP value. Interestingly, in the Fixed Effect Model some of the modeled attributes, namely Locally Generated Revenue (LGR), Transfers to Regional and Village Funds (TRVF), and Other Revenue (OR), are not significant at the 95% confidence level, so they can be said to have little influence. However, all cross-section factors have a very significant effect at a confidence level of 99.9%. This can be interpreted to mean that local wisdom, or the characteristics of each district/city, makes a major contribution to economic growth. The selected model has a very high adjusted R-squared, namely 0.9999; that is, the share of the total variation in GRDP explained by the model built on the RREB structure is very large.
Comparing the GRDP predictions before the Covid-19 pandemic (2019) with those during the pandemic (2020) shows an average decrease of Rp. 1,256.3 billion (ranging from Rp. 118.7 billion to Rp. 12,907.8 billion). The pandemic in its first year (2020) had a significant impact on economic growth in the regencies/cities of East Java. If the panel data regression model also considered other attributes such as household consumption, investment, exports, and imports, a more precise, complete, and informative model could be obtained.

Figure 1. Methodology

The terms of the model equations denote, in order: the response for the i-th object in the t-th period; the common mean value of the intercept; the slope (parameter) of the k-th variable; the k-th predictor variable for the i-th object in the t-th period; the error component for object i; the combined cross-section and time-period error; the cross-section units (1, 2, …, N); the time series units (1, 2, …, T); and the number of attributes used as predictor variables.

Panel data regression modeling uses three models: CEM, FEM, and REM. The model selection algorithm is shown in Figure 2. The Chow test chooses between CEM and FEM. If H0 of the Chow test is accepted, the Breusch-Pagan Lagrange Multiplier test (BP-LM test) is carried out.

Figure 2. Panel Data Regression Model Selection Algorithm

Figure 3. GRDP Average by Regency/City 2017-2020

The distribution of each component of the RREB structure is shown in the boxplot in Figure 4. Besides having different variability, each component has 2 to 6 outliers.

Figure 9. Scatterplot Prediction of GRDP and OR

The scatterplot of actual GRDP against predicted GRDP is shown in Figure 10. If the observation points for each district/city are fitted with a regression line, that line almost coincides with the panel data regression line.

Table 1.
Based on the RREB structure, Transfers to Regional and Village Funds (TRVF) has the largest average contribution among the attributes, namely Rp. 1,588.75 billion during 2017-2020. The second and third largest are Employee Expenditures (EE) and Goods and Services Expenditures (GSE), with averages of Rp. 865.02 billion and Rp. 606.11 billion, respectively. The smallest is Other Revenue (OR), at Rp. 283.63 billion. The largest variability is in Locally Generated Revenue (LGR), namely Rp. 788.50 billion, followed by TRVF at Rp. 609.61 billion. In general, the distribution of each RREB component is positively skewed, mainly LGR, Goods and Services Expenditures (GSE), and Capital Expenditures (CE), with varying kurtosis. The average GRDP is Rp. 41,972.27 billion with a variance of Rp. 64,714.92 billion, so GRDP variability across the regencies/cities of East Java Province is quite large.