MODELLING OF POVERTY PERCENTAGE IN EAST JAVA PROVINCE WITH SEMIPARAMETRIC REGRESSION APPROACH

Poverty is an economic problem faced by all countries in the world, including Indonesia. Poverty is understood as a person's economic inability to meet basic food and non-food needs, as measured from the expenditure side. East Java Province is the object of this research because, over the last 5 years, it has had the highest economic growth on Java Island after DKI Jakarta Province. However, East Java is also among the provinces with the largest number of poor people on Java. Several independent variables thought to influence the percentage of poverty in East Java are the Open Unemployment Rate (TPT), Life Expectancy Rate (AHH), Average Years of Schooling (RLS), Population Density, and GRDP Rate. The research data come from the East Java BPS website and East Java Open Data. The data were analysed with a semiparametric regression approach. The resulting model performs well, with an MSE of 12.2156 and an R² of 98.71%.


INTRODUCTION
Poverty is a macroeconomic problem faced by all countries in the world, including Indonesia [1]. The root of the poverty problem in Indonesia is the high disparity between regions caused by the uneven distribution of income [2]. This widens the gap between the rich and the poor, so poverty reduction is a major concern for human development [3]. According to [4], poverty is seen as the economic inability to meet basic food and non-food needs, as measured from the expenditure side. Poor people are residents whose average monthly per capita expenditure is below the poverty line.
Indonesia is a vast archipelagic country consisting of many provinces whose populations have different economic conditions. The problem of poverty in Indonesia must therefore be tackled from the lowest level upward: district, then provincial, then national. East Java was chosen as the focus of this research for several reasons. First, according to the government's official publications through the Central Bureau of Statistics (BPS) [5], East Java has had the highest economic growth on the island of Java after DKI Jakarta over the last 5 years. On the other hand, the number of poor people in East Java in 2019 was 4.112 million, making East Java the province with the largest number of poor people on the island of Java. Although BPS data show that economic growth in East Java has continued to increase from year to year, this growth has not been matched by a significant reduction in the poverty rate. Second, the population grows every year but is not evenly distributed; most of the population is still concentrated in Java. Data from the 2020 BPS census show that 151,591,262 of Indonesia's total population of 270,203,917 live on the island of Java. Of these, 40,665,696 live in East Java Province, the second most populous province after West Java.
Unemployment has several types, one of which is open unemployment. Open unemployment is defined as the part of the labour force that is not currently working and is actively looking for work. Adzim [6] further explains that the openly unemployed are those who have no job because they are looking for work, are preparing a business, or feel it is impossible to get a job, as well as those who already have a job but have not yet started working. The level of unemployment in a region is generally measured with the open unemployment rate, which is the number of unemployed expressed as a percentage of the total labour force.
Life expectancy at birth is the average number of years a newborn is expected to live, given the mortality conditions of a given year. Life expectancy is a tool for evaluating the government's performance in improving the welfare of the population in general and its health status in particular. Low life expectancy in an area should be addressed through health development programs and other social programs.
Average years of schooling (RLS) is the number of years of schooling completed by people aged 25 years and over. The Central Bureau of Statistics defines RLS as the number of years the population has spent in formal education. RLS can be used to assess the level and quality of education in an area, while expected years of schooling (HLS) is the number of years of schooling that children of a certain age are expected to experience in the future. In addition, several factors cause children to drop out of school, such as the environment, understanding of the importance of education, culture, and the availability of educational facilities and infrastructure. In line with this, children who drop out of school are generally unable to pay tuition fees and do not receive information about scholarships, either about scholarship sources or about how to access them; the response has been to build a Scholarship Village [7].
A large population can stimulate the market from the demand side through a multiplier effect on aggregate demand. Population is a fundamental issue in a region's economic development, because uncontrolled population growth can cause the failure of economic development goals, namely community welfare and poverty alleviation [8]. An area is said to be denser when the number of people in a given space is large relative to the size of that space [9]; population density is the ratio between the number of residents and the area they inhabit [10]. Population density keeps increasing because the number of residents grows every year. Population density accompanied by a high accumulation of human capital will encourage economic activity. The skills and knowledge of individuals affect their performance, and people with high human capital are able to produce new technological ideas that can increase output.
GRDP (Gross Regional Domestic Product) is the income generated from the goods and services produced by all economic activities in a region over a certain period. The higher the GRDP of a region, the greater the region's income. However, a high GRDP does not guarantee that all residents enjoy prosperity; GRDP is only a general picture of people's welfare. An increase in GRDP alone does not show whether the condition of the low-income population has improved [11].
Based on the description above, this study applies semiparametric regression with a Fourier series estimator to the poverty percentage data for East Java Province, using as predictors the Open Unemployment Rate (TPT), Life Expectancy Rate (AHH), Average Years of Schooling (RLS), Population Density, and GRDP Rate.

RESEARCH METHODS
This section describes the variables, data sources, and data analysis techniques used in this study.

Variables and Data Sources
The data used in this study are secondary data consisting of the variables Percentage of Poor Population, Open Unemployment Rate (TPT), Life Expectancy Rate (AHH), Average Years of Schooling (RLS), Population Density, and GRDP Rate in East Java in 2021. The type of data and the source of each variable can be seen in the following table. The variables are defined as follows:
1. The percentage of poverty is the percentage of the population below the poverty line (GK). GK is the sum of the Food Poverty Line (GKM) and the Non-Food Poverty Line (GKNM). GKM is the expenditure value of minimum food needs, equivalent to 2,100 kilocalories per capita per day. GKNM is the minimum requirement for housing, clothing, education, and health.
2. The Open Unemployment Rate (TPT) is the number of unemployed as a percentage of the total labour force. A person is counted as unemployed if they have no job and are looking for work, have no job and are preparing a business, have no job and are not looking for work because they feel it is impossible to get one, or already have a job but have not yet started working.
3. Life Expectancy at Birth (AHH) is the average number of years that a person can expect to live from birth. AHH reflects the health status of a community.
4. Average years of schooling (RLS) is the number of years the population has spent in formal education. Under normal conditions, the average years of schooling in a region is assumed not to decrease. RLS is calculated for the population aged 25 years and over.
5. Population density is the number of inhabitants per square kilometre, where the area is the total land area of an administrative region.
6. The GRDP rate is the growth in the production of goods and services in an economic region over a certain time interval.
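The indicator definitions above reduce to simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the poverty percentage is computed unweighted (official BPS figures apply survey weights), and the GRDP rate is assumed to be a year-on-year growth rate.

```python
def poverty_line(gkm: float, gknm: float) -> float:
    """Poverty line GK = food poverty line (GKM) + non-food poverty line (GKNM)."""
    return gkm + gknm


def poverty_percentage(expenditures: list[float], gk: float) -> float:
    """Percentage of people whose monthly per-capita expenditure is below GK.
    Unweighted simplification; official figures apply survey weights."""
    poor = sum(1 for e in expenditures if e < gk)
    return 100.0 * poor / len(expenditures)


def open_unemployment_rate(unemployed: int, labour_force: int) -> float:
    """TPT: unemployed as a percentage of the total labour force."""
    return 100.0 * unemployed / labour_force


def population_density(population: int, land_area_km2: float) -> float:
    """Inhabitants per square kilometre of land area."""
    return population / land_area_km2


def grdp_growth_rate(grdp_now: float, grdp_prev: float) -> float:
    """Year-on-year growth in percent (assumed definition of the GRDP rate)."""
    return 100.0 * (grdp_now - grdp_prev) / grdp_prev


# Hypothetical inputs, for illustration only (not BPS data).
gk = poverty_line(350_000, 120_000)
print(gk, poverty_percentage([300_000, 450_000, 500_000, 800_000], gk))
print(open_unemployment_rate(50, 1_000), population_density(1_000_000, 500.0))
print(grdp_growth_rate(105.0, 100.0))
```

In the example, two of the four hypothetical expenditures fall below the poverty line, so the poverty percentage is 50%.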

Analysis Technique
This study uses a semiparametric regression approach, a combination of parametric and nonparametric regression, to determine the factors thought to influence the percentage of poor people in East Java Province. The parametric component can be estimated with linear regression; a parametric approach is chosen when the regression curve follows a known pattern, for example, a linear one. A nonparametric approach is an alternative when the form of the regression curve is unknown or does not follow a specific pattern. One of the estimators that can be used in a nonparametric approach is the Fourier series. The steps in this research are as follows:

Descriptive Statistics
The characteristics of the response and predictor variables can be seen from the descriptive statistics in Table 3. From the table, it can be concluded that the poverty percentage variable (Y) has an average of
The next descriptive analysis is a scatter plot, used to identify the pattern of the relationship between the response variable and each predictor variable. This pattern determines which variables enter the parametric component and which enter the nonparametric component. Figure 1 shows the relationship between the Percentage of Poor Population and the predictor variables, namely TPT (X1), AHH (X2), RLS (X3), Population Density (X4), and GRDP Rate (X5).
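Summary statistics like those in Table 3 can be computed with Python's standard library. This is a minimal sketch: the choice of measures (mean, sample standard deviation, minimum, maximum) is an assumption about what the table reports, and the values below are hypothetical, not the East Java data.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation, minimum and maximum of one variable."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical values for illustration only; not the East Java data set.
data = {
    "Y (poverty %)": [10.0, 12.0, 14.0],
    "X1 (TPT)": [4.0, 5.0, 6.0],
}
for name, values in data.items():
    summary = describe(values)
    print(name, {k: round(v, 2) for k, v in summary.items()})
```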

Figure 1. Scatter plots of the response variable against each predictor variable
Figure 1 shows that the relationships between TPT, AHH, and RLS and the percentage of poor people in East Java Province tend to be linear and inversely proportional, suggesting that the higher TPT, AHH, and RLS are, the lower the percentage of poor people. By contrast, the relationships between population density and GRDP rate and the percentage of poor people do not form a linear pattern. The results of the scatter plot identification are presented in Table 4. Because the relationships between the percentage of poor people and TPT, AHH, and RLS appear linear, these three variables are modelled with parametric regression. Meanwhile, the relationships between the percentage of poor people and population density and GRDP rate do not follow any specific pattern; because their form is unknown, these two variables are modelled with nonparametric regression.
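The paper separates linear from non-linear relationships by visual inspection of the scatter plots. As a hedged numerical complement (not the authors' method), one can compare, for each predictor on its own, the R² of a straight-line fit with that of a linear-plus-cosine (Fourier) fit; a large gain suggests that a nonparametric component is warranted. The function names, the rescaling of the predictor to [0, π], and the default K = 2 are assumptions.

```python
import numpy as np


def r_squared(y: np.ndarray, fitted: np.ndarray) -> float:
    """Coefficient of determination 1 - SSE/SST."""
    return 1.0 - float(np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2))


def ls_r_squared(design: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an ordinary least-squares fit of y on the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return r_squared(y, design @ coef)


def linearity_gain(x: np.ndarray, y: np.ndarray, K: int = 2) -> float:
    """R^2 gain of a linear-plus-cosine fit over a straight-line fit for one predictor."""
    u = (x - x.min()) / (x.max() - x.min()) * np.pi  # rescale to [0, pi] (assumption)
    ones = np.ones_like(u)
    linear = np.column_stack([ones, u])
    fourier = np.column_stack([ones, u] + [np.cos(k * u) for k in range(1, K + 1)])
    return ls_r_squared(fourier, y) - ls_r_squared(linear, y)


# Synthetic example: a straight line gains nothing, a cosine curve gains a lot.
x = np.linspace(0, np.pi, 40)
print(linearity_gain(x, 2 + 0.5 * x))
print(linearity_gain(x, np.cos(2 * x)))
```

The rescaling matters because raw predictors such as population density can be in the thousands, where cos(kz) would oscillate far faster than the data can support.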

Semiparametric Regression Method
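For reference, the model can be written in general form as below. This is a sketch that assumes the Fourier series component takes the linear-trend-plus-cosine form of the fitted model reported in this section; the symbols are chosen here for illustration. x_1, x_2, x_3 (TPT, AHH, RLS) are the parametric predictors, z_1, z_2 (Population Density, GRDP Rate) are the nonparametric predictors, and K is the number of cosine terms.

```latex
y_i = \beta_0 + \sum_{j=1}^{3} \beta_j x_{ji} + \frac{\alpha_0}{2}
      + \sum_{l=1}^{2} \left[ \gamma_l z_{li} + \sum_{k=1}^{K} \alpha_{kl} \cos(k z_{li}) \right]
      + \varepsilon_i, \qquad i = 1, \dots, n
```

Each increase of K by one adds one cosine term per nonparametric predictor, which is why a larger K gives a more flexible and more strongly oscillating curve.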
To obtain the best estimator in the semiparametric regression model with the Fourier series approach, the optimal K is selected using the Generalized Cross Validation (GCV) method: the value of K with the minimum GCV is chosen. K is the number of cosine wave oscillations in the regression model with the Fourier series approach. The greater the value of K, the more complex the model and the tighter the oscillations of the regression curve. For this reason, K is limited to K = 5. Table 5 lists the GCV values for K = 1, 2, 3, 4, and 5.
The estimator of the selected model has the form

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \frac{\hat{\alpha}_0}{2} + \hat{\gamma}_1 z_1 + \hat{\alpha}_{11} \cos z_1 + \hat{\alpha}_{21} \cos 2z_1 + \hat{\gamma}_2 z_2 + \hat{\alpha}_{12} \cos z_2 + \hat{\alpha}_{22} \cos 2z_2$$

where x_1, x_2, and x_3 are TPT, AHH, and RLS (the parametric component) and z_1 and z_2 are Population Density and GRDP Rate (the nonparametric component). Based on the estimates in Table 6, the semiparametric regression model with the Fourier series approach for the Percentage of Poor Population in East Java Province is:

$$\hat{y} = 20.08 + 0.08 x_1 - 0.01 x_2 - 2.37 x_3 + 9.7 + 0.14 z_1 - 0.1 \cos z_1 - 0.23 \cos 2z_1 - 0.47 z_2 + 0.55 \cos z_2 + 0.48 \cos 2z_2$$

This model has MSE = 12.2156 and a minimum GCV of 369167.4; these values are smaller than those obtained for the other values of K. The model's coefficient of determination is high, R² = 0.9871, meaning that the predictor variables explain 98.71% of the variation in the response.
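The GCV selection of K described above can be sketched in Python with NumPy. This is a simplified illustration, not the authors' implementation: it fits each candidate K by ordinary least squares (the constant α0/2 is absorbed into the intercept), rescales each nonparametric predictor to [0, π] (an assumption), and uses GCV(K) = MSE / (1 - p/n)², where p is the number of columns of the design matrix. Its GCV values are therefore not directly comparable with those in Table 5. The demo data are synthetic.

```python
import numpy as np


def to_unit_pi(z: np.ndarray) -> np.ndarray:
    """Rescale a predictor to [0, pi]; an assumption, the source states no scaling."""
    return (z - z.min()) / (z.max() - z.min()) * np.pi


def design_matrix(X_par: np.ndarray, Z_np: np.ndarray, K: int) -> np.ndarray:
    """Intercept (absorbs alpha_0/2), linear parametric terms, and for each
    nonparametric predictor a linear trend plus cos(k z) for k = 1..K."""
    n = X_par.shape[0]
    cols = [np.ones(n)] + [X_par[:, j] for j in range(X_par.shape[1])]
    for l in range(Z_np.shape[1]):
        z = to_unit_pi(Z_np[:, l])
        cols.append(z)
        cols.extend(np.cos(k * z) for k in range(1, K + 1))
    return np.column_stack(cols)


def fit_and_score(X_par, Z_np, y, K):
    """Least-squares fit for a fixed K, reporting MSE, GCV and R^2."""
    D = design_matrix(X_par, Z_np, K)
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    n, p = D.shape
    mse = float(np.mean(resid ** 2))
    gcv = mse / (1.0 - p / n) ** 2  # trace(H) = p for a full-rank OLS hat matrix
    r2 = 1.0 - float(np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return {"K": K, "coef": coef, "mse": mse, "gcv": gcv, "r2": r2}


def select_K(X_par, Z_np, y, K_max=5):
    """Fit K = 1..K_max and keep the fit with the smallest GCV."""
    fits = [fit_and_score(X_par, Z_np, y, K) for K in range(1, K_max + 1)]
    return min(fits, key=lambda f: f["gcv"]), fits


# Synthetic demo (hypothetical data, not the East Java data set).
rng = np.random.default_rng(0)
n = 60
X_par = rng.uniform(0, 1, size=(n, 3))     # stand-ins for x1, x2, x3
Z_np = rng.uniform(10, 500, size=(n, 2))   # stand-ins for z1, z2
u1, u2 = to_unit_pi(Z_np[:, 0]), to_unit_pi(Z_np[:, 1])
y = (1 + 0.5 * X_par[:, 0] - 0.2 * X_par[:, 1] + 0.3 * X_par[:, 2]
     + 2 * np.cos(u1) + 1.5 * np.cos(2 * u1) + 0.7 * u2 + np.cos(u2)
     + rng.normal(0, 0.01, size=n))
best, all_fits = select_K(X_par, Z_np, y)
for f in all_fits:
    print(f"K={f['K']}  MSE={f['mse']:.6f}  GCV={f['gcv']:.6f}  R2={f['r2']:.4f}")
```

The (1 - p/n)² denominator is what stops GCV from always favouring the largest K: each extra cosine column lowers the MSE a little but also increases p.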

Interpretation
The following interprets the coefficients of the parametric component of the resulting model and describes the behavior of these variables.

1. The coefficient of the TPT variable (X1) is 0.08, which means that if TPT increases by one percentage point, the poverty percentage increases by 0.08 percentage points, holding the other variables constant. This is in accordance with research conducted by [12], which found a positive relationship between TPT and the percentage of poor people: the percentage of poor people increases when TPT increases. A high percentage of poor people accompanied by a high unemployment rate can be caused by a lack of employment opportunities and the low quality of human resources. Because an increase in TPT directly affects poverty, it is necessary to improve skills and expand employment opportunities in order to reduce TPT and, in turn, poverty.

2. The coefficient of the AHH variable (X2) is -0.01, which means that if AHH increases by one year, the poverty percentage decreases by 0.01 percentage points, holding the other variables constant. In this study, life expectancy therefore has a negative relationship with the percentage of poverty. The higher the life expectancy, the better the quality of public health and the higher the productivity level. Increased community productivity can encourage economic growth, which can ultimately reduce poverty; in other words, the higher the life expectancy, the lower the poverty [13].

3. The coefficient of the RLS variable (X3) is -2.37, which means that if RLS increases by one year, the poverty percentage decreases by 2.37 percentage points, holding the other variables constant. This is consistent with [14] and [15], which explain that education plays an important role in tackling and reducing poverty in the long run, either indirectly through productivity improvements or directly by training the poor in the skills needed to increase their productivity and, in time, their income.
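Because the parametric terms enter the model linearly, each coefficient is the change in the fitted poverty percentage for a one-unit increase in that predictor, with the others held fixed. The sketch below checks this using the coefficients of the fitted equation reported in the results; the function names and base values are hypothetical, and z1, z2 are taken on whatever scale the authors used for the Fourier terms.

```python
import math


def fitted_poverty_pct(x1, x2, x3, z1, z2):
    """Fitted semiparametric model with the reported coefficients
    (x1 = TPT, x2 = AHH, x3 = RLS; z1, z2 = nonparametric predictors)."""
    return (20.08 + 0.08 * x1 - 0.01 * x2 - 2.37 * x3 + 9.7
            + 0.14 * z1 - 0.1 * math.cos(z1) - 0.23 * math.cos(2 * z1)
            - 0.47 * z2 + 0.55 * math.cos(z2) + 0.48 * math.cos(2 * z2))


def marginal_effect(var_index, base, step=1.0):
    """Change in the fitted value when one predictor rises by `step`,
    holding all other predictors at their base values."""
    moved = list(base)
    moved[var_index] += step
    return fitted_poverty_pct(*moved) - fitted_poverty_pct(*base)


base = (5.0, 71.0, 8.0, 1.0, 1.0)  # hypothetical x1, x2, x3, z1, z2
for i, name in enumerate(["TPT (x1)", "AHH (x2)", "RLS (x3)"]):
    print(name, round(marginal_effect(i, base), 4))
```

For the parametric predictors the result does not depend on the base values chosen; for z1 and z2 it would, because of the cosine terms.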

CONCLUSIONS
Based on the results of the research and data analysis using semiparametric regression with the Fourier series estimator carried out in this study, several conclusions can be drawn:
1. In the descriptive analysis, the poverty percentage variable (Y) had an average of 11.