ESTIMATION OF A BI-RESPONSE TRUNCATED SPLINE NONPARAMETRIC REGRESSION MODEL ON LIFE EXPECTANCY AND PREVALENCE OF UNDERWEIGHT CHILDREN IN INDONESIA

Abstract: Researchers use nonparametric regression because it offers great flexibility in the modeling process. Nonparametric regression can be used when the pattern of the relationship between the predictor and response variables is unknown. The truncated spline is one of the most frequently used nonparametric regression methods. A truncated spline is a piecewise polynomial whose segments join continuously, so the resulting curve is relatively smooth. Its advantage is that it can handle data whose behavior changes at specific intervals. Bi-response truncated spline nonparametric regression is used when one or more predictor variables affect two response variables that are assumed to be correlated. This study aimed to obtain the best bi-response truncated spline nonparametric regression model for life expectancy and the prevalence of underweight children in Indonesia in 2021. The data come from the Central Bureau of Statistics and the Indonesian Ministry of Health. The optimal knot points were selected with the Generalized Cross Validation (GCV) method. The results showed that the best model uses three knot points, with a minimum GCV value of 22.77 and a coefficient of determination of 99.58%.


INTRODUCTION
Regression analysis is a statistical method used to determine the effect of predictor variables on a response variable. Its main goal is to estimate the regression curve [1]. Three approaches are commonly used to estimate the shape of the regression curve: parametric, nonparametric, and semiparametric. The parametric approach is used when the shape of the regression curve is assumed to be known [2]. The nonparametric approach is used when the shape of the regression curve is unknown. The semiparametric approach is used when the shape of the regression curve is partly known and partly unknown [3].
Truncated splines have useful statistical properties and deserve consideration as a method for analyzing regression relationships [4]. The truncated spline approach yields a regression model with a good statistical and visual interpretation, which gives it several advantages in estimating the regression curve [5]. A truncated spline is a function built from polynomial and truncated components, that is, a piecewise polynomial joined at knot points, so it can handle changing patterns of data behavior. The approach addresses a common modeling problem: the relationship between the response and predictor variables does not follow a specific pattern, and the pattern changes on certain subintervals. With the help of knot points, these subintervals can accommodate data patterns that change sharply. A knot point is a joint at which the behavior of the truncated spline function changes between intervals; with knots in place, the resulting model follows a relationship pattern that suits the behavior of the data [6] [7].
In practice, two or more response variables are often observed at the same values of the predictor variables, and the responses are correlated. One way to handle this is the multiresponse nonparametric regression model, which models the function that represents the association among the variables [8]. This study uses bi-response truncated spline nonparametric regression, a regression model with two response variables. Regression analysis, and bi-response truncated spline nonparametric regression in particular, requires an appropriate parameter estimation method. Several techniques are available, including the Ordinary Least Squares (OLS) method [9], whose principle is to minimize the sum of squared errors. For nonparametric regression with two response variables, the Weighted Least Squares (WLS) method applies the same principle with the help of a weighting matrix, namely the variance-covariance matrix [10]. Using the variance-covariance matrix can produce accurate estimates [11].
Health is an essential indicator of quality of life, which in turn supports human development [7] [12]. In Indonesia, health remains a serious problem and a significant concern for the government. The government should therefore consider combining policy interventions to reduce economic pressures, guarantee access to health care, and facilitate optimal health [13]. A high degree of public health can indicate the success of health and socio-economic development programs, which can indirectly increase life expectancy [3]. Life expectancy is the average number of additional years that a person who has reached a certain age in a specific year is expected to live. Life expectancy is not evenly distributed. In general, rural areas have a higher life expectancy than urban areas [14]. According to the Central Bureau of Statistics, life expectancy in Indonesia was 71.34 years in 2019 and rose to 71.57 years in 2021. A critical factor in achieving optimal health status is maintaining good nutritional status [10]. In developing countries, children under five (0-5 years) are the most vulnerable to malnutrition. Poor nutritional status makes children susceptible to various infections. Malnutrition is one component of the prevalence of underweight toddlers that concerns researchers. Being underweight is a child's failure to reach ideal body weight, which also affects height growth [15]. Several countries are committed to improving and supporting health progress, so health-related policies are implemented as fully as possible, such as free health checks, public education on the importance of health, and the provision of adequate health facilities [16][17].

Regression Analysis
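Regression analysis is a method that studies the relationship between response variables and predictor variables. Based on the regression curve, there are three modeling approaches: nonparametric, semiparametric, and parametric regression [1]. The regression model can be written as Equation (1):

$$y_i = f(x_i) + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (1)$$

where $y_i$ is the response of the $i$-th observation, $x_i$ is the predictor of the $i$-th observation, $f$ is the regression curve, and $\varepsilon_i$ is the error of the $i$-th observation.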

Multivariable Truncated Spline Bi-response Nonparametric Regression
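When there is one response variable and one predictor variable, the analysis is called univariable truncated spline nonparametric regression. When there is one response variable and more than one predictor variable, it is called multivariable truncated spline nonparametric regression [18]. Bi-response truncated spline nonparametric regression is a nonparametric regression method with more than one response variable (here, two) in which the responses are strongly correlated, both logically and mathematically [7]. The model can be written as Equation (2):

$$y_{1i} = f(x_{1i}, \ldots, x_{pi}) + \varepsilon_{1i}, \qquad y_{2i} = g(x_{1i}, \ldots, x_{pi}) + \varepsilon_{2i}, \quad i = 1, 2, \ldots, n \qquad (2)$$

where $f$ and $g$ are regression curves of unknown shape, each approximated by a multivariable truncated spline function of degree $m$ with $r$ knot points, as in Equation (3):

$$f(x_{1i}, \ldots, x_{pi}) = \sum_{j=1}^{p}\left[\sum_{h=1}^{m}\beta_{hj}\,x_{ji}^{h} + \sum_{k=1}^{r}\beta_{(m+k)j}\,(x_{ji}-K_{kj})_{+}^{m}\right], \qquad g(x_{1i}, \ldots, x_{pi}) = \sum_{j=1}^{p}\left[\sum_{h=1}^{m}\alpha_{hj}\,x_{ji}^{h} + \sum_{k=1}^{r}\alpha_{(m+k)j}\,(x_{ji}-K_{kj})_{+}^{m}\right] \qquad (3)$$

with the truncated function

$$(x_{ji}-K_{kj})_{+}^{m} = \begin{cases}(x_{ji}-K_{kj})^{m}, & x_{ji} \ge K_{kj} \\ 0, & x_{ji} < K_{kj}\end{cases}$$

Here $K_{kj}$ is the $k$-th knot point of the $j$-th predictor, $\beta_{hj}$ and $\beta_{(m+k)j}$ are the polynomial and truncated spline coefficients of the first response, and $\alpha_{hj}$ and $\alpha_{(m+k)j}$ are the corresponding coefficients of the second response. Written in matrix notation, the model involves a parameter vector, which collects these coefficients, and an error vector, which collects $\varepsilon_{1i}$ and $\varepsilon_{2i}$ [12].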
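To make the truncated basis concrete, the following Python sketch builds the design-matrix columns of a truncated spline for one predictor (powers of the predictor followed by one truncated term per knot) and then combines several predictors additively. The function names, the added intercept column, and the toy knot values are illustrative assumptions and do not come from the study.

```python
import numpy as np

def truncated_basis(x, knots, degree=1):
    """Columns x^1..x^degree followed by (x - K)_+^degree for each knot K."""
    x = np.asarray(x, dtype=float)
    poly = [x ** h for h in range(1, degree + 1)]
    # Truncated term: (x - K)^degree where x >= K, and 0 elsewhere.
    trunc = [np.where(x >= k, (x - k) ** degree, 0.0) for k in knots]
    return np.column_stack(poly + trunc)

def design_matrix(X, knots_per_predictor, degree=1):
    """Intercept column (an assumption) plus the basis columns of every predictor."""
    X = np.asarray(X, dtype=float)
    blocks = [truncated_basis(X[:, j], knots_per_predictor[j], degree)
              for j in range(X.shape[1])]
    return np.column_stack([np.ones(X.shape[0])] + blocks)

# Toy example: one predictor with hypothetical knots at 2 and 4.
x = np.array([1.0, 2.0, 3.0, 5.0])
B = truncated_basis(x, knots=[2.0, 4.0])
D = design_matrix(x.reshape(-1, 1), [[2.0, 4.0]])
```

In this toy example each row of `B` holds the predictor value followed by its distance past each knot, which is zero for observations to the left of that knot.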

Estimation of Bi-response Spline Truncated Nonparametric Regression Models
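The parameters are estimated with the Weighted Least Squares (WLS) method because it can handle correlation between responses measured on the same observation subject [2]. In bi-response spline nonparametric regression with WLS optimization, the errors of the two responses are assumed to be correlated, so the variances and covariance of the responses are computed first and the weighting matrix $\mathbf{W}$ of Equation (12) is built from them. Let the model be written in matrix form as $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$, where $\mathbf{y}$ stacks both responses, $\mathbf{X}$ is the truncated spline design matrix, and $\boldsymbol{\theta}$ is the parameter vector. The weighted sum of squared errors [3] is given by Equation (13):

$$Q(\boldsymbol{\theta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^{T}\,\mathbf{W}\,(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \qquad (13)$$

Expanding Equation (13) for a symmetric $\mathbf{W}$ gives Equation (14):

$$Q(\boldsymbol{\theta}) = \mathbf{y}^{T}\mathbf{W}\mathbf{y} - 2\boldsymbol{\theta}^{T}\mathbf{X}^{T}\mathbf{W}\mathbf{y} + \boldsymbol{\theta}^{T}\mathbf{X}^{T}\mathbf{W}\mathbf{X}\boldsymbol{\theta} \qquad (14)$$

Equation (14) is then differentiated with respect to $\boldsymbol{\theta}$, with the result written in Equation (15):

$$\frac{\partial Q(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -2\mathbf{X}^{T}\mathbf{W}\mathbf{y} + 2\mathbf{X}^{T}\mathbf{W}\mathbf{X}\boldsymbol{\theta} \qquad (15)$$

Setting Equation (15) equal to zero and solving for $\boldsymbol{\theta}$ gives the estimator:

$$\hat{\boldsymbol{\theta}} = (\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{y} \qquad (16)$$

so that the estimated form of the spline model in bi-response nonparametric regression [3] becomes:

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\theta}} = \mathbf{X}(\mathbf{X}^{T}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{W}\mathbf{y} \qquad (17)$$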
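The WLS estimation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the responses are stacked as one vector, the design matrix is block diagonal, and the weighting matrix is taken as the inverse of the 2 x 2 sample variance-covariance matrix of the responses expanded with a Kronecker product. The text only says the weighting is based on the variance-covariance matrix, so the inverse and the helper names are assumptions, and the data are synthetic.

```python
import numpy as np

def wls_estimate(X, y, W):
    """Solve (X' W X) theta = X' W y without forming an explicit inverse."""
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

def biresponse_weights(y1, y2):
    """Assumed weighting: inverse 2 x 2 sample covariance, expanded to 2n x 2n."""
    n = len(y1)
    sigma = np.cov(np.vstack([y1, y2]))  # variances on the diagonal, covariance off it
    return np.kron(np.linalg.inv(sigma), np.eye(n))

# Synthetic, noise-free data (illustrative only).
n = 20
x = np.linspace(0.0, 1.0, n)
X1 = np.column_stack([np.ones(n), x])        # design for the first response
X2 = np.column_stack([np.ones(n), x ** 2])   # design for the second response
Z = np.zeros((n, 2))
X = np.block([[X1, Z], [Z, X2]])             # block-diagonal stacked design
theta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ theta_true                           # stacked response [y1; y2]

W = biresponse_weights(y[:n], y[n:])
theta_hat = wls_estimate(X, y, W)
y_hat = X @ theta_hat
```

Because the synthetic responses contain no noise, `theta_hat` recovers `theta_true` up to rounding error. Using `np.linalg.solve` instead of an explicit inverse is a numerical choice and does not change the estimator.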

Relationship Between Response Variables
The correlation coefficient measures the closeness of the relationship between two variables [19]. The Pearson product-moment correlation coefficient can be used, as written in Equation (18):

$$r = \frac{n\sum_{i=1}^{n} y_{1i}y_{2i} - \sum_{i=1}^{n} y_{1i}\sum_{i=1}^{n} y_{2i}}{\sqrt{\left[n\sum_{i=1}^{n} y_{1i}^{2} - \left(\sum_{i=1}^{n} y_{1i}\right)^{2}\right]\left[n\sum_{i=1}^{n} y_{2i}^{2} - \left(\sum_{i=1}^{n} y_{2i}\right)^{2}\right]}} \qquad (18)$$

The resulting coefficient shows how closely the response variables to be used are related [20]. To test whether there is a significant relationship between the first and second response variables, the correlation significance test uses the statistic:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$

$H_0$ is rejected if $|t| > t_{\alpha/2,\,n-2}$ or if the p-value is less than $\alpha$, which leads to the conclusion that the two variables are not independent (they are correlated).
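As a small worked illustration, the Python sketch below computes the Pearson correlation coefficient and the usual t statistic for testing whether a correlation is zero. The function names are illustrative. The last line plugs in the correlation of -0.64 and the 34 provinces reported in this study; the p-value is not computed here because that would need a t-distribution routine.

```python
import math

def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def correlation_t_stat(r, n):
    """t statistic with n - 2 degrees of freedom for H0: no correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Reported values in this study: r = -0.64 with n = 34 provinces.
t_stat = correlation_t_stat(-0.64, 34)  # about -4.71
```

The statistic is compared with the critical value of the t distribution with n - 2 = 32 degrees of freedom.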

Optimal Knot Point Selection
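Theoretically, the GCV method has an asymptotic optimality property that other methods do not share. Further advantages are that GCV is invariant to transformation and that its calculation does not require the population variance to be known [6]. To select the optimal knot points, the GCV method chooses the function with the minimum GCV value. The GCV is generally defined as:

$$GCV(K) = \frac{MSE(K)}{\left[n^{-1}\,\mathrm{trace}\left(\mathbf{I} - \mathbf{A}(K)\right)\right]^{2}}, \qquad MSE(K) = n^{-1}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}$$

where $K$ denotes the knot points, $\mathbf{I}$ is the identity matrix, and $\mathbf{A}(K)$ is the matrix that maps the observed responses to the fitted values, $\hat{\mathbf{y}} = \mathbf{A}(K)\,\mathbf{y}$.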
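A knot search by GCV can be sketched in Python as below. The sketch assumes the common form of the criterion, GCV(K) = MSE(K) / [trace(I - A(K)) / n]^2, where A(K) is the hat matrix of the fit, and it uses a single predictor with one linear knot on synthetic data. The function names and candidate knots are illustrative assumptions, not the study's settings.

```python
import numpy as np

def hat_matrix(X, W=None):
    """A = X (X' W X)^{-1} X' W; W defaults to the identity (ordinary least squares)."""
    n = X.shape[0]
    W = np.eye(n) if W is None else W
    XtW = X.T @ W
    return X @ np.linalg.solve(XtW @ X, XtW)

def gcv(X, y, W=None):
    """GCV = MSE / (trace(I - A) / n)^2 for the linear smoother defined by X and W."""
    n = len(y)
    A = hat_matrix(X, W)
    resid = y - A @ y
    mse = resid @ resid / n
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

def linear_spline_design(x, knot):
    """Intercept, x, and one truncated term (x - knot)_+."""
    return np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])

# Synthetic data whose slope changes at x = 0.5.
x = np.linspace(0.0, 1.0, 21)
y = x + 2.0 * np.maximum(x - 0.5, 0.0)

candidates = [0.25, 0.5, 0.75]
scores = {k: gcv(linear_spline_design(x, k), y) for k in candidates}
best_knot = min(scores, key=scores.get)
```

On this noise-free example the candidate at 0.5 reproduces the data exactly, so it has the smallest GCV. With several predictors and several knots, the same idea applies over combinations of candidate knots.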

Best Model Selection Criteria
The coefficient of determination ($R^2$) is one of the quantities used to assess the adequacy and accuracy of a regression model; it shows how much the predictor variables contribute to changes in the response variable [11]. A good model has a high $R^2$ value. The value of $R^2$ is obtained from Equation (23):

$$R^{2} = \frac{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \qquad (23)$$

where $\hat{y}_i$ is the fitted value, $y_i$ is the observed value, and $\bar{y}$ is the mean of the observed values.
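A short Python sketch of the calculation follows. It assumes the regression-sum-of-squares form, R^2 = SSR / SST; the form 1 - SSE / SST is also widely used and gives the same value for an ordinary least squares fit with an intercept. The function name is illustrative.

```python
def r_squared(y, y_hat):
    """Coefficient of determination as SSR / SST (assumed form)."""
    mean_y = sum(y) / len(y)
    ssr = sum((f - mean_y) ** 2 for f in y_hat)  # regression sum of squares
    sst = sum((v - mean_y) ** 2 for v in y)      # total sum of squares
    return ssr / sst

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # fitted values equal the data
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # fitted values equal the mean
```

A perfect fit gives a value of 1, and a fit that only returns the mean gives 0.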

Research Variables
The method used in this study is bi-response spline nonparametric regression, with the optimal knot points selected by the Generalized Cross Validation (GCV) method. The aim is to obtain the best estimate of the bi-response nonparametric regression model, using a truncated spline approach, for life expectancy and the prevalence of underweight children in Indonesia. The data are life expectancy ($y_1$), the prevalence of underweight toddlers ($y_2$), the percentage of poor people ($x_1$), the percentage of babies aged 0-5 months who are breastfed ($x_2$), the percentage of children aged 12-23 months who received complete primary immunization ($x_3$), and the percentage of household members who have health insurance ($x_4$) in 2021, obtained from the websites of the Central Bureau of Statistics and the Ministry of Health. The data cover all 34 provinces of Indonesia as of 2021.

Stage of Analysis
The steps of the analysis with bi-response truncated spline nonparametric regression are as follows:
1. Test the correlation between the response variables.
2. Examine the pattern of the relationship between each predictor variable and the response variables.
3. Model life expectancy and the prevalence of underweight children using linear bi-response truncated spline nonparametric regression with 1 knot, 2 knots, and 3 knots.
4. Select the optimal knot points using the Generalized Cross Validation (GCV) method and the coefficient of determination for the best model.
5. Model life expectancy and the prevalence of underweight children with the best model obtained.

Correlation Testing
One requirement for bi-response truncated spline nonparametric regression modeling is that the first and second response variables are strongly related, both in theory and through statistical testing. Whether life expectancy and the prevalence of underweight children are significantly related can be shown by a correlation test. The hypotheses of the correlation test are written as follows:

$H_0: \rho = 0$ (there is no relationship between life expectancy and the prevalence of underweight children)
$H_1: \rho \neq 0$ (there is a relationship between life expectancy and the prevalence of underweight children)

Based on Table 1, the p-value is 0.00, so $H_0$ is rejected: there is a significant relationship between life expectancy and the prevalence of underweight children, and bi-response truncated spline nonparametric regression modeling can be carried out. The correlation value of -0.64 indicates a relatively strong negative relationship between life expectancy and the prevalence of underweight children, which means that the two variables are closely related.

Optimal Knot Point Selection
The optimal knot points are those that give the minimum GCV value. The minimum GCV values for 1, 2, and 3 knot points are shown in Table 2. Based on Table 2, the best model is the one with 3 knot points, which has the minimum GCV value of 22.77, a coefficient of determination of 99.58%, and the smallest MAPE value, 5.63%. The best model is therefore the bi-response truncated spline nonparametric regression model with 3 knot points. The coefficient of determination for three knot points in Table 2 means that 99.58% of the variation in life expectancy and the prevalence of underweight toddlers is explained by the percentage of poor people, the percentage of babies aged 0-5 months who are breastfed, the percentage of children aged 12-23 months who received complete primary immunization, and the percentage of household members who have health insurance. The remaining 0.42% of the variation is influenced by other factors that were not studied. The MAPE value of 5.63%, which is less than 10%, indicates that the predictions from the regression model are highly accurate and that the model is suitable for prediction.
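The MAPE criterion is not defined in the text. The Python sketch below assumes the usual definition, the mean of the absolute errors relative to the observed values, expressed as a percentage. The function name and the numbers are illustrative only.

```python
def mape(y, y_hat):
    """Mean absolute percentage error; assumes no observed value is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(y, y_hat)) / len(y)

# Two toy observations, each predicted with a 10% relative error.
example = mape([100.0, 200.0], [110.0, 180.0])
```

Under the rule of thumb applied in the text, a MAPE below 10% is read as a highly accurate prediction.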

Best Model of Bi-response Spline Truncated Nonparametric Regression
The locations of the optimal knot points in the bi-response truncated spline nonparametric regression model with 3 knot points, based on the minimum GCV value, can be seen in Table 3. The parameter estimates obtained at these optimal knot points for each predictor variable are presented in Table 4. Based on the knot point selection, the bi-response truncated spline nonparametric regression model with 3 knot points is the best model for life expectancy and the prevalence of underweight children in Indonesia. The best model for the life expectancy variable with 3 knot points can be written in Equation (1):
The best model for the prevalence of underweight toddlers with 3 knot points can be written in Equation (2):
These models can be used to predict the first and second response variables, in this case life expectancy and the prevalence of underweight children. The predictions can serve as a consideration in future decision making.

CONCLUSIONS
The research results lead to the conclusion that the best bi-response truncated spline nonparametric regression model is obtained with three knot points. The resulting GCV value is 22.77 and the coefficient of determination is 99.58%. The bi-response truncated spline nonparametric regression model is as follows:
1. Model for the life expectancy variable
For future research, the authors suggest using other types of data, such as longitudinal data, and nonparametric spline regression with other bases, such as B-splines and P-splines.

Coefficients in Equation (3): for the first response variable, the coefficients of the polynomial function are indexed by h = 1, 2, ..., m and the coefficients of the truncated spline function are indexed by m + k, where k = 1, 2, ..., r indexes the knot points and j indexes the predictor variables. The coefficients of the polynomial function and of the truncated spline function for the second response variable are indexed in the same way. The regression model in Equation (2) and Equation (3) can also be written in matrix notation.

The error variance is homogeneous within each response, but it is not the same for the two responses, so the variance-covariance matrix is used. The weighting matrix is determined by calculating the variances and covariance of the first and second response variables [19] [9].