MULTILEVEL NON-LINEAR REGRESSION FOR REPEATED MEASUREMENT DATA: A STUDY OF PEANUT GROWTH

Peanut is one of the most important legume commodities in Indonesia, and much research has been done on this plant. However, its growth has rarely been studied through growth models. This study therefore models the growth of peanuts. One suitable model is the multilevel regression model for repeated measurement data, chosen because it is considered to provide more information than single-level regression models. A nonlinear model was chosen based on the pattern seen in the initial plot of the data. The research method is a case study of peanut growth, and the aim is to build the best model among the models tested. Parameters are estimated by Restricted Maximum Likelihood (REML), chosen because it gives unbiased estimates of the variance parameters. The best model is the one with the lowest Akaike Information Criterion (AIC) among the predetermined models. The results show that the multilevel parabolic regression model has the best (lowest) AIC. In addition, the intraclass correlation (ICC) is 81.19%, indicating that most of the variability lies between plants (level 2) rather than within plants (level 1).


INTRODUCTION
The rapid development of science in recent decades is, without doubt, a result of human curiosity about the universe. Along with this development, data analysis has come to play a fundamental role in research across both the natural and the social sciences. One way to produce strong research is through collaboration, in which researchers from different fields contribute complementary expertise to produce innovative work. Statistics itself could not have developed without the problems that arise in analyzing data, and research collaboration is key to developing problem-based statistics in other scientific fields. One such field is agriculture. In agriculture, mathematics and statistics have important roles: as a communication tool between data producers and data users, and as methods for describing agricultural data, such as regression, correlation, and comparison methods [1]. Statistics is thus fundamental to producing appropriate analyses of agricultural data. Among the many topics that can be studied in agriculture, plant growth modeling is a particularly interesting one.
Peanut (Arachis hypogaea L.) is the second most important legume commodity in Indonesia after soybean [2]. It is a valuable staple commodity, rich in protein, fat, iron, vitamins A, B, E, and K, phosphorus, lecithin, choline, and calcium. Peanut seeds have been reported to contain 40-48% oil, 25% protein, and 18% carbohydrates, along with B-complex vitamins [3]. In agriculture, studies have examined peanut production and yield. However, studies of growth, better known as growth models, are still rarely carried out. This may be because the results of growth modeling are of less economic interest to researchers, or because such models are rarely used in practice in the field. Nevertheless, any data problem can be of interest for statistical modeling. In the literature, growth models are generally built with a linear regression approach, either simple or extended to include the factors that influence growth. However, increasingly complex data structures make the choice of model an interesting problem in its own right. The complexity in question is a hierarchical, or multilevel, data structure, in which each object under study has its own growth curve. One suitable analysis is therefore multilevel regression. Multilevel regression is characterized by a nested data structure, in which observation units are related through nested membership [4]. Thus, growth data for a collection of individual objects observed in a growth study can be modeled with a multilevel regression model.
Multilevel regression analysis is considered a very powerful approach. With multilevel regression, researchers can analyze the relationships between variables at two or more different levels of analysis [5]. In general, the equation of a multilevel regression model can be partitioned into two parts: the fixed effects and the random effects. The fixed-effects part includes the regression coefficients and the predictor variables, while the random-effects part includes the random parameters, namely the errors at each level. Models composed of these two parts are known as mixed models [6]. In an ordinary single-level regression model, all individuals, even those from different centres, are assumed to belong to one common population. In a multilevel model, we allow for genuine differences between centre populations, which are themselves a sample from a superpopulation [7]. Building on mixed models for regression, Generalized Linear Mixed Models (GLMMs) were also developed, and they led to several estimation methods, namely Gauss-Hermite quadrature, the Laplace approximation, and Penalized Quasi-Likelihood (PQL).
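To make the split between fixed and random effects concrete, the following Python sketch simulates a hypothetical two-level data set: repeated measurements (level 1) nested within plants (level 2), with one random intercept per plant. The function name, the measurement ages, and all parameter values are illustrative assumptions, not the study's data or fitted model.

```python
import random


def simulate_two_level(n_plants, ages, beta0, beta1, sd_u, sd_e, seed=0):
    """Simulate y_ij = beta0 + beta1 * age_i + u_j + e_ij (hypothetical model).

    Level 1: measurement i on plant j, with residual e_ij.
    Level 2: plant j, with random intercept u_j shared by all its measurements.
    """
    rng = random.Random(seed)
    rows = []
    for plant in range(n_plants):
        u = rng.gauss(0.0, sd_u)  # one level-2 draw per plant
        for age in ages:
            e = rng.gauss(0.0, sd_e)  # a fresh level-1 draw per measurement
            fixed = beta0 + beta1 * age  # fixed-effects part of the model
            rows.append({"plant": plant, "age": age, "u": u, "e": e,
                         "y": fixed + u + e})  # fixed part + random part
    return rows


rows = simulate_two_level(n_plants=3, ages=[7, 14, 21, 28],
                          beta0=2.0, beta1=0.5, sd_u=1.5, sd_e=0.5)
print(len(rows))  # prints 12: four measurements nested in each of three plants
```

Every measurement on the same plant shares the same value of `u`, which is what induces the within-plant correlation that a single-level model ignores.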
The Penalized Quasi-Likelihood (PQL) method is very helpful for estimating the parameters of multilevel regression models, and it has been developed theoretically to the point where it is now widely used. PQL aims to obtain values that approximate parameter inference and the realization of the random effects in multilevel models [8]. The PQL method as it is widely known today is a numerical procedure, and several procedures have been applied within it [9]. The first is Iterative Generalized Least Squares (IGLS), which was later found to give biased estimates of the variance components. To obtain unbiased estimates of the variances, the Restricted (residual) Maximum Likelihood (REML) estimation method is used. REML can greatly reduce this bias and, in some situations, eliminate it entirely [10]. Accordingly, Goldstein proposed a new iterative procedure, Restricted (residual) Iterative Generalized Least Squares (RIGLS), which gives unbiased estimates of the variance components.
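In the simplest one-level case, the bias that REML removes can be seen directly. For a model with only an intercept (p = 1 fixed parameter) and independent errors, the ML variance estimate divides the residual sum of squares by n, whereas the REML estimate divides by n - p, so the ML estimate is biased downward by a factor of (n - p)/n. The Python sketch below uses made-up data and shows only this special case, not the multilevel RIGLS procedure itself.

```python
from statistics import mean, variance


def ml_and_reml_variance(y):
    """Intercept-only model y_i = mu + e_i with p = 1 fixed parameter.

    ML divides the residual sum of squares by n; REML divides by n - p.
    """
    n, p = len(y), 1
    m = mean(y)
    rss = sum((v - m) ** 2 for v in y)
    return rss / n, rss / (n - p)


data = [4.0, 6.0, 5.0, 9.0]  # illustrative values only
ml, reml = ml_and_reml_variance(data)
# The REML value coincides with the usual unbiased sample variance.
print(ml, reml, variance(data))
```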
Given the above, peanut growth can be modeled with a multilevel regression model for repeated measurement data. When each plant is measured repeatedly, the individual measurements form level 1 and the plants on which they are taken form level 2. REML is used for estimation because it is unbiased for the variance parameters. In addition, an initial examination of the growth data showed that the relationship to be modeled is nonlinear. Because the estimation procedure is still based on linear-model estimation, the nonlinear relationship is approximated with quadratic, cubic, and logarithmic models. Fitting several such models is expected to yield a much better model estimate. Models are built by forward selection, with the Akaike Information Criterion (AIC) as the benchmark for choosing the best model. In addition, the intraclass correlation (ICC) is computed to assess the correlation among observations within the same class, here the same plant. The ICC is recommended for assessing the reliability of measurement scales [11]. In this study, it describes how the variability is distributed across the levels of the model.
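Because the quadratic, cubic, and logarithmic models are nonlinear in plant age but linear in their coefficients, they can be fitted with the same linear estimation machinery by transforming the predictor. A minimal Python sketch of the predictor rows follows; the exact form of the logarithmic model (here, an intercept plus log(age)) is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
import math


def design_row(age, kind):
    """Predictor row for one measurement; the model stays linear in its coefficients."""
    if kind == "linear":
        return [1.0, age]
    if kind == "quadratic":
        return [1.0, age, age ** 2]
    if kind == "cubic":
        return [1.0, age, age ** 2, age ** 3]
    if kind == "logarithmic":
        return [1.0, math.log(age)]  # assumed form; requires age > 0
    raise ValueError(f"unknown model kind: {kind}")


def predict(coefs, age, kind):
    """Linear predictor: dot product of coefficients and the transformed row."""
    return sum(c * x for c, x in zip(coefs, design_row(age, kind)))


print(design_row(2.0, "cubic"))                  # [1.0, 2.0, 4.0, 8.0]
print(predict([1.0, 2.0, 3.0], 2.0, "quadratic"))  # 17.0
```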

RESEARCH METHODS
This research uses a case study with a literature-based approach. The method was chosen to suit the context, namely data collected on the growth of peanuts. The literature approach was chosen because multilevel regression modeling, especially nonlinear (polynomial and power) multilevel regression, is still rarely used. The main focus of this research is the growth of peanuts. The sample consists of 125 randomly selected plants. The data are primary data, originally collected to assess the effect of weeding versus no weeding in the treatments given. Stepwise selection in its backward form begins with a model containing all k predictors and then removes them [12]; in this study, models are instead built up forward. The best model is then selected as the one with the lowest Akaike Information Criterion (AIC). In generalized linear regression modeling, the relative goodness of fit of several models can be compared using a number of criteria, including the AIC [13]. The AIC is used mainly to select the best regression model for forecasting: it reflects both how well the model fits the existing data (in-sample forecasting) and how well it predicts future values (out-of-sample forecasting) [14].
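The AIC comparison can be sketched as follows, assuming Gaussian errors and maximum likelihood fits, for which AIC = 2k - 2 ln L with ln L = -(n/2)[ln(2π·RSS/n) + 1]. The residual sums of squares, parameter counts, and sample size below are hypothetical numbers chosen only to illustrate how the lowest AIC is picked.

```python
import math


def gaussian_aic(n, rss, k):
    """AIC = 2k - 2 ln L for a Gaussian model evaluated at sigma^2 = rss / n (ML)."""
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * k - 2 * log_lik


# Hypothetical fits on n = 50 observations: (residual sum of squares, k),
# where k counts the regression coefficients plus the error variance.
n = 50
candidates = {
    "linear": (200.0, 3),
    "quadratic": (120.0, 4),
    "cubic": (119.0, 5),
}
aic = {name: gaussian_aic(n, rss, k) for name, (rss, k) in candidates.items()}
best = min(aic, key=aic.get)  # the model with the lowest AIC is selected
print(best)  # quadratic
```

Here the cubic term lowers the residual sum of squares only slightly, so its extra parameter does not pay for the penalty of 2 and the quadratic fit is selected. Note that log-likelihoods from REML fits are generally comparable only between models with the same fixed-effect terms; comparisons across different fixed parts are usually made with ML fits.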
The initial model is a single-level linear regression, which is used to assess whether the data are statistically adequate for building a level-2 model.

Modeling Procedure
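Suppose i = ( …

The five models can, in general, be estimated with a linear iterative procedure. In the GLMM context, the general model can be written in matrix form as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e} \qquad (1)$$

where $\mathbf{X}$ is the explanatory matrix of the fixed part, $\boldsymbol{\beta}$ is the vector of fixed parameters, $\mathbf{Z}$ is the design matrix of the random part, $\mathbf{u}$ is the vector of random effects at level 2, and $\mathbf{e}$ is the vector of random effects at level 1.

Parameters are estimated with the Restricted Iterative Generalized Least Squares (RIGLS) procedure, which yields REML estimates that are unbiased for the variance parameters. At iteration $t$ ($t = 0, 1, 2, \dots$), the estimates are updated as

$$\hat{\boldsymbol{\beta}}^{(t+1)} = \left(\mathbf{X}^{\top}\left(\mathbf{V}^{(t)}\right)^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\left(\mathbf{V}^{(t)}\right)^{-1}\mathbf{Y} \qquad (2)$$

$$\hat{\boldsymbol{\theta}}^{(t+1)} = \left(\mathbf{X}^{*\top}\left(\mathbf{V}^{*(t)}\right)^{-1}\mathbf{X}^{*}\right)^{-1}\mathbf{X}^{*\top}\left(\mathbf{V}^{*(t)}\right)^{-1}\mathbf{Y}^{*} \qquad (3)$$

where $\hat{\boldsymbol{\beta}}^{(t)}$ is the estimate of the fixed parameters at iteration $t$, $\hat{\boldsymbol{\theta}}^{(t)}$ is the estimate of the random parameters at iteration $t$, $\mathbf{V}$ is the block-diagonal covariance matrix of $\mathbf{Y}$, $\mathbf{V}^{*} = \mathbf{V} \otimes \mathbf{V}$, $\mathbf{X}^{*}$ is the explanatory matrix of the random part, and $\mathbf{Y}^{*} = \operatorname{vec}\left(\tilde{\mathbf{Y}}\tilde{\mathbf{Y}}^{\top} + \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\right)$ with $\tilde{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}^{(t+1)}$. The second term inside $\operatorname{vec}(\cdot)$ is the REML correction that distinguishes RIGLS from IGLS.

For each model built by forward selection, the Akaike Information Criterion (AIC) is computed as the reference for choosing the best model; the model with the lowest AIC is chosen to represent peanut growth. The intraclass correlation (ICC) is also of interest. In multilevel repeated measurements, where observations are nested within individuals, the ICC can be interpreted as the dependence among observations nested in the same individual, that is, how much of the variation in the observations is explained by the individual. The ICC is the proportion of the total variance that lies between the objects within which the observations are nested [15]. In this study, multilevel regression is considered appropriate when the ICC exceeds 20%.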
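The RIGLS iteration can be sketched for the simplest two-level case: a random-intercept model with one level-2 variance σ²_u and one level-1 variance σ²_e, so that V = σ²_u ZZ' + σ²_e I with Z the plant-indicator matrix. The Python sketch below assumes NumPy, forms N × N matrices (so it suits only small illustrative data), and replaces the explicit Kronecker product V ⊗ V with the equivalent trace identity vec(A)'(V⁻¹ ⊗ V⁻¹)vec(B) = tr(V⁻¹AV⁻¹B) for symmetric matrices. The function name and the data are hypothetical; this is not the software used in the study.

```python
import numpy as np


def rigls_random_intercept(y, X, groups, max_iter=200, tol=1e-10):
    """Sketch of RIGLS for y = X beta + Z u + e with one random intercept per group.

    theta = (sigma2_u, sigma2_e) and V = sigma2_u * Z Z' + sigma2_e * I.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = y.shape[0]
    A = [
        (groups[:, None] == groups[None, :]).astype(float),  # Z Z': same plant
        np.eye(n),                                          # I: level 1
    ]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta = np.full(2, 0.5 * np.var(y - X @ beta))  # crude starting values
    for _ in range(max_iter):
        V = theta[0] * A[0] + theta[1] * A[1]
        V_inv = np.linalg.inv(V)
        # Fixed part: generalized least squares given the current V
        cov_beta = np.linalg.inv(X.T @ V_inv @ X)
        beta = cov_beta @ X.T @ V_inv @ y
        r = y - X @ beta
        # RIGLS: add X (X' V^-1 X)^-1 X' to r r' (the REML correction)
        y_star = np.outer(r, r) + X @ cov_beta @ X.T
        # Random part: GLS on vec(y_star) with V* = V kron V, via traces
        W = [V_inv @ Ak @ V_inv for Ak in A]
        M = np.array([[np.trace(Wk @ Al) for Al in A] for Wk in W])
        b = np.array([np.trace(Wk @ y_star) for Wk in W])
        theta_new = np.linalg.solve(M, b)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return beta, theta[0], theta[1]


# Hypothetical balanced data: 3 plants, 4 measurements each, intercept only
y = [1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14]
groups = [0] * 4 + [1] * 4 + [2] * 4
X = np.ones((12, 1))
beta, s2u, s2e = rigls_random_intercept(y, X, groups)
print(beta, s2u, s2e)
```

For balanced data with an intercept only, REML estimates coincide with the classical ANOVA estimators when these are non-negative, which gives a simple check: here σ²_e equals the within-plant mean square, 15/9 ≈ 1.667, and σ²_u equals (MSB - MSW)/4 = (100 - 15/9)/4 ≈ 24.58.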
The intraclass correlation for a two-level multilevel regression can be calculated as

$$\mathrm{ICC} = \frac{\sigma^{2}_{u}}{\sigma^{2}_{u} + \sigma^{2}_{e}} \qquad (4)$$

In equation (4), $\sigma^{2}_{u}$ is the variance between individuals (units) at level 2 and $\sigma^{2}_{e}$ is the variance of the observations nested within each individual at level 1.
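The intraclass correlation is the level-2 variance divided by the total variance, so it amounts to a one-line computation. In the Python sketch below, the variance components are hypothetical inputs (for example, the output of a REML fit), and the 20% threshold is the one stated in the text.

```python
def icc(sigma2_u, sigma2_e):
    """Intraclass correlation: share of the total variance lying at level 2."""
    return sigma2_u / (sigma2_u + sigma2_e)


THRESHOLD = 0.20  # threshold for multilevel modeling stated in the text
rho = icc(4.0, 1.0)  # hypothetical variance components
print(f"ICC = {rho:.2%}")  # ICC = 80.00%
print("above threshold" if rho > THRESHOLD else "below threshold")
```

By the same formula, the reported ICC of 81.19% corresponds to a level-2 variance about 4.3 times the level-1 variance (0.8119 / 0.1881 ≈ 4.32).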

RESULTS AND DISCUSSION
The best model is selected in this research by forward selection. The analysis therefore begins by modeling each predictor variable, starting from a simple linear regression with one variable and moving to more complex model structures. Parameters are estimated by REML using the iterative RIGLS procedure. Based on the data analysis, 25 candidate model equations were formed. The first table of results covers the single-level linear models. Model 3 is a multiple linear regression with two predictor variables; according to the table, its AIC is 688.097. The parabolic and cubic regression models are presented in Table 2. Based on Table 2, the best of these models, with the lowest AIC, is Model 5, a parabolic (quadratic) model with an AIC of 549.694. Table 2 also shows a sharp increase in model variability for Model 7, whose variance is 39.801. This is likely because Model 7 includes the third power of the predictor variable, plant age, which increases model variability. Estimates for the multilevel linear regression models, with either random intercepts or random slopes, are presented in Table 3. Based on Table 3, the best of these models is Model 13, with the lowest AIC of 616.260. The multilevel parabolic regression models, presented in Table 4, should also be considered. Based on Table 4, the model with the lowest AIC is Model 19, with an AIC of 428.445, a substantial decrease from the AIC values of the other models.
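The comparison of the AIC values quoted above can be reproduced with a few lines of Python. The values are those given in the text, read with the comma as a decimal separator (an assumption; the ordering is the same either way), and the labels are descriptive only.

```python
# AIC values quoted in the results text (comma read as a decimal separator)
reported_aic = {
    "Model 3 (single-level multiple linear)": 688.097,
    "Model 5 (single-level parabolic)": 549.694,
    "Model 13 (multilevel linear)": 616.260,
    "Model 19 (multilevel parabolic)": 428.445,
}

# Rank from lowest (best) to highest AIC and report the gap to the best model
ranked = sorted(reported_aic.items(), key=lambda kv: kv[1])
best_name, best_aic = ranked[0]
for name, value in ranked:
    print(f"{name}: AIC = {value:.3f}, difference from best = {value - best_aic:.3f}")
```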
Finally, the estimates of the multilevel cubic regression models should also be examined: