COMPARISON OF RESAMPLING EFFICIENCY LEVELS OF JACKKNIFE AND DOUBLE JACKKNIFE IN PATH ANALYSIS

ABSTRACT


INTRODUCTION
Path analysis is an extension of multiple linear regression analysis in which more than one equation forms a system. In path analysis, the terms response and predictor variables are no longer used; instead, the terms exogenous and endogenous variables are used [1]. As an extension of multiple linear regression analysis, path analysis relies on almost the same assumptions when estimating the path coefficients: linearity, homoscedasticity of the residual variance, and normality of the residuals [2]. The assumption of residual normality is as important in path analysis as it is in regression analysis. When this assumption cannot be fulfilled, the violation needs to be handled.
If the residual normality assumption cannot be met, several remedies are available, for example transforming the data, trimming outliers, or adding observations [3]. Another method that can be used to overcome violations of the residual normality assumption is resampling [4].
Resampling is a method of drawing repeated samples from the same sample [5]. Bootstrap and Jackknife are nonparametric resampling techniques that aim to estimate standard errors and bias, and the Jackknife is an alternative to the bootstrap [6]. Both methods are used to estimate an unknown population distribution with empirical distributions obtained from the resampling process [7]. The resampling methods used in this study are the Jackknife and the Double Jackknife.
The basic principle of the Jackknife method is the removal of one element at a time from the original data [8]. The data sets obtained in this way are called the first-stage Jackknife data, with $B_1$ replications. When each first-stage data set is resampled again with $B_2$ replications, the result is called the second-stage, or Double, Jackknife. The weakness of the Double Jackknife method is that it takes longer to compute, because $B_1 + B_1 B_2$ test statistic values have to be calculated.
The 100 data used in this research are generated in a simulation study with one exogenous variable, one intervening endogenous variable, and one pure endogenous variable [9]. These three variables are measured directly (observable variables), so they do not require a measurement model. Thus, the data have an interval or ratio scale. The exogenous variable is determined by standardizing $\bar{X} \pm 2s$, where $\bar{X} = 0$ and $s = 1$. The distance between observations on the exogenous variable is kept equal. The pure endogenous variable is calculated through a linear regression function with three variations of the path coefficient [10]. Path coefficients in the range 0.05-0.20 describe a low closeness of relationship, values in the range 0.20-0.50 describe a medium closeness of relationship, and values in the range 0.50-1.00 describe a high closeness of relationship.
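A minimal sketch of how such data could be generated is given below. It is an illustration under stated assumptions, not the generator used in the study: the sample size of 100 observations, the unit scale and centring of the exponential residuals, the absence of intercepts, and the function name `generate_path_data` are all assumptions.

```python
import numpy as np

def generate_path_data(n=100, b_x_y1=0.5, b_x_y2=0.5, b_y1_y2=0.5, seed=0):
    """Simulate one exogenous (x), one intervening (y1), and one pure endogenous (y2) variable."""
    rng = np.random.default_rng(seed)
    # Equally spaced exogenous values on [-2, 2], centred at 0 (assumed reading of the design).
    x = np.linspace(-2.0, 2.0, n)
    # Exponential residuals shifted to mean zero (scale 1 and the shift are assumptions).
    e1 = rng.exponential(scale=1.0, size=n) - 1.0
    e2 = rng.exponential(scale=1.0, size=n) - 1.0
    # Linear functions of the predecessors, without intercepts (assumption).
    y1 = b_x_y1 * x + e1
    y2 = b_x_y2 * x + b_y1_y2 * y1 + e2
    return x, y1, y2

# Example: coefficients taken from the "low closeness" range 0.05-0.20.
x, y1, y2 = generate_path_data(b_x_y1=0.1, b_x_y2=0.1, b_y1_y2=0.1)
print(x.shape, y1.shape, y2.shape)
```

Changing the three coefficient arguments to values in 0.20-0.50 or 0.50-1.00 would give the medium and high closeness settings.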
The Jackknife and Double Jackknife methods are compared in terms of relative efficiency. Comparative research on the efficiency of parameter estimators in path analysis using the Jackknife (delete-5) found that Jackknife resampling is more appropriate for parameter estimation in path analysis, as indicated by a relative efficiency above one [6]. Path analysis with the Jackknife method has also proved to be more effective and to satisfy the asymptotic assumptions relatively well [11].
In this study, effectiveness is tested by applying the Double Jackknife resampling method, in which the first-stage resampled data are resampled again. The novelty of this research is the application of the Double Jackknife method to simulated data.
In this study, the residuals of the generated data follow an exponential distribution [12], which represents the condition in which the residual normality assumption is not met. A previous comparison of the efficiency of parameter estimators in path analysis with Bootstrap and Jackknife (delete-5) on simulated data showed that the Jackknife (delete-5) resampling method is three times more efficient than the bootstrap resampling method [6]. This study applies resampling in path analysis and aims to determine, through a simulation study, which resampling method is more efficient: Jackknife or Double Jackknife.

Path Analysis
Path analysis is used when the research analyses complex relationships between variables that cannot be handled by multiple regression. For complex relationships, or when there is more than one dependent variable, a series of regression equations is needed. Path analysis was developed as a method to study the direct and indirect effects of the independent variables on the dependent variable [13].
Path analysis is a technique for analysing cause-and-effect relationships in multiple regression when the independent variables affect the dependent variable not only directly but also indirectly. Another definition describes path analysis as a direct development of multiple regression that provides estimates of the magnitude and significance of hypothesized causal relationships in a set of variables. A further definition describes path analysis as an extended regression model used to test the fit of the correlation matrix against two or more causal models proposed by the researcher [14].
This analysis is a method for explaining and looking for causal relationships between variables. Path analysis is used to examine the relationship between causal models that have been formulated by researchers based on theoretical considerations and certain knowledge. Besides being based on data, causal relationships are also based on knowledge, hypothesis formulation, and logical analysis. This path analysis can be used to test a set of causal hypotheses as well as interpret those relationships [15]. Based on the description above, path analysis is not a method to find causes but an applied method for causal models formulated by researchers with a knowledge base and theoretical considerations.

Types of Influence on Path Analysis
A variable can be viewed as a cause or as an effect, depending on the influence it exerts or receives. There are three types of influence in path analysis [16], namely:

1) Direct Effect
Variables are said to have a direct effect when the influence of an exogenous variable on an endogenous variable occurs without passing through other variables as intermediaries.

Figure 1. Direct influence
From Figure 1 it can be seen that the magnitude of the direct effect is given directly by the path coefficient on the arrow that joins the two variables.

2) Indirect Effect
Variables have an indirect effect when the influence of an exogenous variable on an endogenous variable passes through other variables as intermediaries. From Figure 2, it can be seen that the effect of variable 1 on variable 3 passes through variable 2. The magnitude of the indirect effect is calculated by multiplying the direct effect of variable 1 on variable 2 by the direct effect of variable 2 on variable 3, that is, (path coefficient of 1 to 2) × (path coefficient of 2 to 3).

3) Total Effect
The total effect is the sum of the direct effect and the indirect effect. Based on Figure 2, the total effect is the direct path coefficient plus the product of the path coefficient of 1 to 2 and the path coefficient of 2 to 3.
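The three types of effect can be illustrated with a short calculation. The sketch below uses hypothetical path coefficients (0.30, 0.40, and 0.50) chosen only for illustration; the function and argument names are likewise assumptions.

```python
def path_effects(direct, cause_to_mediator, mediator_to_outcome):
    """Return the direct, indirect, and total effect for a single mediator."""
    # Indirect effect: product of the two path coefficients along the mediated route.
    indirect = cause_to_mediator * mediator_to_outcome
    # Total effect: direct effect plus indirect effect.
    total = direct + indirect
    return {"direct": direct, "indirect": indirect, "total": total}

# Hypothetical coefficients, for illustration only.
effects = path_effects(direct=0.30, cause_to_mediator=0.40, mediator_to_outcome=0.50)
print(effects)
```

With these values the indirect effect is the product of 0.40 and 0.50, and the total effect adds that product to the direct coefficient of 0.30.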

Path Diagrams
Path diagrams are an important component of path analysis. The path diagram shows the direct and indirect influences in the relationship between exogenous and endogenous variables. Cause-and-effect relationships in the path diagram are indicated by the direction of the arrows. In path analysis, there is at least one exogenous variable ($X$), one mediating variable ($Y_1$), and one endogenous variable ($Y_2$). The shape of the path diagram with three variables is as follows.
Before creating a model, it is necessary to standardize the variables in the path analysis so that they have the same mean and variance. Thus, the path coefficients obtained have the same units and can be compared. The following is the form of standardization carried out on the exogenous variable.
By carrying out standardization, each variable has mean $\mu = 0$ and standard deviation $\sigma = 1$, as in the standard normal distribution. To recover the original observed values, the reverse transformation can be carried out as follows.
The path analysis model is a system of equations that can be formed from the path diagram. The system must be solved simultaneously, from parameter estimation and hypothesis testing through to interpretation.
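For reference, the standardization and its reverse transformation take the usual form shown below. The symbols $Z_{X_i}$ (standardized value), $\bar{X}$ (sample mean), and $s_X$ (sample standard deviation) are assumed notation, and the label (2) for the reverse transformation is an assumption; the text refers to the standardization as Equation (1).

```latex
\begin{align}
Z_{X_i} &= \frac{X_i - \bar{X}}{s_X}, \tag{1} \\
X_i &= \bar{X} + s_X \, Z_{X_i}. \tag{2}
\end{align}
```

The endogenous variables would be standardized in the same way, each with its own mean and standard deviation.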
The system of equations obtained from the path diagram in Figure 3 is as follows.
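A plausible form of this system for one exogenous variable $X$, one mediating variable $Y_1$, and one endogenous variable $Y_2$ is sketched below. The coefficient names $\beta_{XY_1}$, $\beta_{XY_2}$, $\beta_{Y_1Y_2}$, the intercepts $\beta_{01}$, $\beta_{02}$, and the residuals $\varepsilon_{1i}$, $\varepsilon_{2i}$ are assumed notation.

```latex
\begin{align}
Y_{1i} &= \beta_{01} + \beta_{XY_1} X_i + \varepsilon_{1i}, \tag{3} \\
Y_{2i} &= \beta_{02} + \beta_{XY_2} X_i + \beta_{Y_1Y_2} Y_{1i} + \varepsilon_{2i}. \tag{4}
\end{align}
```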
In the equations above, $i$ runs over $1, 2, \ldots, n$, where $n$ indicates the number of observations. After standardization with Equation (1), the system of Equations (3-4) becomes as follows.
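Under the assumed notation $Z_{X_i}$, $Z_{Y_{1i}}$, $Z_{Y_{2i}}$ for the standardized variables, one way to write the standardized system is given below. Because every standardized variable has mean zero, the intercepts drop out.

```latex
\begin{align}
Z_{Y_{1i}} &= \beta_{XY_1} Z_{X_i} + \varepsilon_{1i}, \notag \\
Z_{Y_{2i}} &= \beta_{XY_2} Z_{X_i} + \beta_{Y_1Y_2} Z_{Y_{1i}} + \varepsilon_{2i}. \tag{5}
\end{align}
```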
In matrix form, the system of Equation (5) can be written as follows.
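One possible stacked matrix form, consistent with the expression $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ used in the estimation step, is sketched below. Here $\mathbf{Z}_X$, $\mathbf{Z}_{Y_1}$, and $\mathbf{Z}_{Y_2}$ are assumed names for the $n \times 1$ vectors of standardized observations, and $\mathbf{0}$ is an $n \times 1$ vector of zeros.

```latex
\begin{equation}
\underbrace{\begin{pmatrix} \mathbf{Z}_{Y_1} \\ \mathbf{Z}_{Y_2} \end{pmatrix}}_{\mathbf{Y}}
=
\underbrace{\begin{pmatrix}
\mathbf{Z}_{X} & \mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{Z}_{X} & \mathbf{Z}_{Y_1}
\end{pmatrix}}_{\mathbf{X}}
\underbrace{\begin{pmatrix} \beta_{XY_1} \\ \beta_{XY_2} \\ \beta_{Y_1Y_2} \end{pmatrix}}_{\boldsymbol{\beta}}
+
\underbrace{\begin{pmatrix} \boldsymbol{\varepsilon}_{1} \\ \boldsymbol{\varepsilon}_{2} \end{pmatrix}}_{\boldsymbol{\varepsilon}}
\tag{6}
\end{equation}
```

In this layout $\mathbf{Y}$ is $2n \times 1$, $\mathbf{X}$ is $2n \times 3$, and $\boldsymbol{\beta}$ collects the three path coefficients.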

Path Coefficient Estimation
The path coefficient shows the magnitude of the direct influence of exogenous variables on endogenous variables in a system of equations. One method that can be used to estimate the path coefficient is Ordinary Least Squares (OLS).
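The principle of the OLS method is to minimize the sum of the squared residuals. Based on Equation (6), $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ can be written as $\boldsymbol{\varepsilon} = \mathbf{Y} - \mathbf{X}\boldsymbol{\beta}$. Thus, the sum of the squares of the residuals can be written as $\boldsymbol{\varepsilon}^{\top}\boldsymbol{\varepsilon} = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^{\top}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$.
The OLS method minimizes the following function.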
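The standard OLS derivation is reproduced below as a worked step. The name $S(\boldsymbol{\beta})$ for the objective function is an assumption; the result holds when $\mathbf{X}^{\top}\mathbf{X}$ is invertible.

```latex
\begin{align}
S(\boldsymbol{\beta}) &= (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^{\top}(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}), \\
\frac{\partial S}{\partial \boldsymbol{\beta}} &= -2\,\mathbf{X}^{\top}\mathbf{Y} + 2\,\mathbf{X}^{\top}\mathbf{X}\boldsymbol{\beta} = \mathbf{0}, \\
\hat{\boldsymbol{\beta}} &= (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}.
\end{align}
```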

Resampling
Resampling is the process of repeatedly drawing samples from an existing, original sample so that new samples are obtained [17]. Each new sample is drawn at random from the original sample of size $n$, either with or without replacement. Resampling can be applied as an alternative when the number of observations does not meet the needs of the research, a situation that can lead to inaccurate parameter estimates [18]. In addition, resampling allows valid inference that is free from distributional assumptions; in other words, it does not require the normality assumption.

Jackknife
Jackknife is a resampling method introduced by Quenouille to estimate bias; Tukey later used the Jackknife to estimate the standard deviation [19]. The Jackknife method repeatedly takes new samples from the original data of size $n$ by deleting the $i$-th observation, with $i = 1, 2, 3, \ldots, n$. The Jackknife simulation is then based on the new data sets $x^{*}_{1}, x^{*}_{2}, \ldots, x^{*}_{n}$ obtained in this way. The general Jackknife resampling process is shown in Figure 4. In this process, each new set of observations is formed by removing one sample or a group of samples from the initial sample, which is treated as the population. At each subsequent stage, the removed sample is returned and another sample or group of samples is deleted, and so on, until all samples have had a chance to be deleted.
The principle is to remove one piece of data and repeat this as many times as there are samples. To estimate the regression parameters using the Jackknife procedure, eliminating one piece of data can be done using the following procedure.
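A minimal sketch of the delete-one Jackknife for the three path coefficients is given below. It assumes OLS on standardized variables, the conventional Jackknife bias and standard-error formulas, and small simulated data with exponential residuals; the function names and the demonstration settings are illustrative assumptions rather than the study's implementation.

```python
import numpy as np

def standardize(v):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    return (v - v.mean()) / v.std(ddof=1)

def path_coefficients(x, y1, y2):
    """OLS path coefficients (x->y1, x->y2, y1->y2) on standardized variables."""
    zx, zy1, zy2 = standardize(x), standardize(y1), standardize(y2)
    b_x_y1 = np.linalg.lstsq(zx[:, None], zy1, rcond=None)[0][0]
    b_x_y2, b_y1_y2 = np.linalg.lstsq(np.column_stack([zx, zy1]), zy2, rcond=None)[0]
    return np.array([b_x_y1, b_x_y2, b_y1_y2])

def jackknife(x, y1, y2):
    """Delete-one Jackknife: replicates, their mean, bias, and standard error."""
    n = len(x)
    # One replicate per deleted observation i = 0, ..., n - 1.
    reps = np.array([
        path_coefficients(np.delete(x, i), np.delete(y1, i), np.delete(y2, i))
        for i in range(n)
    ])
    full = path_coefficients(x, y1, y2)
    mean = reps.mean(axis=0)
    # Conventional Jackknife bias and standard-error estimates.
    bias = (n - 1) * (mean - full)
    se = np.sqrt((n - 1) / n * ((reps - mean) ** 2).sum(axis=0))
    return reps, mean, bias, se

# Demonstration on small simulated data (all settings are assumptions).
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 30)
y1 = 0.5 * x + rng.exponential(size=30) - 1.0
y2 = 0.5 * x + 0.5 * y1 + rng.exponential(size=30) - 1.0
reps, jack_mean, bias, se = jackknife(x, y1, y2)
print(jack_mean, bias, se)
```

Each row of `reps` holds the three coefficients estimated with one observation removed, so the array has one row per observation.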

Double Jackknife
The Double Jackknife procedure generates new data from the previously generated Jackknife data sets. From the first-stage Jackknife data, replicated $B_1$ times from the original data set, the Jackknife process is repeated with $B_2$ replications, so that the total number of test statistics to be calculated is $B_1 + B_1 B_2$. The steps are as follows.
1) Remove the observations one at a time from the $n$ observations and estimate the path coefficient $\hat{\beta}_{(i)}$, where $\hat{\beta}_{(i)}$ is the Jackknife path coefficient.
2) After eliminating the $i$-th observation for $i = 1, \ldots, n$, obtain the estimated Jackknife path coefficients $\hat{\beta}_{(1)}, \hat{\beta}_{(2)}, \ldots, \hat{\beta}_{(n)}$.
3) Carry out the Double Jackknife procedure by regenerating data from the first-stage Jackknife data sets: each of the $B_1$ first-stage data sets is jackknifed again with $B_2$ replications, which gives $B_1 + B_1 B_2$ test statistics in total.
4) Calculate the Jackknife path coefficient as the average of $\hat{\beta}_{(1)}, \hat{\beta}_{(2)}, \ldots, \hat{\beta}_{(n)}$.
5) Calculate the accuracy of the parameter estimates using the bias and the standard deviation.
The following are the steps that must be taken to estimate the standard error in a double jackknife:
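A minimal sketch of one possible reading of the two-stage procedure is given below. It takes delete-one resampling at both stages, so that the first stage has $n$ replicates and each second stage has $n - 1$. How the second-stage replicates are combined into a single standard error is not specified in the text, so the sketch simply averages the inner Jackknife standard errors; that choice, the single-coefficient statistic `slope`, and all function names are assumptions.

```python
import numpy as np

def slope(data):
    """OLS slope of standardized y on standardized x (a single path coefficient)."""
    x, y = data[:, 0], data[:, 1]
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return float(zx @ zy / (zx @ zx))

def jackknife_se(data, statistic):
    """Delete-one Jackknife replicates and standard error for one data set."""
    n = len(data)
    reps = np.array([statistic(np.delete(data, i, axis=0)) for i in range(n)])
    se = np.sqrt((n - 1) / n * ((reps - reps.mean()) ** 2).sum())
    return reps, float(se)

def double_jackknife(data, statistic):
    """Jackknife every first-stage delete-one data set a second time."""
    n = len(data)
    first_stage, _ = jackknife_se(data, statistic)  # first stage: n statistics
    second_stage, inner_se = [], []
    for i in range(n):
        # Second stage: n - 1 statistics for each first-stage data set.
        reps_i, se_i = jackknife_se(np.delete(data, i, axis=0), statistic)
        second_stage.append(reps_i)
        inner_se.append(se_i)
    second_stage = np.array(second_stage)  # shape (n, n - 1)
    return {
        "estimate": float(second_stage.mean()),
        "se": float(np.mean(inner_se)),  # assumption: average of inner standard errors
        "n_statistics": first_stage.size + second_stage.size,
    }

# Demonstration on small simulated data (all settings are assumptions).
rng = np.random.default_rng(1)
x = np.linspace(-2.0, 2.0, 12)
y = 0.5 * x + rng.exponential(size=12) - 1.0
result = double_jackknife(np.column_stack([x, y]), slope)
print(result)
```

With 12 observations the sketch evaluates 12 first-stage and 12 × 11 second-stage statistics, which illustrates why the Double Jackknife is slower than the single-stage Jackknife.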

Relative Efficiency
The comparison of the resampling method is measured based on the relative efficiency value [20]. Relative efficiency is calculated by comparing the variance between the two parameter estimators. The relative efficiency of the two estimators can be written as follows.
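Consistent with the way the results are interpreted later (a value above one means that the Jackknife has the smaller variance), the ratio can be sketched as below. The symbols $RE$, $\hat{\theta}_{J}$, and $\hat{\theta}_{DJ}$ are assumed notation for the relative efficiency, the Jackknife estimator, and the Double Jackknife estimator.

```latex
\begin{equation}
RE(\hat{\theta}_{J}, \hat{\theta}_{DJ}) = \frac{\operatorname{Var}(\hat{\theta}_{DJ})}{\operatorname{Var}(\hat{\theta}_{J})} \tag{17}
\end{equation}
```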
Description:
$RE(\hat{\theta}_{J}, \hat{\theta}_{DJ})$ = relative efficiency of the Jackknife and Double Jackknife resampling estimators
$\operatorname{Var}(\hat{\theta}_{J})$ = variance of the parameter estimator with the Jackknife resampling method
$\operatorname{Var}(\hat{\theta}_{DJ})$ = variance of the parameter estimator with the Double Jackknife resampling method
If the efficiency obtained from Equation (17) is more than 1, the $\hat{\theta}_{J}$ estimator is said to be more efficient than the $\hat{\theta}_{DJ}$ estimator. On the other hand, if the calculated efficiency is less than 1, the $\hat{\theta}_{DJ}$ estimator is said to be more efficient than the $\hat{\theta}_{J}$ estimator. If the efficiency is equal to 1, both estimators are equally efficient.
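The decision rule can be sketched in a few lines. The direction of the ratio (Double Jackknife variance divided by Jackknife variance) is inferred from how the results are read in the following sections, and the variances in the example are hypothetical numbers, not results from the study.

```python
def relative_efficiency(var_jackknife, var_double_jackknife):
    """Ratio Var(Double Jackknife) / Var(Jackknife); above 1 favours the Jackknife."""
    return var_double_jackknife / var_jackknife

def more_efficient(re, tol=1e-12):
    """Name the method with the smaller variance, or 'equal' when the ratio is 1."""
    if abs(re - 1.0) <= tol:
        return "equal"
    return "Jackknife" if re > 1.0 else "Double Jackknife"

# Hypothetical variances, for illustration only.
re = relative_efficiency(var_jackknife=0.04, var_double_jackknife=0.05)
print(re, more_efficient(re))
```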

Low Level of Closeness of the Relationship between Variables
The low level of closeness of the relationship between variables is represented by path coefficient values in the range 0.05-0.20. Resampling was carried out on the simulated data and, for each set of samples, the path coefficients were estimated, denoted $\hat{\boldsymbol{\beta}} = (\hat{\beta}_{XY_1}, \hat{\beta}_{XY_2}, \hat{\beta}_{Y_1Y_2})$. After obtaining the three path coefficient estimators $\hat{\beta}_{XY_1}$, $\hat{\beta}_{XY_2}$, and $\hat{\beta}_{Y_1Y_2}$, the average path coefficient estimators were calculated, denoted $\hat{\beta}^{*}_{XY_1}(\cdot)$, $\hat{\beta}^{*}_{XY_2}(\cdot)$, and $\hat{\beta}^{*}_{Y_1Y_2}(\cdot)$. The hypothesis test is presented in Table 1.
Table 1. Hypothesis testing under an unfulfilled normality assumption and low closeness of relationship for Jackknife and Double Jackknife resampling
Based on Table 1, the p-values for the path coefficients that describe the relationships $X$ to $Y_1$, $X$ to $Y_2$, and $Y_1$ to $Y_2$ are smaller than the chosen significance level, so $H_0$ is rejected. Thus, it can be concluded that $X$ has a significant effect on $Y_1$, $X$ has a significant effect on $Y_2$, and $Y_1$ has a significant effect on $Y_2$. For the path from $X$ to $Y_1$, the relative efficiency is more than one, indicating that the Jackknife method has a smaller variance than the Double Jackknife method. For the paths from $X$ to $Y_2$ and from $Y_1$ to $Y_2$, the relative efficiency is less than one, indicating that the Double Jackknife method has a smaller variance than the Jackknife method.

Moderate Level of Closeness of the Relationship between Variables
The medium level of closeness of the relationship between variables is represented by path coefficient values in the range 0.20-0.50. Resampling was carried out on the simulated data and, for each set of samples, the path coefficients were estimated, denoted $\hat{\boldsymbol{\beta}} = (\hat{\beta}_{XY_1}, \hat{\beta}_{XY_2}, \hat{\beta}_{Y_1Y_2})$. After obtaining the three path coefficient estimators $\hat{\beta}_{XY_1}$, $\hat{\beta}_{XY_2}$, and $\hat{\beta}_{Y_1Y_2}$, the average path coefficient estimators were calculated, denoted $\hat{\beta}^{*}_{XY_1}(\cdot)$, $\hat{\beta}^{*}_{XY_2}(\cdot)$, and $\hat{\beta}^{*}_{Y_1Y_2}(\cdot)$. The hypothesis test is presented in Table 2.
Based on Table 2, the p-values for the path coefficients that describe the relationships $X$ to $Y_1$, $X$ to $Y_2$, and $Y_1$ to $Y_2$ are smaller than the chosen significance level, so $H_0$ is rejected. Thus, it can be concluded that $X$ has a significant effect on $Y_1$, $X$ has a significant effect on $Y_2$, and $Y_1$ has a significant effect on $Y_2$. The relative efficiency values of all paths are less than one, indicating that the Double Jackknife method has a smaller variance than the Jackknife method. Therefore, it can be concluded that, at this level of closeness, path analysis with Double Jackknife resampling is more efficient than with Jackknife resampling.

High Level of Closeness of the Relationship between Variables
The high level of closeness of the relationship between variables is represented by path coefficient values in the range 0.50-1.00. Resampling was carried out on the simulated data and, for each set of samples, the path coefficients were estimated, denoted $\hat{\boldsymbol{\beta}} = (\hat{\beta}_{XY_1}, \hat{\beta}_{XY_2}, \hat{\beta}_{Y_1Y_2})$. After obtaining the three path coefficient estimators $\hat{\beta}_{XY_1}$, $\hat{\beta}_{XY_2}$, and $\hat{\beta}_{Y_1Y_2}$, the average path coefficient estimators were calculated, denoted $\hat{\beta}^{*}_{XY_1}(\cdot)$, $\hat{\beta}^{*}_{XY_2}(\cdot)$, and $\hat{\beta}^{*}_{Y_1Y_2}(\cdot)$. The hypothesis test is presented in Table 3.
Table 3. Hypothesis testing under an unfulfilled normality assumption and high closeness of relationship for Jackknife and Double Jackknife resampling
Based on Table 3, the p-values for the path coefficients that describe the relationships $X$ to $Y_1$, $X$ to $Y_2$, and $Y_1$ to $Y_2$ are smaller than the chosen significance level, so $H_0$ is rejected. Thus, it can be concluded that $X$ has a significant effect on $Y_1$, $X$ has a significant effect on $Y_2$, and $Y_1$ has a significant effect on $Y_2$. For the path from $X$ to $Y_1$, the relative efficiency is less than one, indicating that the Double Jackknife method has a smaller variance than the Jackknife method. For the paths from $X$ to $Y_2$ and from $Y_1$ to $Y_2$, the relative efficiency is more than one, indicating that the Jackknife method has a smaller variance than the Double Jackknife method.

CONCLUSIONS
Based on the simulation studies carried out, applying the Jackknife and Double Jackknife resampling methods to data that do not meet the residual normality assumption shows that both methods can be applied and can overcome the violation of normality. The relative efficiencies calculated at the various levels of closeness of the relationship between variables indicate that the Double Jackknife resampling method is more efficient than the Jackknife resampling method.