DETERMINING STUDENT GRADUATION BASED ON SCHOOL LOCATION USING GEOGRAPHICALLY WEIGHTED LOGISTIC REGRESSION

ABSTRACT


INTRODUCTION
The Faculty of Mathematics and Natural Sciences (FMIPA) is one of the faculties at Tanjungpura University (UNTAN). It has nine undergraduate (S1) programs: Mathematics, Physics, Chemistry, Biology, Computer Systems, Geophysics, Marine Science, Statistics, and Information Systems. The undergraduate program at FMIPA has a study load of at least 144 credits, with a minimum study period of 3.5 years and a maximum of 7 years (14 semesters). Based on the graduation data of the 2014 batch of FMIPA students, the largest percentage of students who did not complete their studies came from the Computer Systems Department, at 51.81%. If a student's study period can be predicted early, the relevant study program can provide advice and recommendations so that students graduate in exactly 8 semesters.
Regression analysis is a statistical method used to explain the relationship between a dependent variable and independent variables. In general, regression analysis for spatial data is applied to quantitative (continuous) dependent variables that follow a normal distribution. In practice, however, dependent variables are often qualitative (categorical) in fields such as education, social science, economics, and health [1]. The logistic regression model is one regression model that can explain the relationship between a categorical dependent variable and independent variables [2].
Logistic regression is a statistical method for relating a dichotomous (binary) variable to one or more continuous independent variables. However, logistic regression analysis does not consider geographical factors that may affect each observation. Because each school of origin can have different characteristics depending on its location, identifying the factors that influence student graduation calls for an analysis that accounts for geography, such as Geographically Weighted Logistic Regression (GWLR) [3].
GWLR combines the Geographically Weighted Regression (GWR) model with the logistic regression model to relate a nominal dependent variable to independent variables [4]. In this model, the value of each regression coefficient depends on the location of the observed data. The weights represent the position of the observations relative to one another and are used to produce parameter estimates at different locations. The parameter estimates at a point are influenced more by nearby points than by points further away, so the choice of spatial weights is very important. The weights are given by a kernel function, of which there are two types: fixed kernels and adaptive kernels. Based on this explanation, the objective of this research is to determine a model for analyzing the factors that influence student graduation based on school location using GWLR. The result of this study is a model for each school based on the variables that have a significant effect on the graduation of FMIPA UNTAN students.

Logistic Regression
The binary logistic regression model is used for binary (dichotomous) data: each observation is classified as either "success" or "failure", coded 1 or 0 respectively, so that it follows the Bernoulli distribution [8].
The function $\pi(x)$ is nonlinear, so a logit transformation is needed to obtain a linear function. The relationship between the dependent variable $y$ and the independent variables $x$ can thus be examined through the logit transformation. The logit form of $\pi(x)$ is denoted $g(x)$.
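For reference, the binary logistic regression model and its logit form are usually written as follows. This is a sketch in textbook notation, not a reproduction of a specific equation: the number of independent variables $p$ and the coefficients $\beta_0, \ldots, \beta_p$ are assumed symbols.

```latex
\begin{align}
f(y) &= \pi(x)^{y}\,\bigl(1 - \pi(x)\bigr)^{1 - y}, \qquad y \in \{0, 1\} \\
\pi(x) &= \frac{\exp\bigl(\beta_0 + \sum_{k=1}^{p} \beta_k x_k\bigr)}
               {1 + \exp\bigl(\beta_0 + \sum_{k=1}^{p} \beta_k x_k\bigr)} \\
g(x) &= \ln\!\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \sum_{k=1}^{p} \beta_k x_k
\end{align}
```

The last line shows why the logit transformation is useful: $g(x)$ is linear in the parameters even though $\pi(x)$ is not.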

Multicollinearity Test
Multicollinearity is a condition in which the explanatory variables in a multiple regression model are highly linearly intercorrelated. This condition can lead to unreliable regression results. One diagnostic tool for identifying multicollinearity is the Variance Inflation Factor (VIF) [9]. The VIF indicates how much the variance of an estimator is inflated by multicollinearity. If there is no collinearity between the independent variables, the VIF is 1. The inverse of the VIF is called the tolerance, and the two can be used interchangeably [10].
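The VIF can be illustrated with a minimal sketch, assuming the usual definition $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing the $j$-th independent variable on the others. The function name `vif` and the toy arrays are hypothetical and are not the study data.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (n observations x p variables).

    Assumes VIF_j = 1 / (1 - R_j^2), with R_j^2 from an ordinary least
    squares regression of column j on the remaining columns plus an intercept.
    A perfectly collinear column (R_j^2 = 1) would cause a division by zero.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])      # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares fit
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Hypothetical toy data: two uncorrelated columns, then two nearly collinear ones.
X_orth = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
X_corr = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]])

vif_orth = vif(X_orth)   # uncorrelated columns give VIF = 1
vif_corr = vif(X_corr)   # nearly collinear columns give a VIF far above 10
```

A VIF below 10 is the cutoff used later in this paper to conclude that multicollinearity is absent.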

Heterogeneity Test
The spatial heterogeneity assumption is tested to determine whether the residual variance of the response variable differs across locations, that is, whether at least one observation location has a different residual variance [11]. Moran's Index is a test for spatial dependence [12], while the Breusch-Pagan test is a statistical test that can detect spatial heterogeneity [13].
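The Breusch-Pagan statistic is commonly written in the form below. This is a sketch of the standard formulation rather than a reproduction of a specific equation. It assumes that $e_i$ is the residual at location $i$, $\hat{\sigma}^2$ is the estimated residual variance, $n$ is the number of observations, and $p$ is the number of independent variables.

```latex
\begin{align}
H_0 &: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2
      \quad \text{(no spatial heterogeneity)} \\
BP &= \frac{1}{2}\, f^{T} Z \left(Z^{T} Z\right)^{-1} Z^{T} f \;\sim\; \chi^2_{p},
      \qquad f_i = \frac{e_i^2}{\hat{\sigma}^2} - 1
\end{align}
```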
In the Breusch-Pagan statistic, $Z$ is a matrix of independent variables of size $n \times (p + 1)$. In the GWLR model, $x_{ik}$ denotes the $k$-th independent variable ($k = 1, \ldots, p$) at the $i$-th location, $(u_i, v_i)$ are the spatial coordinates of the $i$-th location, and $\beta_k(u_i, v_i)$ is the regression coefficient giving the estimated effect of independent variable $k$ at the $i$-th location.
GWLR model parameters are estimated using Maximum Likelihood Estimation (MLE) [16], with the estimates computed by the Fisher scoring algorithm [17]. This study uses the Adaptive Exponential Kernel. Because it is adaptive, the kernel function adjusts to the distribution of the observation points [18].
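The estimation at a single location can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study. The kernel form $w_{ij} = \exp(-d_{ij} / h_i)$, with $h_i$ taken as the distance to the $k$-th nearest neighbour, is one common definition of an adaptive exponential kernel and is assumed here. The function names and toy data are hypothetical.

```python
import numpy as np

def adaptive_exponential_weights(coords, i, k):
    """Weights of all observations relative to location i.

    Assumed kernel: w_ij = exp(-d_ij / h_i), where the adaptive bandwidth h_i
    is the Euclidean distance from location i to its k-th nearest neighbour.
    Assumes that this distance is positive (no k duplicated coordinates).
    """
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    h = np.sort(d)[k]            # index 0 is location i itself (distance 0)
    return np.exp(-d / h)

def local_logistic_fit(X, y, w, n_iter=25, tol=1e-8):
    """Weighted logistic regression at one location via Fisher scoring."""
    A = np.column_stack([np.ones(len(y)), X])    # design matrix with intercept
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(A @ beta)))   # fitted probabilities
        score = A.T @ (w * (y - pi))             # weighted score vector
        info = A.T @ (A * (w * pi * (1.0 - pi))[:, None])  # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:           # stop once updates are tiny
            break
    return beta

# Hypothetical toy data: 8 observations with coordinates and one binary predictor.
coords = np.array([[0, 0], [1, 0], [0, 2], [3, 4],
                   [1, 1], [2, 2], [4, 0], [0, 5]], dtype=float)
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

w0 = adaptive_exponential_weights(coords, i=0, k=3)   # weights seen from location 0
beta0 = local_logistic_fit(x, y, w0)                  # local coefficients at location 0
```

Repeating the two calls for every `i` yields one coefficient vector per location, which is what makes the model local: observations near location `i` receive weights close to 1 and dominate its estimate.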

Evaluating Classification
The Apparent Error Rate (APER) is the proportion of observations misclassified by the classification function [19]. The error rates for the two groups can be seen in Table 1.
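As a small worked example, the APER can be computed from a classification table by dividing the number of misclassified observations by the total number of observations. The function name `aper` and the counts below are hypothetical and are not results from this study.

```python
def aper(confusion):
    """Apparent Error Rate from a square classification table.

    confusion[i][j] is the number of observations whose actual group is i
    and whose predicted group is j, so the diagonal holds the correct counts.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Hypothetical counts: rows are actual groups, columns are predicted groups.
example = [[40, 10],
           [5, 45]]

aper_value = aper(example)      # (10 + 5) / 100
accuracy = 1.0 - aper_value     # share of observations classified correctly
```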

Data
The data used in this study are the graduation data of the 2014 batch of FMIPA UNTAN students. These are secondary data obtained from the Academic and Student Affairs Bureau (BAK) of UNTAN. The 2014 batch was the last batch to graduate with complete school location data, and 445 FMIPA students were used as the sample. The variables are divided into the dependent variable (Y), namely whether a student completed or did not complete their studies, and the independent variables (X), namely Gender, College Selection, Accreditation, School Type, School Location, and Department Name.

Multicollinearity Test
A multicollinearity test is conducted on the independent variables to determine whether they are correlated with one another before logistic regression and GWLR are applied. The VIF values in Table 2 are all less than 10, so it can be concluded that there is no multicollinearity between the independent variables and that all variables can be used to form the GWLR model.

Spatial Heterogeneity Test
The spatial heterogeneity test uses the Breusch-Pagan test, the results of which are shown in Table 3. Based on these results, $H_0$ is rejected. This means that the variance differs between locations, that is, spatial heterogeneity is present.

Parameter Estimation of GWLR Model
The parameters of the GWLR model with Adaptive Exponential Kernel weighting are tested with the Wald test at each location $(u_i, v_i)$ to determine the factors that influence the graduation of FMIPA UNTAN students. Then the logit function is: $g(x) = -\ldots - \ldots - \ldots + \ldots$ Locations $(u_1, v_1)$ and $(u_2, v_2)$ have different parameters that significantly affect the model. This indicates that location influences student graduation. The parameter testing process is repeated at each location, from $(u_3, v_3)$ until $(u_{455}, v_{455})$, or until the last observation location.
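A partial Wald test for one coefficient at one location can be sketched as below. The sketch assumes the common form $W = \hat{\beta}_k / SE(\hat{\beta}_k)$, compared against the standard normal distribution at significance level $\alpha = 0.05$. The function name, the estimates, and the standard errors are hypothetical and are not values from this study.

```python
import math

def wald_test(beta_hat, se, alpha=0.05):
    """Two-sided Wald test for a single coefficient.

    Assumes W = beta_hat / se is approximately standard normal under H0.
    Returns the statistic, the two-sided p-value, and the rejection decision.
    """
    w = beta_hat / se
    # For a standard normal Z, P(|Z| > |w|) = erfc(|w| / sqrt(2)).
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value, p_value < alpha

# Hypothetical estimates at one location.
w_a, p_a, reject_a = wald_test(beta_hat=-1.2, se=0.4)   # |W| = 3.0, significant
w_b, p_b, reject_b = wald_test(beta_hat=0.3, se=0.4)    # |W| = 0.75, not significant
```

Running such a test for every coefficient at every location gives the set of significant variables per location, which is why two locations can end up with different local models.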

Evaluating Classification of GWLR Model
The classification accuracy test is a way to assess the feasibility of the model, namely the percentage of observations that are classified correctly. Classification performance is assessed by cross-tabulating the observed and predicted classes as in Table 1. The classification results of the GWLR model with Adaptive Exponential Kernel weights are shown in Table 6.
[5]. Previous research on GWLR includes GWLR modeling of the Public Health Development Index (HDI) in Papua Province [1], the use of GWLR models with Adaptive Gaussian Kernel weighting for poverty cases in East Nusa Tenggara Province [6], and a study of the factors affecting the Population Growth Rate (PGR) of Semarang City using logistic regression and GWLR with Bisquare and Gaussian kernel weighting functions [7].

In the Breusch-Pagan test, $f$ is a vector of observations of size $n \times 1$ with elements $f_i = (e_i^2 / \hat{\sigma}^2) - 1$. The critical region rejects $H_0$ if $BP > \chi^2_{\alpha, p}$ or if the $p$-value $< \alpha$ [11].

Geographically Weighted Logistic Regression (GWLR)
Logistic regression with added geographical weighting is called GWLR [8] and should be applied in the case of binary response variables [14]. GWLR is a local statistical technique which assumes that the regression coefficients vary spatially across the locations of all cases in the study population [15]. It can be expressed by
$g(x_i) = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik}$

Table 1. Classification Error (rows: Actual Group Membership; columns: Predicted Group Membership)

Table 6. Classification Error of the GWLR Model
The calculation of Apparent Error Rate (APER) and Actual Error Rate (AER) values is as follows