APPLICATION OF BAGGING CART IN THE CLASSIFICATION OF ON-TIME GRADUATION OF STUDENTS IN THE STATISTICS STUDY PROGRAM OF TANJUNGPURA UNIVERSITY

ABSTRACT


INTRODUCTION
One measure of a university's success is a high rate of on-time graduation every year [1]. Students' success in pursuing their education can be seen from the time taken and the graduation predicate obtained. Undergraduate students of the Faculty of Mathematics and Natural Sciences (FMIPA), Tanjungpura University (Untan), are considered to graduate on time if they complete their studies in at most four years with a minimum learning load of 144 credits. Data mining, or Knowledge Discovery from Data (KDD), is the process of extracting information from data sets and transforming it into models that are easy to understand [2]. Data mining comprises several techniques, one of which is classification. The decision tree is a classification method that involves the construction of a tree of decision rules. Classification and Regression Tree (CART) is one of the classification tree methods that has been widely used in classification analysis because it has been shown to give a small classification error rate [3] and is easy to interpret [4].
CART belongs to the group of nonparametric statistical methods because no distributional assumptions must be met [5]. A single CART tree can be unstable, and this instability can be aggravated by overfitting of the model [6]. To improve the performance of CART classification, an ensemble method can be applied, namely bootstrap aggregating (bagging). The bagging method can reduce the error rate of the classification produced by a single CART model [3]. Zacharis (2018) examined predictive modeling in blended learning by classifying students into those who passed and those who failed using the CART technique, obtaining a very high accuracy of 99.1%. Agwil et al. (2020) discussed the timeliness of student graduation in an undergraduate Mathematics study program using bagging CART; applying bagging CART gave an accuracy of 85.71%, higher than the 77.3% accuracy of CART alone, with resampling performed 50 times. In the present study, resampling is performed 25 times.
Based on this description, the author combined the bagging method with the CART algorithm. The bagging CART method aims to reduce the classification error produced by a single classification model (CART), to describe the characteristics of graduates, and to increase the accuracy of predicting whether Statistics Study Program students graduate on time or not.

RESEARCH METHODS
Data mining is the extraction and transformation of information from a data set into an understandable structure or model. Data mining has three main components: clustering or classification, association rules, and sequence analysis. Classification is used to assign each item in a data set to one of a set of predefined groups [7]. One of the techniques in classification is the decision tree [8]. Decision trees can be used to select the most relevant variables for forming a model. The purpose of a decision tree is to obtain information that is useful in making a decision [9].

Classification and Regression Tree (CART) Algorithm
CART is one of the nonparametric machine learning methods [10]. The CART method can handle response variables that are either numeric or categorical: a tree built for a categorical response is called a classification tree, while a tree built for a numeric response is called a regression tree [4]. The basic idea of the method is to select the predictor variables that have the greatest interaction with the response variable [3]. The strength of this interaction is assessed during the node-splitting process through the goodness of split (best splitting criterion) [5]. The analysis in the CART method proceeds in the following stages. The splitting process starts from the root node, which contains all the data to be split [5]. Each split divides the data into two groups: the group sent to the left node and the group sent to the right node. The split chosen at each parent node is based on the goodness of split (best splitting criterion) [11]. The goodness of split is built on a heterogeneity function that measures how heterogeneous the classes are at a given node of the classification tree, here the Gini index $i(t)$ in Equation (1) [5]:

$$i(t) = \sum_{j \neq k} p(j \mid t)\, p(k \mid t) \qquad (1)$$

where $p(j \mid t)$ is the proportion of class $j$ at node $t$ and $p(k \mid t)$ is the proportion of class $k$ at node $t$.
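To make Equation (1) concrete, the short Python sketch below computes the Gini index from the labels of the observations at a node; the function name and the example class counts are illustrative only and are not part of the original analysis.

```python
import numpy as np

def gini_index(labels):
    """Gini index i(t) of a node t: sum over j != k of p(j|t) p(k|t),
    which equals 1 - sum_j p(j|t)^2 since the proportions sum to one."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()          # class proportions p(j|t)
    return 1.0 - np.sum(p ** 2)

# hypothetical node with 6 on-time and 4 not-on-time students
print(gini_index(["on time"] * 6 + ["not on time"] * 4))  # 0.48
```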
The goodness of split evaluates how well a split $s$ separates the data at node $t$; it can also be defined as the decrease in heterogeneity. The value $\Delta i(s, t)$ is used as the test criterion for the best split [5]. The tree is grown by examining all possible splits at a node so that the split $s^*$ giving the largest decrease in heterogeneity is found, as expressed in Equation (2) [12]:

$$\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R) \qquad (2)$$

where $p_L$ is the proportion of objects that go to the left node $t_L$ and $p_R$ is the proportion that go to the right node $t_R$. The split that produces a larger $\Delta i(s, t)$ is the better split because it reduces heterogeneity more.
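A minimal sketch of the decrease in heterogeneity in Equation (2), reusing the gini_index function above; the inputs are assumed to be the class labels of the parent node and of the two candidate child nodes.

```python
def goodness_of_split(parent_labels, left_labels, right_labels):
    """Goodness of split: Delta i(s, t) = i(t) - p_L * i(t_L) - p_R * i(t_R)."""
    n = len(parent_labels)
    p_left = len(left_labels) / n      # p_L: proportion of objects sent to the left node
    p_right = len(right_labels) / n    # p_R: proportion of objects sent to the right node
    return (gini_index(parent_labels)
            - p_left * gini_index(left_labels)
            - p_right * gini_index(right_labels))
```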
Class labeling is carried out from the first split until the terminal nodes are formed, because every node that is formed can become a terminal node. Each terminal node is labeled according to the rule of the largest number of class members, as in Equation (4):

$$p(j_0 \mid t) = \max_j p(j \mid t) \qquad (4)$$

where $j_0$ is the class assigned to node $t$. A node $t$ becomes a terminal node and is not split further if there is only one observation in each child node, if all observations in each child node have identical response variable distributions, or if the specified maximum tree depth has been reached [11].
The maximal classification tree that is formed is likely to be very large. The more splits that are performed, the higher the accuracy on the training data. However, a very large tree is harder to interpret and leads to overfitting. This problem can be overcome by pruning the maximal classification tree to obtain an optimal classification tree. Pruning towards the optimal tree size uses Equation (5) [5]:

$$R(t) = R(t_L) + R(t_R) \qquad (5)$$

where $R(t)$ is the misclassification cost of node $t$ and $R(t_L)$ and $R(t_R)$ are the misclassification costs of its left and right child nodes; when this equality holds, the split of node $t$ brings no improvement and its children can be pruned.
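Assuming Equation (5) is applied as a direct comparison of a parent node's misclassification cost with the summed costs of its children, as described in the pruning stage of the results, a hypothetical check could look like this:

```python
def should_prune(r_parent, r_left, r_right, tol=1e-9):
    """Prune children t_L and t_R when R(t) = R(t_L) + R(t_R) holds (Equation (5)),
    i.e. when splitting the node does not reduce the misclassification cost."""
    return abs(r_parent - (r_left + r_right)) <= tol
```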

Bootstrap Aggregating (Bagging)
Bootstrap aggregating (bagging) is an ensemble method that stabilizes and improves classification performance [13]. Bagging can also be applied to many classification and regression methods to reduce the variance of a predictor and thereby improve the estimation process [5]. The bagging process has two stages: bootstrapping, which is resampling from the available data, and aggregating, which merges many estimates into a single estimate [14]. The process of producing estimates by bagging classification trees is as follows [5]:

1. Bootstrapping stage
   a. Draw a random sample with replacement from the data set.
   b. Construct the best tree based on the bootstrap sample.
   c. Repeat steps a-b B times to obtain B classification trees.

2. Aggregating stage
   Using the majority vote rule, produce a combined estimate based on the B classification trees.

The use of bagging is especially helpful in overcoming the instability of classification and regression trees. On many data sets, bagging can reduce the misclassification rate in classification problems [5].
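The two stages above can be sketched in Python as follows; this is only an illustration of bootstrapping 25 CART trees and aggregating them by majority vote, with hypothetical array inputs, and is not the authors' exact implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_cart_predict(X_train, y_train, X_test, n_trees=25, max_depth=4, seed=0):
    """Bootstrapping: fit one CART tree on each of n_trees bootstrap resamples.
    Aggregating: combine the predictions with a majority vote.
    Assumes X_* are NumPy arrays and y_train is encoded as 0/1."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                    # resample with replacement
        tree = DecisionTreeClassifier(max_depth=max_depth)  # CART-style tree with Gini splits
        tree.fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.asarray(votes, dtype=int)                    # shape (n_trees, n_test)
    # majority vote over the trees for each test observation
    return np.array([np.bincount(col).argmax() for col in votes.T])
```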

Classification Accuracy
The Apparent Error Rate (APER) expresses the proportion of samples that are misclassified in the classification process [15]. Classification accuracy is then computed as 1 - APER, so a method with a small APER, that is, a small chance of misclassification, is regarded as the better classification method. The APER can be calculated from the confusion matrix [16]. The confusion matrix is a method used to measure the performance of a classification method [17]. The form of the confusion matrix is shown in Table 1.
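As an illustration of APER and accuracy = 1 - APER computed from a 2x2 confusion matrix, the sketch below uses made-up counts, not the values reported later in the paper.

```python
import numpy as np

def aper(confusion_matrix):
    """APER = number misclassified / total = 1 - trace(CM) / sum(CM)."""
    cm = np.asarray(confusion_matrix, dtype=float)
    return 1.0 - np.trace(cm) / cm.sum()

cm = [[60, 10],   # rows: actual class, columns: predicted class (hypothetical counts)
      [20, 50]]
print(aper(cm))        # apparent error rate
print(1 - aper(cm))    # classification accuracy
```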

RESULTS AND DISCUSSION
The data used in this study are primary data in the form of questionnaires and secondary data obtained from the academic website of FMIPA Untan and the PDDikti website. The number of samples used was 140 observations. The response attribute is the graduation status of Untan Statistics Study Program students from Period I of the 2017/2018 academic year to Period II of the 2022/2023 academic year.

Splitting Node
First, compile all the variables that form the candidate splits. Then, for each candidate split, calculate the branch probabilities $p_L$ (the proportion of objects that go to the left node $t_L$) and $p_R$ (the proportion of objects that go to the right node $t_R$). After obtaining $p_L$ and $p_R$, calculate $p(j \mid t)$ for the left and right branches, and then calculate the goodness of split using Equation (3). Next, $p_L$, $p_R$, $p(j \mid t)$, and the goodness of split are calculated in the same way for all the other candidate splits, and the candidate with the highest goodness of split is chosen as the split. The calculation results can be seen in Table 2. Based on Table 2, the highest goodness-of-split value among the candidates is 0.042, obtained for the split with High School Accreditation A in the left branch and High School Accreditation other than A in the right branch, so this candidate is chosen as the root node. The remaining splits are then calculated in the same way in subsequent iterations, up to a maximum depth of four.
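As an illustration of this calculation, the sketch below reuses the gini_index and goodness_of_split functions from the methods section for a hypothetical accreditation split; the counts are invented and do not reproduce the value 0.042 in Table 2.

```python
# hypothetical labels: 1 = graduated on time, 0 = not on time
parent = [1] * 60 + [0] * 80                   # all 140 observations at the root node
left   = [1] * 40 + [0] * 35                   # candidate left branch: accreditation A
right  = [1] * 20 + [0] * 45                   # candidate right branch: other than A
print(goodness_of_split(parent, left, right))  # decrease in heterogeneity for this candidate
```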

Class Assignment
The class labeling process assigns to each node formed the class with the largest number of members. If $p(j_0 \mid t) = \max_j p(j \mid t)$, then the node is labeled $j_0$, where the possible classes are graduated not on time and graduated on time. For example, node 1 is labeled by applying Equation (4).
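Equation (4) simply labels a node with its majority class; a minimal sketch with hypothetical node counts:

```python
import numpy as np

def assign_class(labels):
    """Assign node t the class j0 with p(j0|t) = max_j p(j|t) (the majority class)."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return values[counts.argmax()]

print(assign_class(["not on time"] * 30 + ["on time"] * 12))  # "not on time"
```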

Stop the Splitting
The first maximal classification tree obtained has 4 internal nodes and 6 terminal nodes. The stopping process can be seen in Figure 2 and occurs at node 9 and node 10. At node 9 there are 65 observations all in the same class (homogeneous), and node 10, with 75 observations, lies at the maximum depth of 4, so the node splitting process is stopped.

Pruning the Classification Tree
The pruning process begins by taking $t_L$ (left node) and $t_R$ (right node) of a parent node $t$ from the maximal tree. If the child nodes and the parent node satisfy Equation (5), $R(t) = R(t_L) + R(t_R)$, then nodes $t_L$ and $t_R$ are pruned. The process is repeated until no further pruning is possible. At this stage, node 0, node 1, and node 2 are examined, and evaluating Equation (5) shows that the equality is satisfied. Therefore, pruning is performed on node 1 and node 2, and the first classification tree is pruned back to node 0. The pruning process is carried out until no further pruning is possible. After the pruning process stops, the optimal classification tree can be seen in Figure 3, which shows the result of the pruning process, namely the first optimal classification tree. In this tree, graduating on time is influenced by High School Accreditation.

Classification Interpretation on CART
The prediction results of the single classification tree are first used to test the classification accuracy of the CART method using Equation (8). Based on the predictions of the CART classification tree, 71 observations are predicted to fall into the not-on-time class, while 27 are predicted to fall into the on-time class. The final estimate is based on the majority of the votes, so the final prediction falls into the not-on-time class. The results of the classification accuracy test for the CART classification tree can be seen in Table 3. Based on the calculation from Table 3, the APER (misclassification rate) is 30% and the accuracy is 70%, meaning that the overall accuracy of the classification produced by the CART model is 70%.

Application of Bagging CART
The bootstrap aggregating (bagging) method is applied to improve on the CART accuracy obtained above. The bootstrap process is performed 25 times to form 25 classification trees, and the data are predicted with each classification tree. Resampling is repeated 25 times because additional repetitions of the resampling were found to give the same accuracy, so 25 resamples are used. The next step is to aggregate the 25 predictions of the data using the majority vote rule. Based on the predictions of the 25 classification trees, 71 observations are predicted to fall into the not-on-time class, while 49 are predicted to fall into the on-time class. The final prediction is based on the majority vote, so the final prediction falls into the not-on-time class.

Classification Interpretation on Bagging CART
The combined prediction results of the bagged CART trees are used to test the classification accuracy of the bagging CART method using Equation (8). The results of the classification accuracy test can be seen in Table 4. Based on the calculations from Table 4, the APER value is 14.29% with an accuracy of 85.71%; this accuracy value means that the overall classification accuracy produced by the bagging CART model is 85.71%.

Classification Accuracy on CART and Bagging CART
The comparison of the classification accuracy of the initial CART tree with the classification accuracy after applying bagging CART can be seen in Table 5.

Table 5. Classification Accuracy Result
Based on Table 5, the classification accuracy obtained by applying the bagging technique is 85.71%. Using the bagging technique increases the classification accuracy from 70% for CART to 85.71% for bagging CART.
The accuracy of the bagging CART method is better than that of the single CART classification method because bagging CART resamples the data 25 times, fits a model to each resample, makes a prediction from each sample formed, and selects the representative prediction using the majority vote. Repeating the sampling 25 times yields the converged prediction results shown in Table 5.

CONCLUSIONS
Based on the results of the discussion, the accuracy of the graduation classification of Tanjungpura University Statistics Study Program students obtained with the bagging CART method is 85.71%. The bagging CART method increases the classification accuracy from 70% for the initial classification tree to 85.71% for bagging CART. It is better than the classification tree without bagging because it increases the accuracy by 15.71 percentage points.


Figure 1. Root Node Sorting in the First Classification Tree

Figure 2. Nodes 9 and 10 in the First Maximal Classification Tree

Figure 3. Node 0 in the First Pruned Maximal Classification Tree