DESIGN OF STUDENT SUCCESS PREDICTION APPLICATION IN ONLINE LEARNING USING FUZZY-KNN

ABSTRACT


INTRODUCTION
The use of information and communication technology (ICT) has grown rapidly, including in education [1]. With the development of internet technology, internet-based learning has become an alternative method of delivering instruction. Online learning allows students to take part in the learning process anywhere and anytime [2]. Because it expands access to a wide range of learning resources, online learning has been widely adopted by educational institutions. Online learning, which allows lecturers and students to be separated and emphasizes student-centered learning [3], affects student success rates differently from learning carried out entirely face-to-face [4]. One of the issues identified in online learning is student engagement in the learning process [5].
Educational institutions that deliver online learning increasingly see the need for a learning analytics (LA) system [6]. Learning analytics is a multi-disciplinary approach that produces predictive information by combining data processing, learning technology improvement, educational data mining, and data visualization [7]. Learning analytics is needed in online learning, especially to measure how successfully students follow a course [8]. This can be accomplished by monitoring and evaluating student online activity, particularly the activity recorded in the learning management system (LMS). The results of the LA evaluation are expected to be used to predict student success at the end of the course. Education providers, lecturers, and students all benefit from LA. For education providers, LA results can inform policies on how education is delivered in their institutions. For lecturers, LA results can be used to identify student performance during the learning process [9], so that lecturers can find students who need special attention. For students, LA can serve as an early warning about their learning progress [10].
With the development of artificial intelligence technology, machine learning models are widely used for prediction and classification problems. Kovacic made early predictions of student success [11]. Using data from 450 students from 2006 to 2009, that study provides a profile of the typical successful and unsuccessful student, along with the most crucial factors for student achievement. It uses classification tree models to classify student data as successful or unsuccessful. Oyelade et al. used the k-Means clustering algorithm to predict students' academic performance [12]. The algorithm was applied to 79 students with nine courses offered. The results help academic planners monitor candidates' performance semester by semester and improve academic outcomes in subsequent sessions. Furthermore, Okubo et al. predicted student performance using a Recurrent Neural Network (RNN) [13]. Unlike previous research, this study developed an application based on Fuzzy-KNN to predict and classify student success in an online course. The predictions are divided into three classes (multiclass): students who pass, students who fail, and observers. The model was developed using data from student learning activities in the LMS. This application is expected to complement the LA systems operated at educational institutions.

RESEARCH METHODS
This study uses knowledge discovery in databases (KDD) as its research method. KDD is the extraction of potential, implicit, and previously unknown information from a set of data [14]. The procedure used to build the model in this study is shown in Figure 1. Data from a total of 110 students were used. The research began by collecting data on student activities in the LMS. The final grade of each student is used as the target variable to classify student performance. The collected data is divided into training and testing data, and the training data is used to form the initial model.

Figure 1. Flowchart of Classification Prediction Model Development
The experiment was carried out with several compositions of training and testing data, where the training data used activity data from before the midterm exam and the testing data used data from after the midterm exam. Equation (1) is used to find the membership values in Fuzzy-KNN.
$$u_i(x) = \frac{\sum_{j=1}^{K} u_{ij} \, \lVert x - x_j \rVert^{-2/(m-1)}}{\sum_{j=1}^{K} \lVert x - x_j \rVert^{-2/(m-1)}} \qquad (1)$$
where $u_i(x)$ is the membership value of the observation $x$ in the testing data classified into class $i$ with $i = \{0,1\}$, $u_{ij}$ is the membership value of the $j$-th training observation $x_j$ to class $i$, $y_j$ is the class label contained in the training data, $K$ is the number of nearest neighbours, $C$ is the number of classes, and $m$ determines the distance weight, with $m = 2$ chosen [15].
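As a minimal sketch of how such a membership value can be computed, the following Python code assumes the standard Fuzzy-KNN formulation with inverse-distance weighting, Euclidean distance, and crisp training memberships (1 if a neighbour carries the class label, 0 otherwise). The function names, the default k, and the small constant eps are illustrative assumptions and are not taken from the study.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(train_X, train_y, x, k=3, m=2.0, eps=1e-9):
    """Return {class_label: membership} for a single observation x.

    Assumption: training memberships are crisp, so each neighbour
    contributes its full weight to its own class and nothing to others.
    """
    # Euclidean distance from x to every training observation.
    distances = [(math.dist(x, xj), yj) for xj, yj in zip(train_X, train_y)]
    # Keep only the k nearest neighbours.
    neighbours = sorted(distances, key=lambda pair: pair[0])[:k]

    exponent = 2.0 / (m - 1.0)  # equals 2 when m = 2
    weighted = defaultdict(float)
    total = 0.0
    for dist, label in neighbours:
        # eps guards against division by zero for identical points.
        weight = 1.0 / (max(dist, eps) ** exponent)
        weighted[label] += weight
        total += weight

    # Classes absent from the neighbourhood get membership 0.
    return {c: weighted[c] / total for c in sorted(set(train_y))}


def fuzzy_knn_predict(train_X, train_y, x, k=3, m=2.0):
    """Return the class with the highest membership value."""
    memberships = fuzzy_knn_memberships(train_X, train_y, x, k=k, m=m)
    return max(memberships, key=memberships.get)
```

The memberships for one observation sum to 1, so they can be read as the degree to which the observation belongs to each class rather than as a hard label.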
The prediction model used in the application is built using Fuzzy-KNN. The Fuzzy-KNN classifier assigns each unlabeled observation a membership value for each class, which provides information for assessing the certainty of the classification [16]. This membership value describes the degree to which an observation in the testing data belongs to a class. The accuracy of the model is tested using the testing data. If the model does not give optimal results in classifying student graduation, it is refined using a different composition of training data. Refinement iterations are carried out until a model that gives optimal results is obtained. An application written in Python is built to simulate the resulting model. A dashboard interface called LeADS (Learning Analytics Dashboard System) was developed to provide a prediction of the percentage of students who will graduate from a course.

RESULTS AND DISCUSSION
The predictions made in this study fall into three categories: pass, fail, and observer. The "pass" category is given to students who meet the predetermined minimum passing grade. The "fail" category is given to students who do not meet the predetermined minimum passing grade. The "observer" category is given to students who only access the LMS but have never participated in any course assessment component. Figure 2 shows the prediction results of the machine learning model after it was extended to the multiclass problem.
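The three categories above can be expressed as a simple labeling rule. The sketch below is only an illustration: the function name, the argument names, and the reading of "meets the minimum threshold" as greater than or equal to the limit are assumptions, and the passing limit of 55 in the usage lines is a made-up value.

```python
def label_student(final_grade, pass_limit, took_assessment):
    """Assign one of the three target classes to a student.

    Assumption: a student who never took part in any assessment
    component is an observer regardless of the recorded grade.
    """
    if not took_assessment:
        return "observer"
    return "pass" if final_grade >= pass_limit else "fail"


# Hypothetical usage with an assumed passing limit of 55.
labels = [
    label_student(80, 55, True),
    label_student(40, 55, True),
    label_student(0, 55, False),
]
```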

Figure 2. Multiclass machine learning prediction results that have been developed
After the model is adjusted, it is simulated again using student data taken from the LMS, and the features of the retrieved data are adjusted to match the model. The student data features used in this study are the number of accesses to learning files, the number of accesses to discussion forums, the number of accesses to learning videos, quiz grades, and assignment scores. The activity log data is downloaded and processed by retrieving the data up to mid-semester (before the final semester exam, UAS). Figure 3(a) shows the final results of the student activity data. Grade data is then collected from the Gradebook menu of the LMS. Figure 3(b) shows the student gradebook used to build the model. The gradebook data is downloaded and converted into Excel format, and the activity data and grade data are combined into one Excel file. The combined data is then tidied up by renaming the feature columns to match those used in the model. Figure 4 shows the final data with the adjusted feature names. This data is then cleaned so that it is ready for use.
This study uses history data and ongoing data. History data is student data from completed courses and is used to train the machine learning model. Only the data up to mid-semester and each student's final grade are used. The history data serves as the pattern from which the model predicts student graduation for the ongoing data. Duplicate history data is deleted when the model is processed. Ongoing data is student data from courses currently in progress, for which graduation is to be predicted. The ongoing data is also taken up to mid-semester and contains no final grade. Whether students in the ongoing data will graduate is predicted based on the history data.
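The combination of activity data and grade data, and the removal of duplicate history records, could look roughly like the sketch below. It uses plain Python dictionaries instead of Excel files to stay self-contained. All column names (student_id, file_access, and so on) are hypothetical stand-ins for the feature names used in the actual model, which are not given in the text.

```python
# Hypothetical column names; the real feature names are not stated.
ACTIVITY_FEATURES = ["file_access", "forum_access", "video_access"]
GRADE_FEATURES = ["quiz_grade", "assignment_score", "final_grade"]


def merge_activity_and_grades(activity_rows, grade_rows):
    """Join activity and gradebook records on a student identifier."""
    grades_by_id = {row["student_id"]: row for row in grade_rows}
    merged = []
    for row in activity_rows:
        grades = grades_by_id.get(row["student_id"], {})
        record = {"student_id": row["student_id"]}
        for name in ACTIVITY_FEATURES:
            record[name] = row.get(name, 0)   # no logged access counts as 0
        for name in GRADE_FEATURES:
            record[name] = grades.get(name)   # missing grade stays None
        merged.append(record)
    return merged


def drop_duplicates(rows):
    """Remove exact duplicate records while keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

In this sketch, `drop_duplicates` would be applied to history data only, since the text states that duplicates are deleted from history data.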
Duplicate ongoing data is not deleted. Figure 5(a) shows the history data format that is ready for use, and Figure 5(b) shows the ongoing data format. History and ongoing data were previously collected manually by opening the LMS and then processed in Excel. This is inefficient for lecturers who want to check their own classes: data retrieval is difficult and time-consuming, and formatting errors can prevent the model from working. To address this problem, tools were developed that are easier for lecturers to use, save time, and avoid data format errors.
To make it easier to retrieve history and ongoing data, a database query is performed on the LMS. The query reads a valid class ID on the LMS and then pulls the feature data, both history and ongoing, in the format required by the model. The first step was to create a stand-alone query system to verify that the retrieved data was correct and free of errors. The data can be retrieved with a simple request, namely https://idols.ui.ac.id/query/test.php?cid=473. The number 473 can be replaced with any valid class ID on the LMS. If the class ID is valid, an Excel file in the modeling format is downloaded automatically. Figure 6(a) shows an example of a class ID on the LMS. After the initial development was complete, the query was integrated with the LeADS application. Figure 6(b) shows an example of an LMS view whose class ID is valid. After the Excel history data file is downloaded, the display shown in Figure 7(a) appears.
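The query request described above takes the class ID as the cid parameter. A small helper for building that URL might look like the following sketch. The function name and the check that the class ID is a positive integer are assumptions. Whether a class ID is actually valid can only be determined by the LMS itself, so the sketch does not perform the download.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request in the text.
QUERY_ENDPOINT = "https://idols.ui.ac.id/query/test.php"


def build_query_url(class_id):
    """Build the data retrieval URL for a given LMS class ID.

    Assumption: class IDs are positive integers, as in the example 473.
    """
    cid = int(class_id)
    if cid <= 0:
        raise ValueError("class ID must be a positive integer")
    return f"{QUERY_ENDPOINT}?{urlencode({'cid': cid})}"


url = build_query_url(473)
```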
The file downloaded using the query already has the appropriate modeling format, so the data does not need to be preprocessed again. Figure 7(b) shows the ongoing data downloaded using the query, which also has the appropriate modeling format. In addition to the query, the research team created a web-based application called LeADS (Learning Analytics Dashboard System), which implements the Fuzzy-KNN model on the web. The LeADS application can be accessed at https://ileads.my.id. The data flow in the LeADS application is shown in Figure 8. In the first step, student history data is uploaded and the passing grade limit is set. The prediction is then run, and the accuracy and recall values are calculated. Next, the ongoing data is entered, and predictions for the ongoing data are made from the model built on the history data.
To enter the LeADS application, users must first log in by typing their username and password in the fields provided and then clicking Login. Figure 9 shows the initial appearance of the LeADS application. After logging in, click the Get Data History menu and fill in a valid class ID to download the history data. The Get Data History menu is shown in Figure 10(a). To upload the downloaded history data, click History Participant Data. Click 'add', then enter the name of the course in the Name field, the downloaded history data file in the Participant Data field, and the passing grade limit for the course in the Pass Grade Limit field, then click Save. The accompanying figure shows the contents of the add menu for history participant data in the LeADS application. If the upload succeeds, a green notification appears and the data appears in the list of recently added data.
Click the eye icon on the right to see the modeling results for the history data. When the history data has been added successfully, a notification appears, as shown in Figure 11(a). To view the history participant data that has been added, click 'history participant data', as shown in Figure 11. Then click the Get Data Ongoing menu and fill in a valid class ID to download the ongoing data. The display of the history data modeling results is shown in Figure 12(a). The Get Data Ongoing menu of the LeADS application is shown in Figure 12. After downloading the ongoing participant data, go to the Ongoing Participant Data menu and upload the data. Click 'add', then fill in the name of the course in the Name field, select the previously uploaded history data from the same course in the Data History field, and choose the ongoing participant data file in the Participant Data field, then click Save. Figure 13(a) shows the menu for adding ongoing data in the LeADS application. To see the ongoing participant data that has been added, click 'ongoing participant data', as shown in Figure 13.
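The accuracy and recall values that LeADS calculates for the history data can be illustrated with the sketch below. It is not the application's code: the function names are hypothetical, and recall is computed per class because the text does not state how recall is aggregated across the three classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of students whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its students predicted correctly."""
    recalls = {}
    for label in sorted(set(y_true)):
        # Predictions for the students who truly belong to this class.
        predicted = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in predicted) / len(predicted)
    return recalls


# Made-up labels for illustration only, not results from the study.
y_true = ["pass", "pass", "fail", "observer"]
y_pred = ["pass", "fail", "fail", "observer"]
acc = accuracy(y_true, y_pred)
rec = recall_per_class(y_true, y_pred)
```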