HEALTH CLAIM INSURANCE PREDICTION USING SUPPORT VECTOR MACHINE WITH PARTICLE SWARM OPTIMIZATION

ABSTRACT


INTRODUCTION
Insurance reduces or eliminates the cost of losses caused by certain risks, and both individuals and organizations need it for protection against those risks [1]. Many types of insurance are currently on the market, such as health insurance, vehicle insurance, and educational insurance. Health insurance has recently become one of the most popular products in the insurance industry. It protects customers against risks ranging from illness to accidents, and it is necessary because coverage helps customers obtain medical care and improve their lives and health [2]. Health insurance customers can submit an insurance claim for medical care services. An insurance claim is a request from a policyholder to an insurance company for coverage of a covered loss. Several health insurance companies have suffered losses due to the large number of claims submitted. Predicting the number of claims is therefore an important task in the insurance industry. The number of claims is an important factor in determining the profit of health insurance companies [3], and claim number prediction has important implications for managers, financial experts, and underwriters in the health insurance industry [4]. An increase in the number of claims directly increases the company's total expenses and thus affects its profit margins. An accurate prediction of the number of claims can be used to prepare the annual financial budget of the insurance company [5] and to determine the premium that insurance users must pay. Therefore, insurance companies need to predict the claims that their customers will submit in a given year.
Claims by health insurance customers can be predicted with machine learning methods. Claim prediction can be treated as a classification problem and solved with methods such as Naïve Bayes, decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Previous studies show that the solution of SVM is global and unique [6]. SVM produces good results even if the data information is incomplete. It can work with unstructured data and solve complex or nonlinear problems by using a kernel function. SVM also works well on high-dimensional data [7]. SVM has been applied successfully in many areas, such as COVID-19 prediction [8], leaf disease detection [9], [10], stock price prediction [11], handwriting classification [12], bank fraud detection [13], credit card fraud detection [14], and face recognition [15]. However, the performance of SVM is determined by its parameters and by the features used [16]. SVM parameters are usually selected by trial and error, so the resulting performance is less than optimal. For this reason, parameter selection is an important task in obtaining the best performance from SVM.
To overcome this problem, an optimization algorithm can be used to find the best parameter values [17]. Several heuristic optimization algorithms are suitable for optimizing the SVM parameters. Heuristic optimization methods make few or no assumptions about the problem being optimized. In addition, they can find an optimal or near-optimal solution in large search spaces at an acceptable computational cost [18]. Examples of heuristic optimization methods are particle swarm optimization (PSO) and the genetic algorithm (GA). PSO belongs to the Swarm Intelligence (SI) group. SI methods have several advantages: they can reach the global optimum, they do not require derivative calculations, and they are robust and easy to implement [19]. In PSO, all particles converge quickly to the global optimum or a near-optimal position [20]. Some studies show that PSO performs better than GA. For these reasons, this article proposes health insurance claim prediction using SVM with PSO.

RESEARCH METHODS
This article proposes health insurance claim prediction using SVM with PSO. The proposed method is divided into several steps, which are shown in the flowchart in Figure 1. These steps are: inputting the data and the PSO parameters, pre-processing the data, splitting the data, creating a classification model for health insurance claim prediction using SVM with PSO, and evaluating the model. The output of the proposed method is a classification model for health insurance claim prediction.

Input data and PSO parameters
This research uses several variables (features) to develop the classification model for health insurance claim prediction: age, sex, body mass index (BMI), number of children, smoker status, region, charges, and insurance claim. The PSO algorithm is used to tune the parameters of SVM. PSO itself requires several parameters to be set before it runs: the cognitive and social parameters c1 and c2, and the inertia weight w [21]. The experiments use swarm sizes of 5, 10, 20, and 50 particles, with a maximum of 1000 iterations.

Data pre-processing
Data pre-processing is one of the stages in data mining. The raw data are processed before moving on to the next stage. Pre-processing usually involves eliminating inappropriate data and converting the data into a form that the system can handle more easily. Data pre-processing can also be understood as a step that removes problems that would otherwise interfere with later processing, since much raw data comes in an inconsistent format. Data normalization is commonly used to prepare the data before a data mining technique is applied. In machine learning and data mining, normalization converts the numerical values in the dataset to a common scale, usually a small range such as -1 to 1 or 0 to 1. This is generally useful for classification algorithms.
Data normalization offers several advantages in data mining: the data mining method becomes easier to apply, faster, and more effective and efficient. Certain methods also require normalized data for analysis.
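As an illustration of rescaling to the range 0 to 1, the following is a minimal sketch of min-max normalization in plain Python. The function name `min_max_normalize`, the sample values, and the handling of constant columns are assumptions made for this example and are not taken from the experiments.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumed convention: a constant column is mapped to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical values for two features on very different scales.
ages = [19, 33, 46, 64]
bmis = [27.9, 22.7, 33.4, 25.7]

scaled_ages = min_max_normalize(ages)
scaled_bmis = min_max_normalize(bmis)
```

After this step the smallest value of each feature maps to 0 and the largest to 1, so no single feature dominates distance-based computations merely because of its units.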

Splitting Data
The dataset must be divided into training and test sets. This is one of the important steps in developing a data mining model. The training process uses most of the data, while the testing process uses a smaller part. The proportions of training and testing data can vary; common choices are 80% and 20%, or 70% and 30%. The two subsets must not overlap, because overlap would compromise the training and evaluation of the model. The machine learning model is obtained by learning from the training data, and its performance is then evaluated on the testing data. The testing process also reveals whether the model is overfitted, underfitted, or appropriately fitted. This research uses 70% of the data for training and 30% for testing.
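A 70%/30% split into non-overlapping subsets can be sketched as follows. This is only an illustrative version: the function name `train_test_split`, the fixed random seed, and the rounding of the test-set size are assumptions, and the placeholder list stands in for the actual records.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and split them into disjoint training and test sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    n_test = round(len(rows) * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Placeholder rows standing in for the records of the dataset.
rows = list(range(1338))
train, test = train_test_split(rows)
```

Because both subsets are drawn from one shuffled index list, no record can appear in both.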

Create the Classification Model for Health Claim Insurance Prediction Using SVM with PSO
The next step is to create a classification model for health insurance claim prediction by using SVM with PSO. The basic principle of SVM is a linear classifier: classification cases are solved by separating the classes linearly. SVM can also solve nonlinear problems by using the kernel concept. The SVM method obtains a hyperplane in a high-dimensional space by maximizing the distance between the data classes. The classification model is built with SVM, and the SVM parameters, namely c and the kernel parameter, are searched for by the PSO algorithm. The model is created by carrying out the learning process on the training data. The steps for obtaining the classification model for health insurance claim prediction by using SVM with PSO are shown in the flowchart in Figure 2.
Each particle in SVM with PSO represents a candidate for the optimum SVM parameters, which are c and the kernel parameter. In the first step of the SVM method with PSO, s candidates for the SVM parameters are generated randomly and are represented as x_i(0), i = 1, 2, ..., s. SVM with PSO then explores and exploits the search space to find the global optimum point. The fitness function is computed from the F1 score of the SVM trained on the training data and has the form 1/(0.001 + f1), so a higher F1 score gives a lower fitness value. While exploring and exploiting the search space, every particle tries to find its best position. The update of the best position is calculated by Equation (1).
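The search described above can be sketched with the standard PSO velocity and position updates, which may differ in detail from the variant used in this research. In the sketch below, the values of `w`, `c1`, and `c2`, the search bounds, the zero initial velocities, and the names `pso_minimize` and `stand_in_f1` are all assumptions. In particular, `stand_in_f1` is a hypothetical surrogate that replaces training an SVM and computing its F1 score, so that the example stays self-contained.

```python
import random


def pso_minimize(fitness, bounds, n_particles=5, n_iterations=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds; zero initial velocities (assumption).
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_val = [fitness(p) for p in positions]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull (own best) + social pull (swarm best).
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            value = fitness(positions[i])
            if value < pbest_val[i]:  # update the particle's best position
                pbest[i], pbest_val[i] = positions[i][:], value
                if value < gbest_val:  # update the swarm's best position
                    gbest, gbest_val = positions[i][:], value
    return gbest, gbest_val


def stand_in_f1(params):
    """Hypothetical surrogate for the training F1 score, peaking at (10, 0.1)."""
    c, kernel_param = params
    return 1.0 / (1.0 + (c - 10.0) ** 2 / 100.0 + (kernel_param - 0.1) ** 2)


def fitness(params):
    # Fitness form from the text: a higher F1 score gives a lower fitness value.
    return 1.0 / (0.001 + stand_in_f1(params))


best_params, best_fitness = pso_minimize(
    fitness, bounds=[(0.1, 100.0), (0.001, 1.0)])
```

In a real run, `stand_in_f1` would be replaced by a function that trains an SVM with the candidate parameters on the training data and returns its F1 score.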

Evaluate the model classification
The next step is to evaluate the model obtained by training on the training data with the parameters found by PSO. The model is then tested on the test data. The F1 score, accuracy, recall, and precision on the training and testing data are the evaluation metrics used to assess the performance of the classification model.
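The four metrics can be computed from the counts of true and false positives and negatives. The sketch below uses the usual binary-classification definitions; the function name `classification_metrics`, the choice of 1 as the positive class, and the sample labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 (assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = claim submitted, 0 = no claim.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

For these hypothetical labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75.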

RESULTS AND DISCUSSION
This section describes the dataset used to evaluate the SVM method with PSO and then presents, analyzes, and discusses the experimental results. Since the solution produced by the SVM method with PSO may differ from run to run, the experiments need to be repeated. Each experiment in this research is repeated 25 times. The mean and standard deviation of the F1 score, accuracy, recall, precision, and computational time are used to summarize the results and to measure the spread of the values produced by the methods.
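Summarizing repeated runs by their mean and standard deviation can be sketched with Python's `statistics` module. The scores below are hypothetical, only five values are shown instead of 25 for brevity, and the use of the sample standard deviation (`statistics.stdev`) rather than the population form is an assumption.

```python
import statistics


def summarize_runs(scores):
    """Return the mean and sample standard deviation of repeated-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical F1 scores from five repetitions of one experiment.
f1_scores = [0.90, 0.91, 0.89, 0.90, 0.90]
mean_f1, std_f1 = summarize_runs(f1_scores)
```

A small standard deviation relative to the mean indicates that the stochastic search returns similar results across repetitions.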

Dataset for the Evaluation of Proposed Method
The classification model for health insurance claim prediction is evaluated on a dataset of 1338 records taken from Kaggle.com. The dataset has the following attributes: age, sex, BMI, number of children, smoker status, region, charges, and insurance claim. Table 1 shows a sample of the dataset. The ranges of the variables differ greatly. For this reason, min-max normalization needs to be applied so that the machine learning methods can work well.

Analysis of the experimental results
This section presents and analyzes the experimental results. Table 2 shows the dataset after min-max normalization. All values now lie in the same range [0,1], which allows the SVM method to work efficiently. The dataset is then separated into two subsets, training and testing data, containing 70% and 30% of the records, respectively. Table 3 shows the evaluation of the classification model for predicting claims by health insurance users using SVM with PSO. The experimental results show that the SVM method with PSO performs better than the standard SVM on most of the evaluation metrics. Table 3 shows that the accuracy, precision, and F1 score produced by the SVM method with PSO are higher than those produced by the standard SVM. Although the recall of the proposed method is lower, the difference from the standard SVM is only about 1%. The experimental results also show that swarm sizes of 5, 10, 20, and 50 particles produce no difference in performance. For this reason, the recommended number of particles is 5.
Table 4 shows the standard deviation of the SVM method with PSO in predicting health insurance claims. The standard deviation is very small for every swarm size. This means that the SVM method with PSO works reliably, because it reaches an optimum value with little variation between runs. Table 5 compares the computational time of the SVM method with PSO and the standard SVM. The SVM method with PSO requires more computational time for health insurance claim prediction than the standard SVM method.
Figure 3 shows the relationship between the number of iterations and the fitness value. PSO converges within a few iterations: the graph shows convergence after about 4 iterations for swarm sizes of 5, 10, 20, and 50 particles. It is therefore recommended to use a small number of iterations so that the computational time can be reduced.

CONCLUSIONS
The experimental results show that the SVM method with PSO performs well in health insurance claim prediction. The SVM method with PSO is superior to the standard SVM on three of the four evaluation metrics used (accuracy, precision, and F1 score), but it requires a longer computational time.