ON COMPUTATIONAL BAYESIAN ORDINAL LOGISTIC REGRESSION LINK FUNCTION IN CASES OF CERVICAL CANCER IN TUBAN

Cervical cancer is the most common cancer that causes death in women. It is mainly caused by the Human Papilloma Virus (HPV). An estimated 52 million Indonesian women are at risk of cancer, and 36% of female cancer patients suffer from cervical cancer. This type of cancer cannot be diagnosed immediately because it has a pre-malignant phase lasting several years; early detection or screening is therefore needed to prevent it from becoming malignant. The Pap test, as a screening program, can distinguish cancer, pre-cancer, and normal conditions. To understand the factors that predict the test results, a mathematical model was built using the link function of Bayesian ordinal logistic regression. This study observed several factors that may affect Pap test results in Tuban Regency, namely Age (X1), Education (X2), Childbirth Experience (X3), Use of Contraceptives (X4), Menstrual Cycle (X5), Age of First Menstruation (X6), History of Miscarriage (X7), Anemia (X8), and Number of Sexual Partners (X9). The results indicated that the predicting factors of Pap test results are Age (X1), Education (X2), Childbirth Experience (X3), Use of Contraceptives (X4), Menstrual Cycle (X5), and Anemia (X8). In this model there is an unexplained error dependency, as indicated by the varying values of the constant alpha.


INTRODUCTION
Cervical cancer has the highest prevalence among cancers in women and ranks second to breast cancer as a cause of death [1]. In Indonesia, cervical cancer is a cause of death among women [2]. The disease is primarily caused by the Human Papilloma Virus (HPV) [3]. The high mortality rate results from late treatment, as patients seek help only at a late stage of the disease [4]. Approximately 52 million women in Indonesia are at risk of cancer, and 36% of female cancer patients suffer from cervical cancer [5]. This type of cancer can be prevented through early screening with the Pap test [6]. Factors influencing Pap test results are the use of contraceptives and childbirth [1]. The topic therefore merits further study and analysis [7].
Logistic regression with a link function is a modelling method for dichotomous and polychotomous response variables [8]. To model the relationship between predictor and response variables, researchers generally use regression, either simple or multiple [9]. However, when the Gauss-Markov assumptions are violated, the commonly used OLS (Ordinary Least Squares) and Maximum Likelihood methods fit less well, and the more suitable method is Bayesian MCMC with Gibbs sampling [10].
A study by [11] applied the logistic link function model in medical science to classify benign and malignant tissue in breast cancer. The application of this model in medical science takes actual death cases into account. Similar research by [12] applied the same model to cervical cancer cases in Sweden, modelling the results of the regular screening evaluations performed to control the disease rate in the country. Another study, [13], applied the logistic regression link function to cervical cancer prevalence in China and found that pre-surgery radiotherapy and chemotherapy are independent protective factors against vascular space invasion and cervical cancer invasion. The studies by [1] and [12] used logistic regression with the maximum likelihood method, which optimizes the likelihood function and requires the data to follow a particular distribution. In practice, data in logistic regression do not always follow a clearly identifiable distribution; this study therefore investigates the link function of ordinal logistic regression using a Bayesian MCMC computational approach. The method is used to estimate the parameters of the logistic link function model and to identify the factors influencing the results of the Pap test for cervical cancer. The model is expected to give insight into optimal cervical cancer treatment, so that it can be included in public education on how to treat the disease. The results are expected to help reduce cervical cancer prevalence in Tuban Regency, advance scientific knowledge, and inform people about the factors that influence Pap test results. The Health Department of Tuban can therefore take the results of this study into account when establishing policies and strategies for faster cervical cancer treatment.

Bayesian Ordinal Logistic Regression Link Function Model
The link function of ordinal logistic regression is a statistical model used to analyze a response variable with three or more categories on an ordinal scale [14]. Ordinal logistic regression is a cumulative-logit model, and its predictors must be categorical and/or quantitative [15]. The response variable in the ordinal-logit model is described through cumulative probabilities: the model is formed by comparing cumulative probabilities, each being the probability that the response is less than or equal to the j-th category given the p predictor variables [16]. The purpose of the ordinal logistic regression model is to obtain a well-fitting and simple model that describes the response variable in terms of a set of predictor variables [15]. The link function of the ordinal logistic regression model is given in [17]. The parameters α_i, i = 1, 2, …, n (categorical), and β_k, k = 1, 2, …, p, are the same in every ordinal logistic regression [18]. Each model parameter is estimated from its full conditional distribution, once a prior distribution has been specified [19]. The prior distribution used is a combination of conjugate and informative priors [20].
The Bayesian approach is an alternative method for estimating model parameters [21]. The availability of software packages for Bayesian analysis makes the method more flexible for building stochastically complex models [22]. As a result, several limitations of classical modelling, such as complexity, assumptions that do not match reality, and avoidable simplification, can be overcome [23]. Bayesian modelling is based on the posterior, which combines past data as prior information with the observed data as the likelihood function [24]. The Bayesian approach uses the information in the sample data together with the prior distribution [20]. The posterior distribution is obtained by combining the initial information, used as the prior distribution, with the sample information, used as the likelihood function [25].
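For orientation, the cumulative-logit model described above is conventionally written in the form below. This is the standard textbook form and not necessarily the exact equation of [17]: the threshold index j, the number of categories J, and the plus sign in front of the linear predictor are assumptions (some authors write the linear predictor with a minus sign).

```latex
\begin{align}
\operatorname{logit} P(Y \le j \mid \mathbf{x})
  &= \ln\frac{P(Y \le j \mid \mathbf{x})}{1 - P(Y \le j \mid \mathbf{x})}
   = \alpha_j + \sum_{k=1}^{p} \beta_k x_k,
   \qquad j = 1, \ldots, J-1, \\
P(Y = j \mid \mathbf{x})
  &= P(Y \le j \mid \mathbf{x}) - P(Y \le j-1 \mid \mathbf{x}),
   \qquad P(Y \le 0 \mid \mathbf{x}) = 0,\quad P(Y \le J \mid \mathbf{x}) = 1 .
\end{align}
```

Under this form each threshold has its own intercept while the slopes are shared across thresholds, so a response with three ordered outcomes would have two intercepts and one set of slopes.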
The posterior distribution follows from Bayes' theorem [26]: f(θ | x) ∝ f(x | θ) f(θ), with f(x | θ) the likelihood function and f(θ) the prior distribution. Markov Chain Monte Carlo (MCMC) is a numerical approach for determining the posterior distribution; it is a simulation method that combines Monte Carlo and Markov chain properties to draw samples according to a specific sampling scheme [27]. A Markov chain is a stochastic process {θ(1), θ(2), …, θ(K)} in which each state depends only on the state immediately before it [28]. The MCMC procedure consists of the following steps:
1. Determine the initial value θ(0).
2. Generate samples by iterating K times until convergence is reached.
3. Conduct the burn-in process by discarding the initial samples.
5. Summarize and plot the posterior distribution (e.g., mean, median, standard deviation, and standard error).
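The steps above can be illustrated with a minimal sketch. It uses a random-walk Metropolis update, which is a generic MCMC scheme chosen here only for illustration and is not the sampler used in this study; the standard normal target, the function names, and all tuning values are assumptions.

```python
import math
import random
import statistics


def log_target(theta):
    """Log-density, up to a constant, of a standard normal (stand-in posterior)."""
    return -0.5 * theta * theta


def run_chain(n_iter=20000, burn_in=2000, step=1.0, theta0=5.0, seed=1):
    """Random-walk Metropolis chain; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    theta = theta0                          # step 1: initial value theta(0)
    draws = []
    for _ in range(n_iter):                 # step 2: iterate K = n_iter times
        proposal = theta + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal                # accept the move, else stay put
        draws.append(theta)
    return draws[burn_in:]                  # step 3: drop the burn-in draws


kept = run_chain()
# Final step: numerical summary of the retained draws.
summary = {
    "mean": statistics.fmean(kept),
    "median": statistics.median(kept),
    "sd": statistics.stdev(kept),
}
print(summary)
```

The chain is deliberately started far from the bulk of the target (theta0 = 5) so that the role of the burn-in period is visible: the early draws reflect the starting value rather than the target distribution.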
Markov Chain Monte Carlo (MCMC) can be implemented in several ways, one of which is Gibbs sampling [30]. Gibbs sampling is a technique for generating random variables from a marginal distribution indirectly, without having to calculate the density [31]. It relies on the convergence of the Markov chain to its stationary distribution, which is the posterior distribution f(θ | x) [32]. The steps of the Gibbs sampling algorithm are [33]:
1. Determine the initial values of each parameter.
2. Generate the next set of random values for the parameters.
3. Repeat the second step until convergence is reached.
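A minimal sketch of these three steps is given below for a toy target, a standard bivariate normal with correlation rho, whose full conditionals are known in closed form. The target, the function names, and the settings are assumptions made for illustration; the sampler for the ordinal logistic model itself is not reproduced here.

```python
import random
import statistics


def gibbs_bivariate_normal(rho=0.6, n_iter=12000, burn_in=2000, seed=7):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5      # sd of each full conditional
    x, y = 0.0, 0.0                         # step 1: initial values
    samples = []
    for _ in range(n_iter):                 # steps 2-3: sweep repeatedly
        x = rng.gauss(rho * y, cond_sd)     # draw x from f(x | y)
        y = rng.gauss(rho * x, cond_sd)     # draw y from f(y | x)
        samples.append((x, y))
    return samples[burn_in:]


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


draws = gibbs_bivariate_normal()
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]
print(statistics.fmean(xs), statistics.fmean(ys), pearson(xs, ys))
```

Each sweep only ever samples from a one-dimensional conditional distribution, yet the retained pairs recover the joint correlation, which is the sense in which the joint density never has to be evaluated directly.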
Parameter testing aims to assess the effect of each predictor variable on the response variable [34]. Under the Bayesian approach the test uses a credible interval, with the lower limit at the 2.5% quantile and the upper limit at the 97.5% quantile [24]. The hypotheses are H0: β_k = 0 against H1: β_k ≠ 0 [27]. H0 is rejected if the credible interval does not contain 0, in which case the predictor is concluded to have a significant influence on the response variable.
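The decision rule can be sketched as follows. The helper names are hypothetical, and the two sets of draws are synthetic values generated only to exercise the rule; they are not posterior samples from this study.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def credible_interval(draws, level=0.95):
    """Equal-tailed interval: 2.5% and 97.5% quantiles when level is 0.95."""
    tail = (1.0 - level) / 2.0
    ordered = sorted(draws)
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def is_significant(draws, level=0.95):
    """Reject H0 when the credible interval does not contain 0."""
    lower, upper = credible_interval(draws, level)
    return not (lower <= 0.0 <= upper)


# Synthetic draws for two hypothetical coefficients (illustration only).
rng = random.Random(11)
effect_draws = [rng.gauss(1.0, 0.3) for _ in range(5000)]
null_draws = [rng.gauss(0.0, 0.3) for _ in range(5000)]
print(credible_interval(effect_draws), is_significant(effect_draws))
print(credible_interval(null_draws), is_significant(null_draws))
```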

2. Describe the characteristics of the patients based on the observed variables through descriptive statistical analysis consisting of descriptive tables and cross tabulation.

3. Add the ordinal logistic link function as an add-in in WinBUGS, to serve as a parameter generator for the multinomial distribution. Prepare the required input in Univariate Template.odc to add the Dagum distribution; the input consists of the multinomial distribution PDF, the log-likelihood function of the Dagum distribution, and the CDF of the multinomial distribution.
f) Formulate the program code based on the input in step (e) and place it in a suitable procedure.
g) Complete program compilation and validation.

5. Generate T samples θ1, θ2, …, θT from the posterior distribution f(θ | x), updating the chain for the required number of iterations with adequate thinning so that the Markov chain process can be completed.

6. Check convergence. The algorithm has converged when it has reached a stationary state in the link function of the ordinal logistic regression model. If stationarity is not reached, more iterations need to be added. Convergence can be assessed in several ways:
a. By checking that the MC error is small.
b. From the ACF plot, where low autocorrelation indicates fast convergence.

7. Obtain a summary of the posterior distribution (mean, median, standard deviation, MC error, and 95% credible interval) for the link function of ordinal logistic regression.

8. Create and interpret the link function of the ordinal logistic regression model.
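The thinning and convergence checks in steps 5 and 6 can be sketched as below. The chain is a synthetic autocorrelated series standing in for sampler output, and the batch-means estimator is one common way of computing an MC error; both are assumptions made for illustration rather than output or code from this study.

```python
import random
import statistics


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given positive lag."""
    mean = statistics.fmean(chain)
    denom = sum((v - mean) ** 2 for v in chain)
    numer = sum((chain[t] - mean) * (chain[t + lag] - mean)
                for t in range(len(chain) - lag))
    return numer / denom


def mc_error(chain, n_batches=50):
    """Batch-means estimate of the Monte Carlo standard error of the mean."""
    size = len(chain) // n_batches
    means = [statistics.fmean(chain[b * size:(b + 1) * size])
             for b in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5


# Synthetic, strongly autocorrelated chain (AR(1)); NOT output from the study.
rng = random.Random(3)
chain, value = [], 0.0
for _ in range(20000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

thinned = chain[::10]                       # keep every 10th draw
print("lag-1 ACF raw:    ", round(autocorrelation(chain, 1), 3))
print("lag-1 ACF thinned:", round(autocorrelation(thinned, 1), 3))
print("MC error / sd:    ", round(mc_error(chain) / statistics.stdev(chain), 4))
```

A commonly cited rule of thumb is to keep sampling until the MC error is below about 5% of the posterior standard deviation; the last printed ratio is the quantity that rule refers to.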

RESULTS AND DISCUSSION
The model was formulated to determine which predictor variables influence the results of the Pap test for cervical cancer. The first step was a descriptive analysis to characterize the Pap test patients at Koesuma Hospital, Tuban. Of the 71 patients observed, 30 (42.3%) were confirmed to have cervical cancer, 35 (49.3%) were diagnosed with pre-cancer, and 6 (8.5%) had negative results. Thus, the largest proportion of patients in Tuban Regency are those diagnosed with pre-cancer. The other characteristics observed are Age (X1), Education (X2), Childbirth Experience (X3), Use of Contraceptives (X4), Menstrual Cycle (X5), Age of First Menstruation (X6), History of Miscarriage (X7), Anemia (X8), and Number of Sexual Partners (X9).
The parameters in Table 3 are considered significant if the interval between the 2.5% and 97.5% quantiles does not contain 0. Thus, not all factors in Table 3 are significant.
The odds ratio compares the odds for individuals in a given factor/predictor (x) category with the odds for individuals in the reference category. The odds ratios obtained from the data are reported in a table. They indicate that women aged above 50 years have 1.003 times the odds of cervical cancer of women below 50 years. They also show that women aged below 50 years tend to have pre-cancer rather than a normal (negative) result. Women who use contraceptives have 0.936 times the odds of this cancer compared with those who do not. The same interpretation applies to the other variables.
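The link between a logistic regression coefficient and its odds ratio can be sketched as follows. The two odds ratios are the values quoted in the text; the coefficients printed are merely the logarithms implied by those odds ratios, not estimates reported by the study, and the function names are hypothetical.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a log-odds coefficient."""
    return math.exp(beta)


def percent_change_in_odds(or_value):
    """Relative change in odds versus the reference category, in percent."""
    return (or_value - 1.0) * 100.0


# Odds ratios quoted in the text; betas below are back-computed, not reported.
reported = {"Age (X1)": 1.003, "Use of Contraceptives (X4)": 0.936}
for name, or_value in reported.items():
    beta = math.log(or_value)
    change = percent_change_in_odds(or_value)
    print(f"{name}: implied beta = {beta:+.4f}, "
          f"OR = {odds_ratio(beta):.3f}, change in odds = {change:+.1f}%")
```

An odds ratio above 1 corresponds to higher odds than the reference category and one below 1 to lower odds, so the value 0.936 reads as odds about 6.4% lower, while 1.003 is very close to no difference.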
The results of the link function of ordinal logistic regression with the Bayesian MCMC method show that the factors influencing the results of the Pap test for cervical cancer in Tuban are Age (X1), Childbirth Experience (X3), Use of Contraceptives (X4), Menstrual Cycle (X5), Age of First Menstruation (X6), and Anemia (X8). This result differs from the study in [1], which applied Maximum Likelihood Estimation and suggested that the predicting factors of cervical cancer test results are the use of contraceptives, childbirth experience, menstrual cycle, and history of miscarriage. According to the study in [4], which used logistic regression, the predicting factors found in Ambon are age and marriage frequency. Another study, carried out in Africa by [35], applied ordinal regression with the maximum likelihood method and suggested that the significant factors influencing cervical cancer are surgical stage, age, HIV status, vaginal involvement, and marital status. It can thus be concluded that the high prevalence of cervical cancer is epidemiologically important and should be taken into account when implementing health programs that focus on regular monitoring through screening and early detection with the Pap test. The link function of the ordinal logistic regression model yields information that is useful for public education on cervical cancer management in Tuban Regency.

CONCLUSIONS
The link function of the ordinal logistic regression model suggests that the predicting factors of Pap test results for cervical cancer are Age (X1), Childbirth Experience (X3), Use of Contraceptives (X4), Menstrual Cycle (X5), Age of First Menstruation (X6), and Anemia (X8). The varying alpha values indicate an error dependency that cannot be explained by this model. The link function of the ordinal logistic regression model with the Bayesian MCMC method can accommodate model complexity and handle violations of the Gauss-Markov assumptions.

ACKNOWLEDGEMENT
The author expresses his gratitude to DRPM Riset Dikti for funding this research under the Penelitian Dosen Pemula (PDP) scheme.