AN ORDINAL LOGISTIC REGRESSION MODEL FOR ANALYZING RISK ZONE STATUS OF COVID-19 SPREAD RISK

Coronavirus disease 2019 (COVID-19) is a disease caused by a new type of coronavirus that has infected humans since it first appeared in Wuhan, China, in December 2019. This study aims to determine the factors that influence the risk zone status of COVID-19 spread in Indonesia using ordinal logistic regression. The ordinal logistic regression model in this study is the proportional odds model, because the predictor variable coefficients are assumed to be the same for each response category. The response variable is secondary data from the COVID-19 Handling Task Force, namely the risk zone status for the spread of COVID-19, which has four ordered categories: high, medium, low, and no cases. The predictor variables are elderly population, COVID-19 referral hospitals, diabetes mellitus, hypertension, hand washing behavior, male population, and smoking habits. Based on the results of the analysis, the variables that significantly affect the risk zone status of COVID-19 spread in Indonesia are elderly population and diabetes mellitus. The odds ratios show that the higher the percentage of elderly population, the higher the chance that an area has a high-risk zone status (OR = 1.171), and the higher the percentage of the population with comorbid diabetes mellitus, the higher the chance that an area has a high-risk zone status (OR = 1.569).


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is a disease caused by a new type of coronavirus that has infected humans since it first appeared in Wuhan, China, in December 2019 [1]. The World Health Organization (WHO) officially declared COVID-19 a pandemic on March 11, 2020 [2]. The disease has spread rapidly to many countries, including Indonesia. The global spread of COVID-19 has continued to increase over time, causing growing casualties and material losses. Data from the COVID-19 Handling Task Force show that, as of December 30, 2020, the national totals were 735,124 confirmed positive cases, 603,741 recoveries, and 21,944 deaths [3].
In an effort to break the chain of spread of COVID-19, the COVID-19 Handling Task Force established a risk zone status for the spread of COVID-19 in each region. The risk zone status is divided into four color-coded levels: red for high risk, orange for moderate risk, yellow for low risk, and green for areas with no cases [4]. The risk zone is determined using public health indicators, which consist of 10 epidemiological indicators, two public health surveillance indicators, and two health service indicators.
A number of studies have examined factors that influence the spread of COVID-19. Atmojo et al. [5] reported that COVID-19 patients who had a smoking habit before the pandemic had about twice the risk of their condition worsening, because lung function that is already reduced is further exacerbated by COVID-19 infection. According to Muflihah et al. [6], infection prevention and control is an effort to minimize the transmission of infectious diseases such as COVID-19, and one such preventive behavior is hand washing. According to Satria, Tutupoho and Chalidyanto [7], elderly COVID-19 patients have a 2.097 times greater risk of dying from COVID-19, male patients have a 1.087 times greater risk of dying than female patients, and patients with comorbid diabetes mellitus have a 4.384 times greater risk of dying than patients without it. According to Gunawan et al. [8], hypertension is a comorbidity often found in COVID-19 patients and can worsen their condition by up to 2.5 times. The need for health services and facilities increases with the number of cases; the facilities that need to be considered are the availability of hospitals, beds, and PPE [9]. Population mobility is also a factor in the spread of COVID-19, because the spread of the disease has reached the stage of local transmission [10]. Therefore, it is necessary to conduct an analysis to determine the factors that influence the risk zone status for the spread of COVID-19 in Indonesia.
Ordinal logistic regression analysis is a statistical method that describes the relationship between an ordered response variable (Y) and one or more predictor variables (X) [11]. Ordinal logistic regression can therefore be used to study the factors that significantly influence the risk zone status for the spread of COVID-19 in districts/cities in Indonesia. The response variable is the COVID-19 risk zone status with four categories: high risk, medium risk, low risk, and not affected/no cases. The predictor variables are COVID-19 referral hospitals, age > 65 years, the habit of washing hands properly, diabetes comorbidity, hypertension comorbidity, male population, and smoking habits. We assume that the predictor variable coefficients are the same for each response category; therefore, the proportional odds model is used in this study.

RESEARCH METHODS
Ordinal logistic regression is used to determine the factors that influence the risk zone status for the spread of COVID-19 in Indonesia. The data on the response and predictor variables for the year 2020 were obtained at the district/city level throughout Indonesia from the COVID-19 Handling Task Force [12] [13], the Central Statistics Agency (BPS) [14], and the Indonesian Health Research and Development Agency (Litbangkes RI) [15]. However, data on diabetes mellitus, hypertension, hand washing habits, and smoking habits are not available for 2020. Therefore, the 2020 values of these variables were predicted from the 2013 to 2018 data under a constant growth assumption. The data cover 166 districts/cities. The response variable is the COVID-19 risk zone status, which consists of four levels: high risk, medium risk, low risk, and no cases. The predictor variables are COVID-19 referral hospitals, age > 65 years, correct hand washing habits, diabetes comorbidity, hypertension comorbidity, male population, and smoking habits.
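The constant growth projection mentioned above can be sketched as follows. This is only one possible reading of the assumption: a constant absolute change per year between the 2013 and 2018 values, carried forward to 2020. The function name and the input values are hypothetical and are not taken from the study data; a constant percentage growth rate would be an equally plausible reading.

```python
def extrapolate_constant_growth(value_2013, value_2018, target_year=2020):
    """Project a value to target_year, assuming a constant annual change.

    Assumption (not confirmed by the source): growth is linear, i.e. the
    same absolute change is added every year after 2018.
    """
    annual_change = (value_2018 - value_2013) / (2018 - 2013)
    return value_2018 + annual_change * (target_year - 2018)


# Hypothetical prevalence (%) for one district/city, for illustration only.
projected = extrapolate_constant_growth(1.5, 2.0)
print(round(projected, 3))
```

Under this reading, a prevalence that rose by 0.1 percentage points per year between the two survey years is projected to rise by a further 0.2 points by 2020.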
The steps for analyzing the factors that influence the risk zone status for the spread of COVID-19 using ordinal logistic regression are:
1) Build an ordinal logistic regression model; this study uses the proportional odds model.
2) Test the significance of the regression parameters as a whole using the Likelihood Ratio Test (LRT).
3) Test the significance of each parameter individually using the Wald test.
4) Define the best ordinal logistic model, containing the variables that have a significant effect according to the Wald test.
5) Calculate the odds ratios and interpret them.
6) Calculate the estimated probability for each category level.
7) Visualize the category level of each district/city, based on the estimated probabilities, in the form of a map.
8) Test the proportional odds assumption to determine whether the proportional odds model is appropriate.
9) Conduct a goodness of fit test to determine whether the model is in accordance with the data.

Ordinal Logistic Regression
Ordinal logistic regression is a statistical method that describes the relationship between a response variable (Y) and one or more predictor variables (X), where the response variable has more than two categories and is measured on an ordinal scale [11]. One of the model approaches used in ordinal logistic regression is the cumulative logit model, which is a logit model for cumulative probabilities. The cumulative probability is the probability that the response falls in the $j$-th category or below given the predictor variables $x$, written $P(Y \le j \mid x)$ and defined by

$$P(Y \le j \mid x) = \frac{\exp\left(\beta_{0j} + \sum_{k=1}^{p} \beta_k x_k\right)}{1 + \exp\left(\beta_{0j} + \sum_{k=1}^{p} \beta_k x_k\right)},$$

where $j$ is the category level of the response variable, $j = 1, 2, \ldots, J-1$. The cumulative logit model is defined by

$$\operatorname{logit} P(Y \le j \mid x) = \log\left(\frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)}\right) = \beta_{0j} + \sum_{k=1}^{p} \beta_k x_k, \qquad j = 1, 2, \ldots, J-1.$$

The probability of observing response category $j$ can be written as [16]:

$$P(Y = j \mid x) = P(Y \le j \mid x) - P(Y \le j-1 \mid x),$$

with $P(Y \le 0 \mid x) = 0$ and $P(Y \le J \mid x) = 1$.
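As a minimal sketch of how category probabilities follow from a cumulative logit model, the Python snippet below differences the cumulative probabilities. It assumes the parameterization logit P(Y <= j | x) = intercept_j + sum_k coef_k * x_k. The function name, intercepts, coefficients, and predictor values are hypothetical and are not the estimates from this study.

```python
import math


def category_probabilities(intercepts, coefs, x):
    """Return [P(Y = 1 | x), ..., P(Y = J | x)] under a proportional odds model.

    Assumed form: logit P(Y <= j | x) = intercepts[j - 1] + sum(coefs[k] * x[k]).
    The intercepts must be increasing so that the cumulative probabilities
    are increasing as well.
    """
    eta = sum(b * v for b, v in zip(coefs, x))
    # Cumulative probabilities P(Y <= j | x) for j = 1..J-1, then P(Y <= J) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    cumulative.append(1.0)
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical parameters for a four-category response (illustration only).
intercepts = [-1.0, 0.5, 2.0]
coefs = [0.2, -0.4]
x = [1.5, 0.5]

probs = category_probabilities(intercepts, coefs, x)
# The predicted category is the one with the largest estimated probability.
predicted = max(range(len(probs)), key=probs.__getitem__) + 1
print([round(p, 4) for p in probs], predicted)
```

The same slope coefficients are shared by all J - 1 cumulative logits; only the intercepts differ, which is what makes the model a proportional odds model.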

Parameter Significance Test
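The Likelihood Ratio Test (LRT) is used to test the significance of the predictor variables in the model as a whole [11]. The hypotheses are $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ and $H_1: \exists\, \beta_k \neq 0$, with $k = 1, 2, \ldots, p$. The test statistic is the likelihood ratio statistic

$$-2\log(\Lambda) = -2\log\left(\frac{L_0}{L_1}\right),$$

where $L_0$ is the likelihood of the model without predictor variables and $L_1$ is the likelihood of the model with predictor variables.
$H_0$ is rejected if $-2\log(\Lambda) > \chi^2_{(\alpha, df)}$ or p-value $< \alpha$, with $df = p$, where $p$ is the number of predictor variables.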
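The likelihood ratio decision rule can be sketched in a few lines of Python. The snippet below is an illustration using only the standard library: `chi2_sf` is a hand-written helper (not a library function) that evaluates the chi-square upper-tail probability for integer degrees of freedom, and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: start from df = 1 and apply the incomplete-gamma recurrence
    # Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1).
    total = math.erfc(math.sqrt(half))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return total


def likelihood_ratio_test(loglik_null, loglik_full, df, alpha=0.05):
    """Return (statistic, p-value, reject H0?) for -2 * (log L0 - log L1)."""
    stat = -2.0 * (loglik_null - loglik_full)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods for a model with 7 predictors (illustration only).
stat, p_value, reject = likelihood_ratio_test(-200.0, -190.0, df=7)
print(stat, p_value, reject)
```

Comparing the p-value with alpha is equivalent to comparing the statistic with the chi-square critical value, so either form of the rejection rule gives the same decision.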
The Wald test is conducted to examine the effect of each coefficient individually. The results show whether or not a predictor variable is eligible to be included in the model [17]. The hypotheses are $H_0: \beta_k = 0$ and $H_1: \beta_k \neq 0$, with $k = 1, 2, \ldots, p$. The test statistic is the Wald statistic

$$W = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)}.$$
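A minimal Python sketch of the Wald test for a single coefficient is given below. It assumes the statistic is the estimate divided by its standard error and is compared with the standard normal distribution. The coefficient and standard error are hypothetical and are not values from this study.

```python
import math


def wald_test(beta_hat, std_err, alpha=0.05):
    """Return (W, two-sided p-value, reject H0?) for H0: beta = 0."""
    w = beta_hat / std_err
    # Two-sided standard normal p-value: 2 * (1 - Phi(|W|)) = erfc(|W| / sqrt(2)).
    p_value = math.erfc(abs(w) / math.sqrt(2.0))
    return w, p_value, p_value < alpha


# Hypothetical estimate and standard error (illustration only).
w, p_value, reject = wald_test(0.45, 0.15)
print(round(w, 3), round(p_value, 4), reject)
```

Rejecting when the p-value is below alpha is the same decision as rejecting when |W| exceeds the standard normal critical value for alpha/2.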

Assumption of Proportional Odds
An important assumption of the proportional odds model is parallel lines, which means that the odds ratio is not affected by where the dependent variable is dichotomized [18]. The proportional odds model is a special model that assumes that the logit of the cumulative probability changes linearly as the predictor variable changes [19]. Under this assumption, the regression lines of the cumulative logits for the different response categories are parallel to each other, that is, they share the same slope coefficients and differ only in their intercepts.
The hypotheses are $H_0$: the model produces the same regression coefficients (slopes) for every response category, and $H_1$: the model does not produce the same regression coefficients (slopes). The test statistic is

$$-2\log\left(\frac{L_0}{L_1}\right),$$

where $L_0$ is the likelihood function with predictor variables that assumes parallel lines and $L_1$ is the likelihood function with predictor variables that does not assume parallel lines.

Lipsitz Test for Goodness of Fit
A goodness of fit test can be carried out by comparing the observed value for a subject with the predicted value for that subject [18]. The Lipsitz test, which is based on the Hosmer-Lemeshow [20] approach, is suitable for the ordinal logistic regression model.
The hypotheses are $H_0$: the model is in accordance with the data (there is no difference between the observed results and the results predicted by the model) and $H_1$: the model does not match the data (there is a difference between the observed results and the results predicted by the model). The Lipsitz statistic is a likelihood ratio statistic that follows a chi-square distribution with $g - 1$ degrees of freedom under $H_0$, where $g$ is the number of groups into which the observations are partitioned. $H_0$ is rejected if the statistic exceeds $\chi^2_{(\alpha, g-1)}$ or p-value $< \alpha$.

Odds Ratio
The odds ratio is a measure of how strongly a predictor variable is associated with the response variable. The odds ratio, denoted by OR, is defined as the ratio of the odds for $x = 1$ to the odds for $x = 0$ [17]:

$$OR = \frac{\pi(1)/[1 - \pi(1)]}{\pi(0)/[1 - \pi(0)]} = \exp(\beta),$$

where $\pi(x)$ is the probability of the event for a given value of $x$.
Interpretation of the odds ratio (OR):
1. OR > 1 indicates that there is an effect: the odds of the event are higher for $x = 1$ than for $x = 0$.
2. OR = 1 indicates that there is no effect: the odds of the event are the same for $x = 1$ and $x = 0$.
3. OR < 1 indicates that the odds of the event are lower for $x = 1$ than for $x = 0$.

Based on the analysis using the R program, the estimation results and parameter tests are given in Table 4. Table 4 consists of coefficients, standard errors (S.E.), Wald test values, and p-values. The overall parameter significance test was carried out using the likelihood ratio (LR) test statistic. The hypotheses are $H_0: \beta_1 = \beta_2 = \cdots = \beta_7 = 0$ and $H_1: \exists\, \beta_k \neq 0$. $H_0$ is rejected if $-2\log(\Lambda) > \chi^2_{(\alpha, df)}$ or p-value $< \alpha$. In this study, the LR statistic was 170.18 with p < 0.001, which is strong evidence that at least one of the predictors (COVID-19 referral hospitals, age > 65 years, correct hand washing behavior, diabetes, hypertension, male population, and smoking habits) contributes to the model.

To identify which predictor variables have a significant effect in the model, the Wald test was carried out. The Wald test is used to test the hypotheses ($H_0: \beta_k = 0$ vs $H_1: \beta_k \neq 0$) for the individual regression slope coefficients. The null hypothesis is rejected if $|W| > Z_{\alpha/2}$ or the p-value is less than the significance level ($\alpha = 0.05$). Based on Table 2, the variable age > 65 years has a Wald value |W| = 2.275 with p-value 0.003 < 0.05, and the diabetes variable has a Wald value |W| = 3.135 with p-value 0.001 < 0.05.
Therefore $H_0$ is rejected, and there is strong evidence that the variables age > 65 years and diabetes are important to include in the model given the other predictor variables in the model. The Wald test showed that age > 65 years and diabetes were statistically significant at the 0.05 significance level. The estimation results and parameter tests for the best model are given in Table 5. Based on Table 5, the best ordinal logistic regression model contains the elderly population ($X_2$) and diabetes mellitus ($X_4$) as predictor variables.

To make the interpretation easier, the odds ratios were calculated. The odds ratio for the elderly population variable was 1.171. Because OR > 1, the odds of a district/city having a higher risk zone status are 1.171 times higher for each one-unit increase in the percentage of elderly population. This may be because the elderly are the group most vulnerable to COVID-19. The vulnerability of the elderly during the COVID-19 pandemic is caused by a decrease in body resistance and by comorbid diseases that can increase the risk of death. Information on the impact of COVID-19 and restrictions on physical social interaction also affect the mental health of the elderly, which can weaken the immune system. The diabetes variable has an odds ratio of 1.569. Because OR > 1, the odds of a district/city having a higher risk zone status are 1.569 times higher for each one-unit increase in the percentage of the population with diabetes. This may be because the decreased immunity of diabetics is one of the risk factors for being infected with COVID-19.

From the logistic regression equation, the predicted status of each district/city can be obtained using probability estimates. The estimated probability of each category is calculated by substituting the predictor values into the logistic regression equation, and the category with the highest estimated probability is taken as the predicted status.
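The link between the reported odds ratios and the underlying coefficients can be illustrated with a short Python sketch. It assumes the usual relation OR = exp(beta) and the proportional odds property that a change of delta units multiplies the cumulative odds by OR raised to the power delta. The 5-unit change is a hypothetical example, and the coefficients shown are only those implied by the reported odds ratios, not values read from the results tables.

```python
import math


def odds_ratio(beta_hat, delta=1.0):
    """Odds ratio for a change of `delta` units in a predictor: exp(beta * delta)."""
    return math.exp(beta_hat * delta)


# Coefficients implied by the odds ratios reported in the text (OR = exp(beta)).
beta_elderly = math.log(1.171)
beta_diabetes = math.log(1.569)

# Hypothetical scenario: a 5-unit increase in the elderly population percentage.
or_elderly_5 = odds_ratio(beta_elderly, delta=5)

print(round(beta_elderly, 3), round(beta_diabetes, 3), round(or_elderly_5, 3))
```

Because the effect is multiplicative on the odds scale, the odds ratio for a larger change is the one-unit odds ratio raised to the size of the change, not the one-unit odds ratio multiplied by it.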
The estimated values for the 166 districts/cities are presented in Figure 2 as a map of the risk zone status for the spread of COVID-19 in Indonesia, based on the estimated probabilities.
From the two tests, namely the test of the proportional odds assumption and the goodness of fit test, it can be concluded that the resulting model fits the data.

RESULTS AND DISCUSSION
This study describes the factors that influence the risk zone status for the spread of COVID-19 in Indonesia. The model obtained shows that two factors, namely elderly population and diabetes comorbidity, significantly affect the risk zone status for the spread of COVID-19. The odds ratio for the elderly population variable was 1.171, which means that the odds of a higher risk zone status are 1.171 times higher for a district/city with a higher percentage of elderly population. This is in accordance with Indarwati's research [21], which states that the elderly population is the group most vulnerable to COVID-19. The vulnerability of the elderly during the COVID-19 pandemic is caused by a decrease in body resistance and by comorbid diseases that can increase the risk of death. Information on the impact of COVID-19 and restrictions on physical social interaction also affect the mental health of the elderly, which can weaken the immune system [21]. The diabetes variable has an odds ratio of 1.569, which means that the odds of a higher risk zone status are 1.569 times higher for a district/city with a higher percentage of diabetics. This is in accordance with research conducted by Roeroe et al. [22], which states that diabetes is one of the main risk factors for COVID-19. The severity and mortality of COVID-19 are higher in patients with diabetes than in non-diabetics. This is due to the decreased immunity of diabetics, which is one of the risk factors for being infected with COVID-19 [22].

CONCLUSIONS
The ordinal logistic regression model shows that the elderly population and comorbid diabetes have a significant influence on the risk zone status for the spread of COVID-19. An increase in the percentage of elderly population and in the percentage of the population suffering from diabetes mellitus in a district/city increases the chance that the district/city has a higher risk zone status for the spread of COVID-19.