Comparison of Multiple Linear Regression and Random Forest Regression Models for House Price Prediction in Semarang City Using the CRISP-DM Method
Abstract
The population density of Semarang City increases every year, which raises the demand for land on which to build housing. House prices in the city vary widely with a property's specifications, so buyers need reliable price predictions to find a suitable house. This study implements and compares the performance of Multiple Linear Regression (MLR) and Random Forest Regression (RFR) models for predicting house prices in Semarang City. The research follows CRISP-DM (Cross-Industry Standard Process for Data Mining) as its data mining process. The dataset consists of 9,533 records with 8 variables, obtained by web scraping. The data are preprocessed and then used to train the models. In the evaluation stage, the performance of the two models is measured with three metrics: R-squared (reported here as prediction accuracy), MSE (Mean Squared Error), and RMSE (Root Mean Squared Error). The MLR model achieved a prediction accuracy of 61.1% with a 75%:25% split between training and testing data, while the RFR model achieved 78.4% with a 90%:10% split. RFR is therefore the better-performing model of the two. The RFR model was then deployed in a web application built with the Streamlit framework. The final product of this research is a website that the public can use to predict house prices in Semarang City according to their chosen criteria.
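The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and the textbook definitions of MSE, RMSE, and R-squared; the function name `regression_metrics` and the sample prices are hypothetical and are not taken from the study's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R-squared for paired actual/predicted values.

    Hypothetical helper for illustration; not the study's implementation.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    mean_true = sum(y_true) / n
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the actual values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")

    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up house prices (arbitrary units), purely for demonstration.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

An R-squared of 0.784 on the test set corresponds to the 78.4% figure quoted for RFR. It measures the share of variance in prices explained by the model, which is not the same as the share of predictions that are correct.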
Copyright (c) 2024 Pattimura Proceeding: Conference of Science and Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.