EXTRACTIVE CLINICAL NOTES SUMMARIZATION USING SINGLE MACHINE LEARNING, ENSEMBLE, AND STACKING APPROACHES
Abstract
Summarizing clinical notes supports medical decision-making by presenting relevant information concisely and efficiently. However, the complexity of clinical language, the unstructured nature of the text, and the inherent class imbalance pose major challenges for the development of automatic summarization systems. This study develops a framework for extractive clinical notes summarization and compares the performance of single-model machine learning, simple ensembles, and stacking. A synthetic dataset comprising 2,000 clinical notes was segmented into 22,000 sentences, each labeled as important or not important according to a reference extractive summary. The methodology includes text preprocessing (normalization, expansion of medical abbreviations, tokenization, and stopword removal), feature extraction (TF-IDF, Named Entity Recognition, and structural features), and implementation of multiple models. Evaluation relies on Accuracy, Precision, Recall, and F1-score, complemented by Entity-F1, redundancy analysis, and latency per document. Experimental results show that the best single model, XGBoost, achieves an F1-score of 0.76, reflecting its ability to capture non-linear interactions among heterogeneous clinical text features under class imbalance, while simple ensembles improve the F1-score further to 0.78. The largest gains are obtained with stacking, which reaches an F1-score of 0.80, a precision of 0.83, and a recall of 0.78. The confusion matrix shows few false negatives, and the Precision–Recall curve (AP = 0.73) demonstrates consistent behavior under imbalanced data conditions. Overall, the findings establish stacking as the most effective of the compared approaches for extractive summarization of clinical notes. Beyond theoretical relevance, the results carry practical implications for developing clinical decision support systems that are safe, efficient, and readily integrated into digital health services.
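As a quick consistency check on the reported stacking figures, the F1-score is the harmonic mean of precision and recall, so the stated precision (0.83) and recall (0.78) should reproduce the stated F1-score of 0.80. The sketch below is illustrative only and is not the authors' evaluation code; the function names `f1_score` and `metrics_from_counts` are hypothetical, and the counts-based helper assumes sentence-level binary labels (important vs. not important) as described in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    Hypothetical helper: 'positive' is assumed to mean a sentence
    labeled as important.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Precision and recall as reported for the stacking model in the abstract.
stacking_f1 = f1_score(0.83, 0.78)
print(f"{stacking_f1:.2f}")  # prints 0.80
```

The computed value rounds to 0.80, which agrees with the F1-score given for stacking.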
Copyright (c) 2026 Junadhi Junadhi, Agustin Agustin, Deshinta Arrova Dewi, Abhishek Saxena

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.






