2025, issue 4, p. 37-46

Received 27.06.2025; Revised 03.07.2025; Accepted 18.11.2025

Published 08.12.2025; First Online 15.12.2025

https://doi.org/10.34229/2707-451X.25.4.4

MSC 68T50, 68T20

Cardiovascular Disease Risk Prediction Models

Margaryta Prazdnikova

V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, Kyiv

Introduction. Non-communicable diseases, especially cardiovascular pathologies, remain the leading cause of mortality worldwide, creating a significant burden on society, the economy, and healthcare systems. Heart attacks and strokes are particularly dangerous because they often develop suddenly and without symptoms, which complicates timely diagnosis and prevention. Identifying patients at increased risk can improve disease prevention and clinical outcomes and enhance the quality of medical care. In recent years, growing attention has been directed toward the use of artificial intelligence, machine learning, and big data processing techniques – particularly the analysis of unstructured medical texts – to improve the accuracy of medical predictions. The analysis of medical reports, patient histories, and other textual information can reveal hidden patterns that are inaccessible to traditional manual review and can contribute substantially to personalized treatment strategies.

The aim of the study is to improve the model for predicting the risk of myocardial infarction by introducing new methods of preprocessing medical reports and feature selection. In addition, the study aims to develop a new model for determining the risk level of cerebral vascular damage. The work focuses on integrating these models into modern information systems used in medical institutions and testing them on real clinical datasets.

Results. The study proposed and evaluated several approaches for improving myocardial infarction risk prediction, including text translation, lemmatization, and automated extraction of medical terms. Building on an extended version of the existing methodology, a new model was developed to predict cerebral vascular lesions. The analysis was conducted using the depersonalized “Eskulap” database, which contains records of more than 22,000 patients. The improved models demonstrated strong performance, achieving 80% accuracy (AUC = 0.898) for myocardial infarction and 86% accuracy (AUC = 0.92) for cerebral vascular lesions. The new model has already been successfully implemented in a medical center.
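The abstract names Multinomial Naive Bayes among its keywords and reports accuracy and AUC, but it does not give implementation details. The following is a minimal pure-Python sketch of that classifier family applied to short report texts, not the author's implementation. The `tokenize` helper is a hypothetical stand-in for the translation and lemmatization steps, the class and function names are illustrative, and the toy reports are invented for demonstration and do not come from the "Eskulap" database.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (stand-in for real lemmatization)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in tokenize(d)}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(docs)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t in tokens:
                s += math.log((self.counts[c][t] + self.alpha) / (self.totals[c] + self.alpha * v))
            scores[c] = s
        return scores

    def predict_proba(self, doc):
        # Normalize log scores into class probabilities (shifted for stability).
        scores = self.log_scores(doc)
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy reports: 1 = high risk, 0 = low risk.
train_docs = [
    "chest pain radiating to left arm elevated troponin",
    "st elevation on ecg severe chest pain",
    "routine checkup no complaints normal ecg",
    "mild seasonal allergy no chest pain",
]
train_labels = [1, 1, 0, 0]
test_docs = ["severe chest pain elevated troponin", "normal checkup no complaints"]
test_labels = [1, 0]

model = MultinomialNB().fit(train_docs, train_labels)
preds = [model.predict(d) for d in test_docs]
scores = [model.predict_proba(d)[1] for d in test_docs]
accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
print("accuracy:", accuracy, "AUC:", roc_auc(test_labels, scores))
```

Accuracy depends on a single decision threshold, while AUC measures how well the predicted scores rank high-risk reports above low-risk ones, which is one reason the two figures are usually reported together.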

Conclusions. The proposed methods for improving the analysis of medical texts, including preprocessing, automated selection of relevant features, lemmatization, and adaptation to language-specific characteristics, enhanced the quality of risk prediction for cardiovascular and cerebrovascular diseases. The development of the new model for predicting cerebral vascular lesions further confirmed the effectiveness of this approach, and its implementation demonstrates the feasibility of integrating such solutions into clinical, insurance, and scientific practice. The model supports personalized prevention and treatment, facilitates the identification of high-risk groups, optimizes resource allocation, and improves clinical decision-making. It may also be used for calculating insurance rates or guiding targeted funding by governmental and municipal institutions.

The model also has strong potential for further development through the integration of additional data sources (such as laboratory indicators, instrumental examination results, and medical images), the adoption of more advanced ensemble algorithms, and deeper incorporation of expert assessments. Taken together, these results reinforce the conclusion that machine learning is a promising tool for analyzing unstructured medical texts, supporting clinical decision-making, and improving overall healthcare efficiency.

Keywords: non-communicable diseases, myocardial infarction, stroke, machine learning, risk prediction, Multinomial Naive Bayes, medical texts, data analysis.

Cite as: Prazdnikova M. Cardiovascular Disease Risk Prediction Models. Cybernetics and Computer Technologies. 2025. 4. P. 37–46. (in Ukrainian) https://doi.org/10.34229/2707-451X.25.4.4

ISSN 2707-451X (Online)

ISSN 2707-4501 (Print)
