2021, issue 2, p. 76-84
Received 16.04.2021; Revised 14.06.2021; Accepted 24.06.2021
Published 30.06.2021; First Online 01.07.2021
https://doi.org/10.34229/2707-451X.21.2.8
On Biomedical Computations in Cluster and Cloud Environment
Tamara Bardadym 1 *, Vasyl Gorbachuk 1 , Natalia Novoselova 2, Sergiy Osypenko 1, Vadim Skobtsov 2, Igor Tom 2
1 V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, Kyiv
2 United Institute of Informatics Problems of the National Academy of Sciences of Belarus, Minsk
* Corresponding author: Tamara Bardadym
Introduction. This publication summarizes the authors' experience of using applied containerized software tools in a cloud environment, gained during the project “Development of methods, algorithms and intellectual analytical system for processing and analysis of heterogeneous clinical and biomedical data in order to improve the diagnosis of complex diseases”, carried out by teams from the United Institute of Informatics Problems of the NAS of Belarus and the V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine. It also describes the features of biomedical data, the main approaches to their processing and classification implemented within an intelligent analytical system, and the possibility of implementing these approaches as part of a container application.
The purpose of the paper is to describe modern technologies that ensure the reproducibility of numerical experiments in this field, and the tools intended to integrate several sources of biomedical information in order to improve the diagnosis and prognosis of complex diseases. Particular attention is paid to methods of processing data that are obtained from various sources of biomedical information and included in the intelligent analytical system.
Results. The experience of using applied containerized biomedical software tools in a cloud environment is summarized. The reproducibility of scientific computing is discussed in relation to modern technologies for scientific calculations. The main approaches to biomedical data preprocessing and integration within the intelligent analytical system are described. The developed hybrid classification model forms the basis of the intelligent analytical system and aims to integrate several sources of biomedical information.
Conclusions. Testing the classification module NonSmoothSVC, which is part of the developed intelligent analytical system, on artificial and real data shows several advantages of the containerized form of the application. Namely:
• It provides access to real data located in a cloud environment,
• Calculations for research problems can be performed on cloud resources using both the developed tools and cloud services,
• This form of research organization makes numerical experiments reproducible, i.e. any other researcher can compare the results of their own developments on specific data that have already been studied by others, in order to verify the conclusions and the technical feasibility of new results,
• The developed tools can be used on devices of various classes, from a personal computer to a powerful cluster.
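The reproducibility point can be illustrated with a minimal sketch. It is not the NonSmoothSVC module: it assumes a toy two-dimensional data set, a linear classifier trained by plain subgradient descent on the nonsmooth regularized hinge loss, and a SHA-256 fingerprint of the rounded model parameters. All function names and parameter values here are hypothetical. The idea is that a fixed seed and a pinned environment (for example, a container image) let another researcher rerun the experiment and compare fingerprints.

```python
import hashlib
import json
import random


def make_data(n, seed):
    """Generate a seeded toy data set (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice((-1, 1))
        # Two Gaussian clouds centred at (+2, +2) and (-2, -2).
        x = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
        data.append((x, label))
    return data


def train_linear_classifier(data, reg=0.01, epochs=200, step=0.1):
    """Minimise the regularised hinge loss by plain subgradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for t in range(1, epochs + 1):
        gw = [reg * w[0], reg * w[1]]
        gb = 0.0
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1.0:  # hinge term is active: its subgradient is -y * x
                gw[0] -= y * x[0] / n
                gw[1] -= y * x[1] / n
                gb -= y / n
        eta = step / t ** 0.5  # diminishing step size
        w = [w[0] - eta * gw[0], w[1] - eta * gw[1]]
        b -= eta * gb
    return w, b


def accuracy(data, w, b):
    """Fraction of samples on the correct side of the separating line."""
    hits = sum(1 for x, y in data if y * (w[0] * x[0] + w[1] * x[1] + b) > 0)
    return hits / len(data)


def fingerprint(w, b):
    """Hash the rounded parameters so two runs can be compared by one string."""
    payload = json.dumps(
        {"w": [round(v, 8) for v in w], "b": round(b, 8)}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_experiment(seed=0):
    data = make_data(200, seed)
    w, b = train_linear_classifier(data)
    return accuracy(data, w, b), fingerprint(w, b)


if __name__ == "__main__":
    acc, digest = run_experiment(seed=0)
    print(f"accuracy={acc:.3f} fingerprint={digest}")
```

Two runs with the same seed in the same environment produce the same fingerprint. Across different hardware or library versions, floating-point results may differ slightly, which is one reason to ship the whole environment as a container.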
The hybrid classification model, as the core of the intelligent system, will make it possible to integrate multidimensional, heterogeneous biomedical data in order to better understand the molecular courses of disease origin and development and to improve the identification of disease subtypes and disease prognosis.
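One common way to integrate several data sources is late fusion: a separate classifier is fitted per source and the predictions are combined. The sketch below is an assumption for illustration only and does not reproduce the authors' hybrid model. It uses a nearest-centroid classifier per source and a majority vote. The source names ("expression", "methylation", "clinical"), the class labels and all numbers are made up.

```python
from collections import Counter
from math import dist


def fit_centroids(rows, labels):
    """Compute one centroid per class for a single data source."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_one(centroids, row):
    """Return the class whose centroid is closest to the given row."""
    return min(centroids, key=lambda cls: dist(centroids[cls], row))


def fit_fusion(sources, labels):
    """Fit an independent nearest-centroid model for every source."""
    return {name: fit_centroids(rows, labels) for name, rows in sources.items()}


def predict_fusion(models, sample):
    """Combine the per-source predictions by majority vote."""
    votes = Counter(predict_one(models[name], sample[name]) for name in models)
    return votes.most_common(1)[0][0]


# Hypothetical training data: four patients, three heterogeneous sources.
sources = {
    "expression": [[1.0, 1.2], [0.9, 1.1], [3.0, 3.1], [3.2, 2.9]],
    "methylation": [[0.1], [0.2], [0.8], [0.9]],
    "clinical": [[50, 1], [55, 1], [70, 0], [65, 0]],
}
labels = ["A", "A", "B", "B"]
models = fit_fusion(sources, labels)

# A new patient whose methylation value looks like class "A",
# while the other two sources point to class "B".
new_patient = {
    "expression": [3.1, 3.0],
    "methylation": [0.15],
    "clinical": [68, 0],
}

if __name__ == "__main__":
    print(predict_fusion(models, new_patient))
```

An odd number of sources avoids ties in the two-class case. A weighted vote or a second-level classifier trained on the per-source outputs are natural alternatives when the sources differ in reliability.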
Keywords: classifier, cloud service, containerized application, heterogeneous biomedical data
Cite as: Bardadym T., Gorbachuk V., Novoselova N., Osypenko S., Skobtsov V., Tom I. On Biomedical Computations in Cluster and Cloud Environment. Cybernetics and Computer Technologies. 2021. 2. P. 76–84. https://doi.org/10.34229/2707-451X.21.2.8
ISSN 2707-451X (Online)
ISSN 2707-4501 (Print)