2021, issue 1, p. 54-60

Received 10.02.2021; Revised 17.02.2021; Accepted 25.03.2021

Published 30.03.2021; First Online 03.04.2021

https://doi.org/10.34229/2707-451X.21.1.5


 

UDC 519.272.2

Progress in Determination of Protein Spatial Structure Based on Machine Learning

B.O. Biletskyy

V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, Kyiv


 

Introduction. Determining the spatial structure of proteins is one of the most important unsolved problems facing humanity. Life on Earth is often described as protein-based, because protein molecules drive the life processes of living organisms. Proteins make up about 80% of the dry mass of a cell and coordinate metabolic processes. The function of a protein is determined by its spatial structure.

The results of recent competitions among methods for protein structure determination have shown significant progress in this area. One of the research groups presented the AlphaFold 2 method, whose accuracy reached a level comparable to that of experimental methods.

Purpose of the article. The aim of this work is to review and analyze the basic principles of the AlphaFold software package for determining the spatial structure of proteins.

Results. We consider the main stages of determining the structure of a protein with the AlphaFold software package. The stages and corresponding methods are: a search for homologous proteins based on multiple sequence alignment; construction of a protein-specific differentiable potential using artificial neural networks; and optimization of the protein structure energy using gradient descent and limited sampling. We discuss how combining various bioinformatics techniques with data from open sources can lead to significant improvements in the accuracy of protein structure prediction. Special attention is paid to the use of artificial neural networks for building the smooth protein-specific potential and to the subsequent energy minimization based on that potential.
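
The energy minimization stage can be illustrated with a minimal sketch. It is not AlphaFold code: the potential here is a toy stand-in (squared deviation of pairwise distances from target values), the optimization runs over Cartesian coordinates, and the names energy_and_grad, minimize and pairwise_distances, as well as the five-point reference chain, are hypothetical and chosen only for illustration. The sketch shows plain gradient descent on a smooth potential, restarted from a few random initial structures, keeping the lowest-energy result.

```python
import math
import random


def energy_and_grad(coords, target):
    """Toy smooth potential: sum over pairs of (distance - target distance)^2.

    Returns the energy and its gradient with respect to every coordinate.
    """
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            # Small epsilon keeps the distance differentiable at zero.
            dist = math.sqrt(sum(d * d for d in diff) + 1e-12)
            err = dist - target[i][j]
            energy += err * err
            for k in range(3):
                g = 2.0 * err * diff[k] / dist
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def minimize(target, n_restarts=3, n_steps=500, lr=0.01, seed=0):
    """Gradient descent from several random starts; keeps the best result."""
    rng = random.Random(seed)
    n = len(target)
    best_energy, best_coords = math.inf, None
    for _ in range(n_restarts):
        coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]
        for _ in range(n_steps):
            _, grad = energy_and_grad(coords, target)
            coords = [[c - lr * g for c, g in zip(p, gp)]
                      for p, gp in zip(coords, grad)]
        energy, _ = energy_and_grad(coords, target)
        if energy < best_energy:
            best_energy, best_coords = energy, coords
    return best_energy, best_coords


def pairwise_distances(coords):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


# Assumption: target distances come from a made-up reference chain; in a real
# pipeline they would be derived from the network's predictions.
reference = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
             [0.0, 1.5, 1.0], [0.0, 0.0, 2.0]]
target = pairwise_distances(reference)
best_energy, best_coords = minimize(target)
print(f"lowest energy found: {best_energy:.6f}")
```

Because the toy potential depends only on pairwise distances, the recovered structure is defined only up to rotation, translation and reflection, and the restarts serve as a simple guard against poor local minima.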

Conclusions. Combining a number of methods and using information from protein and genetic data banks makes significant progress possible in solving the extremely important problem of determining the structure of a protein.

 

Keywords: protein spatial structure, machine learning, AlphaFold.

 

Cite as: Biletskyy B.O. Progress in Determination of Protein Spatial Structure Based on Machine Learning. Cybernetics and Computer Technologies. 2021. 1. P. 54–60. (in Ukrainian) https://doi.org/10.34229/2707-451X.21.1.5

 

References

           1.     https://www.rcsb.org/stats/growth/growth-released-structures (accessed 10.02.2021)

           2.     https://www.uniprot.org/statistics/Swiss-Prot, https://www.ebi.ac.uk/uniprot/TrEMBLstats (accessed 10.02.2021)

           3.     Zemla A. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Research. 2003. 31 (13). P. 3370–3374. https://doi.org/10.1093/nar/gkg571

           4.     EVA: EValuation of Automatic protein structure prediction. http://pdg.cnb.uam.es/eva/doc/concept.html (accessed 10.02.2021)

           5.     Protein Structure Prediction Center. https://predictioncenter.org/index.cgi (accessed 10.02.2021)

           6.     Heaven D. Why deep-learning AIs are so easy to fool. Nature. 2019. 574. P. 163–166. https://doi.org/10.1038/d41586-019-03013-5

           7.     Pinkus A. Approximation theory of the MLP model in neural networks. Acta Numerica. 1999. 8. P. 143–195. https://doi.org/10.1017/S0962492900002919

           8.     Cybenko G.V. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems. 1989. 2 (4). P. 303–314. https://doi.org/10.1007/BF02551274

           9.     Senior A.W., Evans R., Jumper J. et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020. 577. P. 706–710. https://doi.org/10.1038/s41586-019-1923-7

       10.     https://www.rosettacommons.org/software (accessed 10.02.2021)

       11.     https://toolkit.tuebingen.mpg.de/tools/hhblits (accessed 10.02.2021)

       12.     https://blast.ncbi.nlm.nih.gov (accessed 10.02.2021)

       13.     Malouf R. A comparison of algorithms for maximum entropy parameter estimation. Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002). 2002. P. 49–55. https://doi.org/10.3115/1118853.1118871

       14.     https://www.cathdb.info/ (accessed 10.02.2021)

 

 

ISSN 2707-451X (Online)

ISSN 2707-4501 (Print)


 

 

 
