2025, issue 4, p. 47-54

Received 17.09.2025; Revised 12.11.2025; Accepted 18.11.2025

Published 08.12.2025; First Online 15.12.2025

https://doi.org/10.34229/2707-451X.25.4.5

UDC 81’1:003:004.932.2(045)

Semiotic Approach to the Construction of a Phoneme Model of the Speech Signal

Ihor Bezverbnyi ^*, Kateryna Sosnenko

V.M. Glushkov Institute of Cybernetics, NAS of Ukraine, Kyiv

^* Corresponding author

Introduction. The physical parameters of the speech signal vary widely, yet phonemes remain reliably identifiable even under significant fluctuations of frequency and amplitude. This makes it possible to build models that abstract from precise acoustic values and rely instead on the functional, semiotic nature of speech. In such a semiotic representation of the signal, relative changes of parameters play the key role. The methodological foundation of the approach is the normalization of the speech signal to the frequencies of the twelve-tone chromatic scale, an idea supported by the psychoacoustic properties of human hearing and by the anatomy of the human auditory system.
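To illustrate the idea of normalizing a frequency to the twelve-tone chromatic scale, the following is a minimal sketch only, not the authors' procedure. It assumes equal temperament and a reference pitch of A4 = 440 Hz, neither of which is stated in the abstract; the function names `nearest_semitone` and `normalize_to_chromatic` are hypothetical.

```python
import math

# Assumption: reference pitch A4 = 440 Hz (the abstract does not specify one).
A4_HZ = 440.0


def nearest_semitone(freq_hz: float, ref_hz: float = A4_HZ) -> int:
    """Signed number of equal-tempered semitones from ref_hz closest to freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Twelve equal steps per octave, so one semitone is a factor of 2**(1/12).
    return round(12 * math.log2(freq_hz / ref_hz))


def normalize_to_chromatic(freq_hz: float, ref_hz: float = A4_HZ) -> float:
    """Snap freq_hz to the nearest frequency of the assumed chromatic scale."""
    return ref_hz * 2 ** (nearest_semitone(freq_hz, ref_hz) / 12)


if __name__ == "__main__":
    for f in (220.0, 260.0, 450.0):
        print(f, nearest_semitone(f), round(normalize_to_chromatic(f), 2))
```

Under these assumptions, nearby frequencies such as 440 Hz and 450 Hz fall on the same scale step, which is one way to abstract from precise acoustic values.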

The purpose of the study is to form a semiotic dictionary of the speech signal by calculating the frequencies required for speech transmission, normalized to the twelve-tone chromatic scale, and to develop on this basis a semiotic model that makes it possible to build interpretable speech recognition systems.

Results. The study substantiates the correlation between speech signal frequencies and the twelve-tone chromatic scale as the conceptual basis of normalization. It is proposed to encode the signal through pairs of normalized conjugations of the frequency difference and the amplitude difference. This representation yields a system of sign structures with a clear internal form and function, which supports not only analysis of the signal but also its interpretation. On this basis, a semiotic representation of speech has been constructed that provides both effective recognition and a high degree of interpretability. In addition, the development of a recurrent neural model makes accurate phoneme reproduction possible from patterns of semiotic units, opening prospects for further integration of the semiotic approach with deep learning methods.
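One possible reading of the pairwise encoding can be sketched as follows. This is an illustrative assumption, not the published method: the frequency difference is taken as the number of equal-tempered semitones between consecutive frames (reference 440 Hz), and the amplitude difference as the level change in decibels quantized with a hypothetical step of 3 dB. The names `semitone_index`, `encode_frames`, and `db_step` are invented for this example.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, float]   # (frequency in Hz, linear amplitude), both > 0
Symbol = Tuple[int, int]      # (semitone difference, quantized dB difference)


def semitone_index(freq_hz: float, ref_hz: float = 440.0) -> int:
    """Nearest equal-tempered semitone relative to an assumed 440 Hz reference."""
    return round(12 * math.log2(freq_hz / ref_hz))


def encode_frames(frames: List[Frame], db_step: float = 3.0) -> List[Symbol]:
    """Encode consecutive frame pairs as (frequency step, amplitude step) symbols."""
    symbols: List[Symbol] = []
    for (f0, a0), (f1, a1) in zip(frames, frames[1:]):
        # Relative frequency change, counted in chromatic-scale steps.
        d_semi = semitone_index(f1) - semitone_index(f0)
        # Relative amplitude change in dB, quantized to the assumed step.
        d_db = 20 * math.log10(a1 / a0)
        symbols.append((d_semi, round(d_db / db_step)))
    return symbols


if __name__ == "__main__":
    frames = [(220.0, 1.0), (440.0, 1.0), (440.0, 10.0)]
    print(encode_frames(frames))
```

Because only differences are stored, doubling every amplitude or shifting every frequency by an octave leaves the symbol sequence unchanged in this sketch, which matches the emphasis on relative rather than absolute parameter values.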

Conclusions. Representing the speech signal semiotically as discrete sign units opens prospects for creating interpretable automatic speech recognition systems. The proposed model combines theoretical novelty with practical significance and contributes to the development of computational linguistics and artificial intelligence technologies.

Keywords: speech signal, linguistic structure, phoneme model, interpretability, semiotic representation, recurrent neural model.

Cite as: Bezverbnyi I., Sosnenko K. Semiotic Approach to the Construction of a Phoneme Model of the Speech Signal. Cybernetics and Computer Technologies. 2025. 4. P. 47–54. (in Ukrainian) https://doi.org/10.34229/2707-451X.25.4.5

ISSN 2707-451X (Online)

ISSN 2707-4501 (Print)
