2025, issue 1, p. 74-80
Received 06.01.2025; Revised 27.01.2025; Accepted 25.03.2025
Published 28.03.2025; First Online 30.03.2025
https://doi.org/10.34229/2707-451X.25.1.7
Chirplet Analysis of Speech Signals Based on the Hilbert–Huang Transform
I.A. Bezverbnyi
V.M. Glushkov Institute of Cybernetics of the NAS of Ukraine, Kyiv
Introduction. This article proposes a novel approach to speech signal analysis that integrates the Hilbert–Huang transform with chirplet analysis. The method provides enhanced segmentation and feature extraction capabilities, enabling accurate identification of the time-frequency characteristics of speech signals. It is designed to overcome the limitations of traditional methods such as the short-time Fourier transform and wavelet analysis by offering a more adaptive solution tailored to the nonlinear and non-stationary nature of speech signals.
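To make the chirplet projection concrete, the sketch below computes the coefficient of a signal against a single Gaussian chirplet atom. This is a generic illustration of the chirplet transform, not the article's implementation; the atom parameters and the toy test chirp are illustrative assumptions.

```python
import numpy as np

def chirplet_atom(t, tc, fc, c, sigma):
    """Gaussian chirplet: a windowed complex exponential whose
    instantaneous frequency sweeps linearly at rate c (Hz/s)
    around center time tc and center frequency fc."""
    tau = t - tc
    envelope = np.exp(-0.5 * (tau / sigma) ** 2)
    phase = 2 * np.pi * (fc * tau + 0.5 * c * tau ** 2)
    return envelope * np.exp(1j * phase)

def chirplet_coeff(x, t, fs, tc, fc, c, sigma):
    """Inner product of signal x with one chirplet atom."""
    g = chirplet_atom(t, tc, fc, c, sigma)
    return np.sum(x * np.conj(g)) / fs

# Toy check: a linear chirp (500 Hz start, 3000 Hz/s sweep) is matched
# better by an atom with the same sweep rate than by a plain Gabor atom.
fs = 8000.0
t = np.arange(0, 0.1, 1 / fs)
x = np.cos(2 * np.pi * (500 * t + 0.5 * 3000 * t ** 2))
fc_mid = 500 + 3000 * 0.05          # instantaneous frequency at t = 0.05 s
matched = abs(chirplet_coeff(x, t, fs, tc=0.05, fc=fc_mid, c=3000, sigma=0.02))
plain = abs(chirplet_coeff(x, t, fs, tc=0.05, fc=fc_mid, c=0, sigma=0.02))
print(matched > plain)  # the sweep-matched atom captures more energy
```

Scanning such coefficients over (tc, fc, c) yields the time-frequency-chirp-rate representation that makes chirplets better suited than fixed-frequency atoms to the gliding formants of speech.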
The purpose of the work is to develop a numerical-analytic method for phonetic analysis of speech signals. The central feature of the methodology is the combination of empirical mode decomposition from the Hilbert–Huang transform with chirplet projections onto alternative nonlinear scales, such as the mel-scale. This approach ensures superior localization of dynamic changes in the time-frequency domain while aligning with the perceptual characteristics of human hearing. By leveraging chirplet transforms, the proposed method enhances the detection of linguistic elements, including phonemes and other speech segments, even in the presence of overlapping components.
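The Hilbert–Huang step of this pipeline can be sketched as follows: given an intrinsic mode function (IMF) produced by empirical mode decomposition (assumed here to be already available, e.g. from an EMD library; the article does not specify one), its instantaneous frequency is obtained from the analytic signal and then projected onto the mel-scale. Function names and the toy tone are illustrative assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT construction (one-sided spectrum),
    equivalent to scipy.signal.hilbert."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def hz_to_mel(f):
    """Standard mel-scale mapping used for perceptual frequency axes."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def instantaneous_mel_frequency(imf, fs):
    """Instantaneous frequency of one IMF (the Hilbert spectral step
    of the Hilbert-Huang transform), projected onto the mel-scale."""
    phase = np.unwrap(np.angle(analytic_signal(imf)))
    f_inst = np.diff(phase) * fs / (2 * np.pi)
    f_inst = np.clip(f_inst, 0.0, fs / 2)  # guard against edge artifacts
    return hz_to_mel(f_inst)

# Toy IMF: a pure 440 Hz tone; away from the signal edges its
# instantaneous frequency should sit near hz_to_mel(440).
fs = 16000.0
t = np.arange(0, 0.05, 1 / fs)
imf = np.cos(2 * np.pi * 440 * t)
mel = instantaneous_mel_frequency(imf, fs)
mid = mel[len(mel) // 4: -(len(mel) // 4)]  # middle half, edges discarded
print(abs(mid.mean() - hz_to_mel(440)))
```

For real speech, each IMF isolates one oscillatory component, so the resulting mel-scale instantaneous-frequency tracks localize formant glides that a fixed short-time Fourier grid would smear.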
Results. The practical implementation of this method is demonstrated through experimental analysis of speech signals. The results indicate an improvement in the accuracy of segmentation and noise suppression compared to conventional approaches. Time-frequency visualizations illustrate the adaptability of the method in handling complex speech signals with varying dynamic properties.
Conclusions. This research contributes to advancements in speech analysis, recognition, and audio signal processing, offering potential applications in areas such as voice-controlled systems, linguistic studies, and speech recognition technologies. The proposed approach can be further refined and integrated with machine learning algorithms to automate the classification and analysis of speech segments. The article provides a foundation for future studies on the intersection of chirplet transforms and nonlinear signal processing, emphasizing their role in addressing real-world challenges in speech and audio technologies.
Keywords: chirplet transform, Hilbert–Huang transform, empirical mode decomposition, mel-scale, alternative nonlinear scales.
Cite as: Bezverbnyi I. Chirplet Analysis of Speech Signals Based on the Hilbert–Huang Transform. Cybernetics and Computer Technologies. 2025. 1. P. 74–80. (in Ukrainian) https://doi.org/10.34229/2707-451X.25.1.7
ISSN 2707-451X (Online)
ISSN 2707-4501 (Print)