2026, issue 1, p. 75-93

Received 02.09.2025; Revised 15.10.2025; Accepted 03.03.2026

Published 27.03.2026; First Online 31.03.2026

https://doi.org/10.34229/2707-451X.26.1.7


        Open Access under CC BY-NC 4.0 License

UDC 004.93

An Analysis of Visual Single Object Tracking Methods

Yevhen Romaniak

Institute of Information Technologies and Systems of the NAS of Ukraine, Kyiv


 

Introduction. Visual object tracking is an important task in computer vision with a wide range of applications, including autonomous navigation, robotics, and surveillance. The problem involves estimating an object's position across a sequence of video frames, given its initial location. Despite significant research efforts, the task remains challenging due to factors such as target occlusions, changes in illumination, motion blur, and object deformations. Tracking methods are categorized as short-term, which assume the target remains within the field of view throughout the sequence, and long-term, which handle situations where the object may disappear and reappear. This paper provides an in-depth analysis of various single object tracking (SOT) methods, covering traditional approaches such as correlation-based and keypoint-based trackers, as well as modern deep learning techniques.

The purpose of the paper is to provide a comprehensive analysis of methods for visual single object tracking (SOT), considering both short-term and long-term tracking scenarios, as well as the benchmark datasets used for algorithm evaluation. The paper reviews the core principles of different tracking approaches, including correlation filters, keypoint-based methods, and various deep learning models, such as Siamese neural networks and transformers. Additionally, the study presents an overview of popular benchmark datasets, including VOT 2018, LaSOT, and GOT-10k, and compares the performance of most of the reviewed algorithms on these benchmarks. This comparison highlights the strengths and weaknesses of different tracking approaches and provides a basis for future research directions, particularly in enhancing the efficiency, adaptability, and speed of tracking algorithms for real-world applications.

Results. Correlation-based trackers are known for their high speed and reasonable performance. These methods leverage the Fourier domain for efficient calculations and can be enhanced with various features, from hand-crafted ones like HOG to deep convolutional features. However, they require modifications for long-term tracking to handle object disappearance and reduce error accumulation. While some of the reviewed methods account for these challenges, they do not solve them completely. Keypoint-based trackers follow objects by detecting and matching interest points or features across frames. The Kanade–Lucas–Tomasi (KLT) tracker provides a foundation, while SIFT or ORB detectors increase robustness to noise and scale changes. These trackers are particularly useful in scenarios with partial occlusions, as they can track a subset of the object's points. However, they may struggle with low-textured or small objects. Deep learning-based trackers represent a major advancement, surpassing traditional methods in accuracy and robustness due to their powerful feature representation capabilities. Some deep trackers, such as SiamFC and SiamRPN, combine good accuracy with real-time performance on a GPU. The paper's comparison of algorithms on benchmarks such as VOT 2018, LaSOT, and GOT-10k demonstrates that deep learning-based approaches achieve superior performance in complex tracking scenarios, but are often computationally demanding.
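The Fourier-domain filtering that makes correlation trackers fast can be illustrated with a minimal MOSSE-style sketch (following [3]): the filter is learned by an element-wise division in the frequency domain, and tracking reduces to an FFT, a multiplication, and an inverse FFT. The 32×32 synthetic patch and the Gaussian desired response below are illustrative assumptions, not data from the paper.

```python
import numpy as np

def train_filter(patch, desired_response, lam=1e-3):
    # MOSSE-style closed-form solution in the Fourier domain:
    # H* = (G . conj(F)) / (F . conj(F) + lambda), element-wise,
    # where F, G are FFTs of the patch and the desired response.
    F = np.fft.fft2(patch)
    G = np.fft.fft2(desired_response)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def respond(H_conj, patch):
    # Correlate a new patch with the learned filter; the peak of the
    # real-valued response map gives the estimated target location.
    F = np.fft.fft2(patch)
    return np.real(np.fft.ifft2(H_conj * F))

# Toy usage: a synthetic target patch and a Gaussian response
# centred on the target position (16, 16).
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
ys, xs = np.mgrid[0:32, 0:32]
g = np.exp(-((ys - 16) ** 2 + (xs - 16) ** 2) / (2 * 2.0 ** 2))

H = train_filter(patch, g)
resp = respond(H, patch)
peak = np.unravel_index(np.argmax(resp), resp.shape)  # peak at (16, 16)
```

In a full tracker the filter is also updated online with a running average over frames and the patch is preprocessed (log transform, cosine window), which this sketch omits for brevity.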

Conclusions. The analysis concludes that visual object tracking has evolved significantly with the advent of deep learning methods, which have enabled trackers to achieve superior accuracy and robustness compared to traditional methods. The introduction of large-scale, annotated datasets such as VOT, GOT, and LaSOT has been crucial in driving this progress and providing a standardized framework for evaluating new algorithms. While correlation filters and keypoint-based methods remain viable for certain applications, especially in resource-constrained environments, deep learning-based trackers, particularly Siamese networks and transformers, have emerged as the leading approaches. Future research should focus on optimizing the efficiency and adaptability of these algorithms to make them more suitable for real-time applications and diverse real-world scenarios.
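The core scoring step of the Siamese trackers named above (SiamFC and its successors [30, 32]) is a cross-correlation of template features against search-region features. A minimal NumPy sketch of that step is below; the small random feature maps stand in for CNN embeddings and are purely illustrative.

```python
import numpy as np

def xcorr_score_map(template_feat, search_feat):
    # SiamFC-style scoring: slide the template feature map over the
    # larger search feature map and take the inner product at each
    # offset, summing over channels. Returns a 2-D response map whose
    # peak indicates the most likely target position.
    c, th, tw = template_feat.shape
    _, sh, sw = search_feat.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = search_feat[:, y:y + th, x:x + tw]
            out[y, x] = np.sum(window * template_feat)
    return out

# Toy usage: plant the template inside a low-energy search region
# at offset (5, 7) and recover that offset from the score map.
rng = np.random.default_rng(1)
template = rng.standard_normal((4, 6, 6))
search = rng.standard_normal((4, 16, 16)) * 0.1
search[:, 5:11, 7:13] = template

score = xcorr_score_map(template, search)
peak = np.unravel_index(np.argmax(score), score.shape)  # planted offset (5, 7)
```

Real trackers compute this cross-correlation as a single convolution on GPU and add scale handling or a region proposal head (as in SiamRPN), which the sketch leaves out.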

 

Keywords: visual object tracking, single object tracking, correlation filters, keypoint tracking, Siamese networks, transformers.

 

Cite as: Romaniak Y. An Analysis of Visual Single Object Tracking Methods. Cybernetics and Computer Technologies. 2026. 1. P. 75–93. https://doi.org/10.34229/2707-451X.26.1.7

 

References

           1.     Suslenko O. Comparison of Neural Network Architectures for Spatial Orientation of Unmanned Aerial Vehicles and Ground Drones. Cybernetics and Computer Technologies. 2025. 4. P. 99–105. (in Ukrainian) https://doi.org/10.34229/2707-451X.25.4.9

           2.     Kristan M., Leonardis A., Matas J. et al. The Sixth Visual Object Tracking VOT2018 Challenge Results. Computer Vision – ECCV 2018 Workshops. Cham : Springer International Publishing, 2019. ISBN 978-3-030-11009-3. P. 3–53. https://doi.org/10.1007/978-3-030-11009-3_1

           3.     Bolme D.S., Beveridge J.R., Draper B.A., Lui Y.M. Visual object tracking using adaptive correlation filters. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. P. 2544–2550. https://doi.org/10.1109/CVPR.2010.5539960

           4.     Henriques J.F., Caseiro R., Martins P., Batista J. High-Speed Tracking with Kernelized Correlation Filters. IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (3). P. 583–596. https://doi.org/10.1109/TPAMI.2014.2345390

           5.     Galoogahi H.K., Sim T., Lucey S. Multi-channel Correlation Filters. 2013 IEEE International Conference on Computer Vision (ICCV). (Sydney, Australia, 12.2013). Sydney, Australia : IEEE, 2013. P. 3072–3079. https://doi.org/10.1109/ICCV.2013.381

           6.     Li Y., Zhu J. A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration. Computer Vision – ECCV 2014 Workshops. Cham : Springer International Publishing, 2015. P. 254–265. https://doi.org/10.1007/978-3-319-16181-5_18

           7.     Danelljan M., Hager G., Khan F.S., Felsberg M. Discriminative Scale Space Tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. 39 (8). P. 1561–1575. https://doi.org/10.1109/TPAMI.2016.2609928

           8.     Danelljan M., Khan F.S., Felsberg M., Van de Weijer J. Adaptive Color Attributes for Real-Time Visual Tracking. 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Columbus, OH, 06.2014). Columbus, OH : IEEE, 2014. P. 1090–1097. https://doi.org/10.1109/CVPR.2014.143

           9.     Danelljan M., Häger G., Khan F.S., Felsberg M. Coloring Channel Representations for Visual Tracking. Image Analysis. P. 117–129. https://doi.org/10.1007/978-3-319-19665-7_10

       10.     Ma C., Huang J.-B., Yang X., Yang M.H. Hierarchical Convolutional Features for Visual Tracking. 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile : IEEE, 2015. P. 3074–3082. https://doi.org/10.1109/ICCV.2015.352

       11.     He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Las Vegas, NV, USA, 06.2016). Las Vegas, NV, USA : IEEE, 2016. P. 770–778. https://doi.org/10.1109/CVPR.2016.90

       12.     Danelljan M., Robinson A., Khan F.S., Felsberg M. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. Computer Vision – ECCV 2016. P. 472–488. https://doi.org/10.1007/978-3-319-46454-1_29

       13.     Danelljan M., Bhat G., Khan F.S., Felsberg M. ECO: Efficient Convolution Operators for Tracking. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Honolulu, HI, 07.2017). Honolulu, HI : IEEE, 2017. P. 6931–6939. https://doi.org/10.1109/CVPR.2017.733

       14.     Ma C., Yang X., Zhang C., Yang M.H. Long-term correlation tracking. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Boston, MA, USA, 06.2015). Boston, MA, USA : IEEE, 2015. P. 5388–5396. https://doi.org/10.1109/CVPR.2015.7299177

       15.     Ma C., Huang J.-B., Yang X., Yang M. H. Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking. International Journal of Computer Vision. 126 (8). P. 771–796. https://doi.org/10.1007/s11263-018-1076-4

       16.     Lukežič A., Zajc L.Č., Vojíř T., Matas J., Kristan M. FuCoLoT – A Fully-Correlational Long-Term Tracker. Computer Vision – ACCV 2018. P. 595–611. https://doi.org/10.1007/978-3-030-20890-5_38

       17.     Bertinetto L., Valmadre J., Golodetz S., Miksik O., Torr P.H. Staple: Complementary Learners for Real-Time Tracking. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Las Vegas, NV, USA, 06.2016). Las Vegas, NV, USA : IEEE, 2016. P. 1401–1409. https://doi.org/10.1109/CVPR.2016.156

       18.     Kyyko V., Matsello V. Real-Time Tracking of Objects in Video Based on Adaptive Histogram Features. Kibernetika i vyčislitelʹnaâ tehnika. 2023. 3 (213). P. 4–19. (In Ukrainian) https://doi.org/10.15407/kvt213.03.004

       19.     Yang J., Tang W., Ding Z. Long-Term Target Tracking of UAVs Based on Kernelized Correlation Filter. Mathematics. 9 (23). P. 3006. https://doi.org/10.3390/math9233006

       20.     Shi J., Tomasi C. Good features to track. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. (Seattle, WA, USA, 1994). Seattle, WA, USA : IEEE Comput. Soc. Press, 1994. P. 593–600. https://doi.org/10.1109/CVPR.1994.323794

       21.     Lowe D.G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision. 60 (2). P. 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94

       22.     Rublee E., Rabaud V., Konolige K., Bradski G.R. ORB: An efficient alternative to SIFT or SURF. 2011 IEEE International Conference on Computer Vision (ICCV). (Barcelona, Spain, 11.2011). Barcelona, Spain : IEEE, 2011. P. 2564–2571. https://doi.org/10.1109/ICCV.2011.6126544

       23.     DeTone D., Malisiewicz T., Rabinovich A. SuperPoint: Self-Supervised Interest Point Detection and Description. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). (Salt Lake City, UT, USA, 06.2018). Salt Lake City, UT, USA : IEEE, 2018. P. 337–33712. https://doi.org/10.1109/CVPRW.2018.00060

       24.     Lucas B.D., Kanade T. An Iterative Image Registration Technique with an Application to Stereo Vision. IJCAI’81: 7th international joint conference on Artificial intelligence. Vancouver, Canada, 1981. P. 674–679. https://hal.science/hal-03697340

       25.     Kalal Z., Mikolajczyk K., Matas J. Forward-Backward Error: Automatic Detection of Tracking Failures. 2010 20th International Conference on Pattern Recognition (ICPR). (Istanbul, Turkey, 08.2010). Istanbul, Turkey : IEEE, 2010. P. 2756–2759. https://doi.org/10.1109/ICPR.2010.675

       26.     Nebehay G., Pflugfelder R. Consensus-based matching and tracking of keypoints for object tracking. 2014 IEEE Winter Conference on Applications of Computer Vision (WACV). (Steamboat Springs, CO, USA, 03.2014). Steamboat Springs, CO, USA : IEEE, 2014. P. 862–869. https://doi.org/10.1109/WACV.2014.6836013

       27.     Nebehay G., Pflugfelder R. Clustering of static-adaptive correspondences for deformable object tracking. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Boston, MA, USA, 06.2015). Boston, MA, USA : IEEE, 2015. P. 2784–2791. https://doi.org/10.1109/CVPR.2015.7298895

       28.     Hong Z., Chen Z., Wang C., Mei X., Prokhorov D., Tao D. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (Boston, MA, USA, 06.2015). Boston, MA, USA : IEEE, 2015. P. 749–758. https://doi.org/10.1109/CVPR.2015.7298675

       29.     Derue F.-X., Bilodeau G.-A., Bergevin R. SPiKeS: Superpixel-Keypoints structure for robust visual tracking. Machine Vision and Applications. 29 (1). P. 175–186. https://doi.org/10.1007/s00138-017-0884-9

       30.     Bertinetto L., Valmadre J., Henriques J.F., Vedaldi A., Torr P.H. Fully-Convolutional Siamese Networks for Object Tracking. Computer Vision – ECCV 2016 Workshops. P. 850–865. https://doi.org/10.1007/978-3-319-48881-3_56

       31.     Held D., Thrun S., Savarese S. Learning to Track at 100 FPS with Deep Regression Networks. Computer Vision – ECCV 2016. Cham : Springer International Publishing, 2016. P. 749–765. https://doi.org/10.1007/978-3-319-46448-0_45

       32.     Li B., Yan J., Wu W., Zhu Z., Hu X. High Performance Visual Tracking with Siamese Region Proposal Network. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Salt Lake City, UT, 06.2018). Salt Lake City, UT : IEEE, 2018. https://doi.org/10.1109/CVPR.2018.00935

       33.     Zhu Z., Wang Q., Li B., Wu W., Yan J., Hu W. Distractor-Aware Siamese Networks for Visual Object Tracking. Computer Vision – ECCV 2018. P. 103–119. https://doi.org/10.1007/978-3-030-01240-3_7

       34.     Hu W., Wang Q., Zhang L., Bertinetto L., Torr P.H. SiamMask: A framework for fast online object tracking and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023. 45 (3). P. 3072–3089. https://ieeexplore.ieee.org/document/10036241

       35.     Yu Y., Xiong Y., Huang W., Scott M.R. Deformable Siamese Attention Networks for Visual Object Tracking. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Seattle, WA, USA, 06.2020). Seattle, WA, USA : IEEE, 2020. P. 6727–6736. https://doi.org/10.1109/CVPR42600.2020.00676

       36.     Voigtlaender P., Luiten J., Torr P.H.S., Leibe B. Siam R-CNN: Visual Tracking by Re-Detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Seattle, WA, USA, 06.2020). Seattle, WA, USA : IEEE, 2020. P. 6577–6587. https://doi.org/10.1109/CVPR42600.2020.00661

       37.     Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I. Attention is All you Need. Advances in Neural Information Processing Systems (2017). Curran Associates, Inc., 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

       38.     Chen X., Yan B., Zhu J., Wang D., Yang X., Lu H. Transformer Tracking. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Nashville, TN, USA, 06.2021). Nashville, TN, USA : IEEE, 2021. P. 8122–8131. https://doi.org/10.1109/CVPR46437.2021.00803

       39.     Yan B., Peng H., Fu J., Wang D., Lu H. Learning Spatio-Temporal Transformer for Visual Tracking. 2021 IEEE/CVF International Conference on Computer Vision (ICCV). (Montreal, QC, Canada, 10.2021). Montreal, QC, Canada : IEEE, 2021. P. 10428–10437. https://doi.org/10.1109/ICCV48922.2021.01028

       40.     Fan H., Ling H. Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking. 2017 IEEE International Conference on Computer Vision (ICCV). (Venice, 10.2017). Venice : IEEE, 2017. P. 5487–5495. https://doi.org/10.1109/ICCV.2017.585

       41.     Yan B., Zhao H., Wang D., Lu H., Yang X. ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). (Seoul, Korea (South), 10.2019). Seoul, Korea (South) : IEEE, 2019. P. 2385–2393. https://doi.org/10.1109/ICCV.2019.00247

       42.     Huang L., Zhao X., Huang K. GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking. Proceedings of the AAAI Conference on Artificial Intelligence. 34 (7). P. 11037–11044. https://doi.org/10.1609/aaai.v34i07.6758

       43.     Dai K., Zhang Y., Wang D., Li J., Lu H., Yang X. High-Performance Long-Term Tracking With Meta-Updater. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Seattle, WA, USA, 06.2020). Seattle, WA, USA : IEEE, 2020. P. 6297–6306. https://doi.org/10.1109/CVPR42600.2020.00633

       44.     Dunnhofer M., Micheloni C. CoCoLoT: Combining Complementary Trackers in Long-Term Visual Tracking. 2022 26th International Conference on Pattern Recognition (ICPR). (Montreal, QC, Canada, 21.08.2022). Montreal, QC, Canada : IEEE, 2022. P. 5132–5139. https://doi.org/10.1109/ICPR56361.2022.9956082

       45.     Kristan M., Matas J., Leonardis A., Vojír T., Pflugfelder R.P., Fernandez G.J., Nebehay G., Porikli F.M., Cehovin L. A Novel Performance Evaluation Methodology for Single-Target Trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence. 38 (11). P. 2137–2155. https://doi.org/10.1109/TPAMI.2016.2516982

       46.     Huang L., Zhao X., Huang K. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. IEEE Transactions on Pattern Analysis and Machine Intelligence. 43 (5). P. 1562–1577. https://doi.org/10.1109/TPAMI.2019.2957464

       47.     Fan H., Lin L., Yang F., Chu P., Deng G., Yu S., Bai H., Xu Y., Liao C., Ling H. LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). (Long Beach, CA, USA, 06.2019). Long Beach, CA, USA : IEEE, 2019. P. 5369–5378. https://doi.org/10.1109/CVPR.2019.00552

       48.     Yu H., Li G., Zhang W., Huang Q., Du D., Tian Q., Sebe N. The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline. International Journal of Computer Vision. 128 (5). P. 1141–1159. https://doi.org/10.1007/s11263-019-01266-1

       49.     LaSOT – Large-scale Single Object Tracking. http://vision.cs.stonybrook.edu/~lasot/results.html (accessed: 17.09.2025)

       50.     GOT-10k: Generic Object Tracking Benchmark. http://got-10k.aitestunion.com/leaderboard (accessed: 17.09.2025)

       51.     Swati Kumar V. N., Dinesh Kawa S., Engineer P.J. An Efficient Object Tracking on Edge Devices with Quantized Siamese Networks. 2025 Devices for Integrated Circuit (DevIC). (Kalyani, India, 05.04.2025). Kalyani, India : IEEE, 2025. P. 604–609. https://doi.org/10.1109/DevIC63749.2025.11012629

 

 

ISSN 2707-451X (Online)

ISSN 2707-4501 (Print)


 

 


 

© Website and Design, 2019–2026. V.M. Glushkov Institute of Cybernetics of the National Academy of Sciences of Ukraine.