Optimized frame detection technique in vehicle accident using deep learning

Authors

  • Mardin A. Anwer Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
  • Shareef M. Shareef Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq
  • Abbas M. Ali Department of Software and Informatics, College of Engineering, Salahaddin University-Erbil, Kurdistan Region, Iraq

DOI:

https://doi.org/10.21271/ZJPAS.32.4.5

Keywords:

Intelligent transportation system, object detection, video processing technique, video segmentation, Gaussian mixture model, transfer learning, deep learning, GoogleNet, AlexNet.

Abstract

Video processing has become one of the most widely used and necessary steps in machine learning. Today, cameras are installed in many places for many purposes, including government services, and one of the main applications in this regard is traffic police work. A key problem with using video in machine learning applications is its duration, which consumes time, manual work, and storage space during processing and increases the computational cost through the large number of frames. This paper proposes an algorithm that optimizes video duration by applying a Gaussian mixture model (GMM) to real accident videos. The Histogram of Oriented Gradients (HOG) is used to extract features from the video frames, and a CNN designed from scratch is trained on two common datasets, the Stanford Dogs Dataset (SDD) and the Vehicle Make and Model Recognition dataset (VMMRdb), in addition to a local dataset created for this research. The experimental work is carried out in two parts. First, after applying the GMM, the number of frames in the dataset decreased by nearly 51%. Second, the accuracy and complexity of the models on these datasets are compared: the proposed CNN achieves 85% accuracy on the local dataset, 85% on SDD, and 86% on VMMRdb, whereas GoogleNet and AlexNet applied to the same datasets achieve 82%, 79%, 80%, 83%, 81%, and 83%, respectively.
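
As a concrete illustration of the pipeline described above, the sketch below shows how the frame-reduction and feature-extraction steps could be assembled from off-the-shelf components: a GMM background subtractor (here OpenCV's MOG2, not the authors' implementation) discards near-static frames, and HOG descriptors are computed for the frames that are kept. The motion threshold, frame size, and HOG parameters are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch, assuming OpenCV (cv2) and scikit-image are installed.
# A GMM (MOG2) background subtractor drops near-static frames; HOG features
# are then extracted from the remaining frames for a downstream classifier.
import cv2
from skimage.feature import hog

def reduce_and_describe(video_path, motion_ratio=0.02):
    """Keep frames whose GMM foreground mask covers at least `motion_ratio`
    of the image (illustrative threshold) and return their HOG descriptors."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    descriptors = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                   # GMM foreground mask
        if cv2.countNonZero(mask) < motion_ratio * mask.size:
            continue                                     # near-static frame: discard
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (128, 128))
        descriptors.append(hog(gray,
                               orientations=9,
                               pixels_per_cell=(8, 8),
                               cells_per_block=(2, 2)))  # HOG feature vector
    cap.release()
    return descriptors
```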

References

Anwer M., Shareef M., Ali A. (2019) 'Smart Traffic Incident Reporting System in e-Government'. ECIAIR Conference, UK. DOI: 10.34190/ECIAIR.19.061

Chang H.S., Sull S., Lee S.U. (1999) 'Efficient video indexing scheme for content-based retrieval'. IEEE Transactions on Circuits and Systems for Video Technology, 9(8):1269–1279.

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich (2015) ’ Going deeper with convolutions’. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1–9.

Clady X., Negri P., Milgram M., Poulenard R. (2008) 'Multi-class Vehicle Type Recognition System'. Lecture Notes in Computer Science, vol. 5064, pp. 228–239. DOI: 10.1007/978-3-540-69939-2_22.

Dey N., Ashour A.S (2018) ‘Applied Examples and Applications of Localization and Tracking Problem of Multiple Speech Sources’. In: Direction of Arrival Estimation and Localization of Multi-Speech Sources. Springer Briefs in Electrical and Computer Engineering. Springer, Cham.

Fang J., Y. Zhou, Y. Yu, and S. Du(2017) ‘Fine-grained vehicle model recognition using a coarse-to-fine convolutional neural network architecture’ IEEE Trans. Intell. Transp. Syst., vol. 18, no. 7, pp. 1782–1792, Jul.

Hasanpour S., Rouhani M., Fayyaz M., Sabokrou M. (2018) 'Let's keep it simple, using simple architectures to outperform deeper and more complex architectures'. arXiv:1608.06037v7.

Huang Y.L. (2018) 'Video Signal Processing'. In: Dolecek G. (ed.) Advances in Multirate Systems. Springer, Cham.

Jakob Verbeek, Nikos Vlassis, Ben Kröse (2003) 'Efficient greedy learning of Gaussian mixture models'. Neural Computation, MIT Press, 15(2), pp. 469–485. DOI: 10.1162/089976603762553004.

Karpathy A., Toderici G., Shetty S. (2014) 'Large-scale Video Classification with Convolutional Neural Networks'. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Kolekar M. (2018) 'Intelligent Video Surveillance Systems: An Algorithmic Approach'. CRC Press, Taylor & Francis Group.

Krizhevsky A., Sutskever I., Hinton G. (2012) 'ImageNet Classification with Deep Convolutional Neural Networks'. Advances in Neural Information Processing Systems, 25. DOI: 10.1145/3065386.

Liu H., Tang H., Xiao W., Guo Z., Tian L., Gao Y. (2016) 'Sequential Bag-of-Words model for human action classification'. CAAI Transactions on Intelligence Technology, 1(2):125–136.

Minaee S., Abdolrashidi A., Wang Y. (2015) 'Iris recognition using scattering transform and textural features'. In 2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE), pp. 37–42.

Mustaffa A., Hokao K. (2013) 'Database development of road traffic accident case study Johor Bahru, Malaysia'. Journal of Society for Transportation and Traffic Studies (JSTS), Vol. 3, No. 1.

Navneet Dalal, Bill Triggs (2005) 'Histograms of Oriented Gradients for Human Detection'. International Conference on Computer Vision & Pattern Recognition (CVPR '05), San Diego, United States, pp. 886–893.

Niebles J. C., C.-W. Chen, and L. Fei-Fei (2010) ‘Modeling temporal structure of decomposable motion segments for activity classification’. In ECCV, pages 392–405. Springer.

Padalkar, Milind (2010) ‘Histogram Based Efficient Video Shot Detection Algorithms’. 10.13140/RG.2.1.1590.3847.

Pickering M.J. and S. Rüger (2003) 'Evaluation of key-frame based retrieval techniques for video'. Computer Vision and Image Understanding, 92(2-3):217–235.

Ranganatha S., Gowramma, Y. (2018) ‘Image Training and LBPH Based Algorithm for Face Tracking in Different Background Video Sequence’. International Journal of Computer Sciences and Engineering. 6. 349-354. 10.26438/ijcse/v6i9.349354.

Salih D., Ali A. (2019) 'Appearance-based indoor place recognition for localization of the visually impaired person'. ZJPAS, 31(4):70–81. DOI: http://dx.doi.org/10.21271/ZJPAS.31.4.8

Das S. (2017) 'CNN Architectures: LeNet, AlexNet, VGG, GoogLeNet, ResNet and more'. Online at: https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

Sochor J., Herout A., Havel J. (2016) 'BoxCars: 3D boxes as CNN input for improved fine-grained vehicle recognition'. In Proc. Comput. Vis. Pattern Recognit. (CVPR), Jun., pp. 3006–3015.

Sreenu, G., Saleem Durai, M.A (2019) ‘Intelligent video surveillance: a review through deep learning techniques for crowd analysis’. J Big Data 6, 48 doi:10.1186/s40537-019-0212-5.

Sun X., Gu J., Huang R.,Zou R.,Giron B. (2019) ‘Surface Defects Recognition of Wheel Hub Based on Improved Faster R-CNN’. Electronics 2019, 8, 481; doi:10.3390/electronics8050481.

Sze K.W., Lam K.M., Qiu G. (2005) 'A new key frame representation for video segment retrieval'. IEEE Transactions on Circuits and Systems for Video Technology, 15(9):1148–1155.

Wang X., Zhang W., Wu X., Xiao L., Qian Y., Fang Z. (2017) 'Real-time vehicle type classification with deep convolutional neural networks'. J Real-Time Image Proc., Springer. DOI: 10.1007/s11554-017-0712-5.

Yang L., P. Luo, C. C. Loy, and X. Tang (2015) ‘A large-scale car dataset for fine-grained categorization and verification’. in Proc. Comput. Vis. Pattern Recognit., Jun., pp. 3973–3981.

Zhang H.J., Wu J., Zhong D., Smoliar S.W. (1997) 'An integrated system for content-based video retrieval and browsing'. Pattern Recognition, 30(4):643–658.

Published

2020-09-08

How to Cite

Mardin A. Anwer, Shareef M. Shareef, & Abbas M. Ali. (2020). Optimized frame detection technique in vehicle accident using deep learning. Zanco Journal of Pure and Applied Sciences, 32(4), 38–47. https://doi.org/10.21271/ZJPAS.32.4.5