Xie Rimin, Chen Jie. A Review of Human Behavior Recognition Based on Video[J]. Information Theory and Practice, 2018, (01): 2-11.

A Review of Human Behavior Recognition Based on Video

Information Theory and Practice [ISSN: 2520-5862]

Issue: 2018, No. 01
Pages: 2-11
Publication date: 2019-06-06

Article Information

Title:
A Review of Human Behavior Recognition Based on Video
Authors:
Xie Rimin 1,2, Chen Jie 2
1 (Department of Information Engineering, Fujian Business University, Fuzhou 350506, China)
2 (Center for Information Technology, Fujian Business University, Fuzhou 350506, China)
Keywords:
human action recognition; feature extraction; deep learning
Abstract:
[Objective] This paper studies video-based human action recognition, which provides a theoretical basis for intelligent monitoring, auxiliary diagnosis, virtual reality, sports assistance, smart home, and security applications. [Methods] First, the key technologies of video-based human action recognition are analyzed, mainly covering three aspects: feature extraction, feature fusion and description, and classification methods. Then, the various methods proposed in this field in recent years are organized and categorized. [Results] The research difficulties and research directions of video-based human action recognition are discussed. [Limitations] Because of the breadth of its applications, many important technologies could not be fully covered. [Conclusions] This review can serve as a reference for subsequent research on intelligent monitoring, virtual reality, and related areas.
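The abstract describes a three-stage pipeline: feature extraction, feature fusion and description, and classification. A minimal toy sketch of that pipeline is given below; it is an illustration of the structure only, not a method from the paper. The frame-difference "motion" feature, the two-value descriptor, and the nearest-centroid classifier are all simplifying assumptions standing in for real components such as optical-flow or HOG features and SVM or deep-network classifiers.

```python
# Toy sketch of the pipeline from the abstract (illustrative assumptions only):
# feature extraction -> feature fusion/description -> classification.
import math

def extract_features(video):
    """Per-frame feature: mean absolute pixel change vs. the previous frame
    (a crude stand-in for motion descriptors such as optical flow)."""
    feats = []
    for prev, cur in zip(video, video[1:]):
        diffs = [abs(a - b) for row_p, row_c in zip(prev, cur)
                 for a, b in zip(row_p, row_c)]
        feats.append(sum(diffs) / len(diffs))
    return feats

def fuse(feats):
    """Fusion/description: summarize per-frame features into one
    fixed-length descriptor (mean motion, motion energy)."""
    mean = sum(feats) / len(feats)
    energy = math.sqrt(sum(f * f for f in feats) / len(feats))
    return (mean, energy)

def classify(descriptor, centroids):
    """Classification: nearest centroid among labeled class descriptors."""
    return min(centroids, key=lambda label: math.dist(descriptor, centroids[label]))

# Toy 'videos' of 2x2 frames: 'waving' changes a lot, 'standing' barely.
waving   = [[[0, 0], [0, 0]], [[9, 9], [9, 9]], [[0, 0], [0, 0]]]
standing = [[[5, 5], [5, 5]], [[5, 5], [5, 5]], [[5, 5], [5, 5]]]
centroids = {"waving": fuse(extract_features(waving)),
             "standing": fuse(extract_features(standing))}

query = [[[1, 1], [1, 1]], [[8, 8], [8, 8]], [[1, 1], [1, 1]]]
print(classify(fuse(extract_features(query)), centroids))  # high motion -> waving
```

The same skeleton holds for the methods surveyed here; only the components change, e.g. trajectory-pooled deep features in place of frame differences, or a two-stream CNN in place of the centroid rule.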

References:

[1] Turaga P, Chellappa R, Subrahmanian V S, et al. Machine Recognition of Human Activities: A Survey[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2008, 18(11): 1473-1488.
[2] Aggarwal J K, Ryoo M S. Human activity analysis: A review[J]. ACM Computing Surveys (CSUR), 2011, 43(3): 16.
[3] Lipton A J, Fujiyoshi H, Patil R S. Moving target classification and tracking from real-time video[C]//Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV '98). IEEE, 1998: 8-14.
[4] Negahdaripour S. Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(9): 961-979.
[5] Piccardi M. Background subtraction techniques: a review[C]//2004 IEEE International Conference on Systems, Man and Cybernetics. IEEE, 2004, 4: 3099-3104.
[6] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005). IEEE, 2005, 1: 886-893.
[7] Wu B, Nevatia R. Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors[J]. International Journal of Computer Vision, 2007, 75(2): 247-266.
[8] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[9] Oren M, Papageorgiou C, Sinha P, et al. A trainable system for people detection[C]//Proc. of Image Understanding Workshop. 1997, 24.
[10] Dollár P, Rabaud V, Cottrell G, et al. Behavior recognition via sparse spatio-temporal features[C]//2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. IEEE, 2005: 65-72.
[11] Bengio Y. Learning deep architectures for AI[J]. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
[12] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.
[13] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[14] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems. 2015: 91-99.
[15] Tran D, Sorokin A. Human activity recognition with metric learning[C]//European Conference on Computer Vision. Springer, Berlin, Heidelberg, 2008: 548-561.
[16] Jiang Y G, Li Z, Chang S F. Modeling scene and object contexts for human action retrieval with few examples[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2011, 21(5): 674-681.
[17] Lewis D P, Jebara T, Noble W S. Nonstationary kernel combination[C]//Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006: 553-560.
[18] Vieira A W, Nascimento E R, Oliveira G L, et al. STOP: Space-time occupancy patterns for 3D action recognition from depth map sequences[C]//Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg, 2012: 252-259.
[19] Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
[20] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. 2014: 568-576.
[21] Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314.
[22] Guha T, Ward R K. Learning sparse representations for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1576-1588.
[23] Zheng J, Jiang Z, Phillips P J, et al. Cross-view action recognition via a transferable dictionary pair[C]//BMVC. 2012, 1(2): 7.
[24] Zhu F, Shao L. Weakly-supervised cross-domain dictionary learning for visual recognition[J]. International Journal of Computer Vision, 2014, 109(1-2): 42-59.
[25] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Computer Vision and Pattern Recognition. IEEE, 2015: 1-9.
[26] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in Neural Information Processing Systems. 2014: 568-576.
[27] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision (ICCV). IEEE, 2015: 4489-4497.
[28] Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[29] Aharon M, Elad M, Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[30] Zhu F, Shao L. Correspondence-free dictionary learning for cross-view action recognition[C]//2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, 2014: 4525-4530.
[31] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[32] Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2011: 3361-3368.
[33] Foggia P, Saggese A, Strisciuglio N, et al. Exploiting the deep learning paradigm for recognizing human actions[C]//2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 2014: 93-98.
[34] Hasan M, Roy-Chowdhury A K. Continuous learning of human activity models using deep nets[C]//European Conference on Computer Vision. Springer, Cham, 2014: 705-720.
[35] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a Lie group[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 588-595.
[36] Devanne M, Wannous H, Berretti S, et al. 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold[J]. IEEE Transactions on Cybernetics, 2015, 45(7): 1340-1352.
[37] Liu A A, Nie W Z, Su Y T, et al. Coupled hidden conditional random fields for RGB-D human action recognition[J]. Signal Processing, 2015, 112: 74-82.
[38] Zhang B, Yang Y, Chen C, et al. Action recognition using 3D histograms of texture and a multi-class boosting classifier[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4648-4660.
[39] Wu D, Shao L. Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 724-731.
[40] Faria D R, Premebida C, Nunes U. A probabilistic approach for human everyday activities recognition using body motion from RGB-D images[C]//The 23rd IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 2014: 732-737.
[41] Swears E, Hoogs A, Ji Q, et al. Complex activity recognition using Granger constrained DBN (GCDBN) in sports and surveillance video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 788-795.
[42] Yang S, Yuan C, Hu W, et al. A hierarchical model based on latent Dirichlet allocation for action recognition[C]//2014 22nd International Conference on Pattern Recognition (ICPR). IEEE, 2014: 2613-2618.
[43] Lan Z, Lin M, Li X, et al. Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 204-212.
[44] Gaglio S, Re G L, Morana M. Human activity recognition process using 3-D posture data[J]. IEEE Transactions on Human-Machine Systems, 2015, 45(5): 586-597.
[45] Srivastava N, Mansimov E, Salakhutdinov R. Unsupervised learning of video representations using LSTMs[C]//International Conference on Machine Learning. 2015: 843-852.
[46] Gan Z, Li C, Henao R, et al. Deep temporal sigmoid belief networks for sequence modeling[C]//Advances in Neural Information Processing Systems. 2015: 2467-2475.
[47] Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
[48] Du K, Shi Y, Lei B, et al. A method of human action recognition based on spatio-temporal interest points and PLSA[C]//2016 International Conference on Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII). IEEE, 2016: 69-72.
[49] Xiao Q, Si Y. Human action recognition using autoencoder[C]//2017 3rd IEEE International Conference on Computer and Communications (ICCC). IEEE, 2017: 1672-1675.
[50] Plaut E. From principal subspaces to principal components with linear autoencoders[J]. arXiv preprint arXiv:1804.10253, 2018.
[51] Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: Deep networks for video classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702.

Last Update: 2019-09-05