References:
[1] Turaga P, Chellappa R, Subrahmanian V S, et al. Machine Recognition of Human Activities: A Survey[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2008, 18(11): 1473-1488.
[2] Aggarwal J K, Ryoo M S. Human activity analysis: A review[J]. ACM Computing Surveys (CSUR), 2011, 43(3): 16.
[3] Lipton A J, Fujiyoshi H, Patil R S. Moving target classification and tracking from real-time video[C]//Applications of Computer Vision, 1998. WACV’98. Proceedings., Fourth IEEE Workshop on. IEEE, 1998: 8-14.
[4] Negahdaripour S. Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(9): 961-979.
[5] Piccardi M. Background subtraction techniques: a review[C]//Systems, Man and Cybernetics, 2004 IEEE International Conference on. IEEE, 2004, 4: 3099-3104.
[6] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, 1: 886-893.
[7] Wu B, Nevatia R. Detection and tracking of multiple, partially occluded humans by bayesian combination of edgelet based part detectors[J]. International Journal of Computer Vision, 2007, 75(2): 247-266.
[8] Ojala T, Pietikainen M, Maenpaa T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[9] Oren M, Papageorgiou C, Sinha P, et al. A trainable system for people detection[C]//Proc. of Image Understanding Workshop. 1997, 24.
[10] Dollár P, Rabaud V, Cottrell G, et al. Behavior recognition via sparse spatio-temporal features[C]//Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2005. 2nd Joint IEEE International Workshop on. IEEE, 2005: 65-72.
[11] Bengio Y. Learning deep architectures for AI[J]. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
[12] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 580-587.
[13] He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.
[14] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in neural information processing systems. 2015: 91-99.
[15] Tran D, Sorokin A. Human activity recognition with metric learning[C]//European conference on computer vision. Springer, Berlin, Heidelberg, 2008: 548-561.
[16] Jiang Y G, Li Z, Chang S F. Modeling scene and object contexts for human action retrieval with few examples[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2011, 21(5): 674-681.
[17] Lewis D P, Jebara T, Noble W S. Nonstationary kernel combination[C]//Proceedings of the 23rd international conference on Machine learning. ACM, 2006: 553-560.
[18] Vieira A W, Nascimento E R, Oliveira G L, et al. Stop: Space-time occupancy patterns for 3d action recognition from depth map sequences[C]//Iberoamerican Congress on Pattern Recognition. Springer, Berlin, Heidelberg, 2012: 252-259.
[19] Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2014: 1725-1732.
[20] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in neural information processing systems. 2014: 568-576.
[21] Wang L, Qiao Y, Tang X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 4305-4314.
[22] Guha T, Ward R K. Learning sparse representations for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1576-1588.
[23] Zheng J, Jiang Z, Phillips P J, et al. Cross-View Action Recognition via a Transferable Dictionary Pair[C]//BMVC. 2012, 1(2): 7.
[24] Zhu F, Shao L. Weakly-supervised cross-domain dictionary learning for visual recognition[J]. International Journal of Computer Vision, 2014, 109(1-2): 42-59.
[25] Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Computer Vision and Pattern Recognition. IEEE, 2015: 1-9.
[26] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]//Advances in neural information processing systems. 2014: 568-576.
[27] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3d convolutional networks[C]//Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015: 4489-4497.
[28] Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[29] Aharon M, Elad M, Bruckstein A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322.
[30] Zhu F, Shao L. Correspondence-free dictionary learning for cross-view action recognition[C]//Pattern Recognition (ICPR), 2014 22nd International Conference on. IEEE, 2014: 4525-4530.
[31] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets[J]. Neural computation, 2006, 18(7): 1527-1554.
[32] Le Q V, Zou W Y, Yeung S Y, et al. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011: 3361-3368.
[33] Foggia P, Saggese A, Strisciuglio N, et al. Exploiting the deep learning paradigm for recognizing human actions[C]//Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE International Conference on. IEEE, 2014: 93-98.
[34] Hasan M, Roy-Chowdhury A K. Continuous learning of human activity models using deep nets[C]//European Conference on Computer Vision. Springer, Cham, 2014: 705-720.
[35] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3d skeletons as points in a lie group[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2014: 588-595.
[36] Devanne M, Wannous H, Berretti S, et al. 3-d human action recognition by shape analysis of motion trajectories on riemannian manifold[J]. IEEE Transactions on Cybernetics, 2015, 45(7): 1340-1352.
[37] Liu A A, Nie W Z, Su Y T, et al. Coupled hidden conditional random fields for RGB-D human action recognition[J]. Signal Processing, 2015, 112: 74-82.
[38] Zhang B, Yang Y, Chen C, et al. Action recognition using 3D histograms of texture and a multi-class boosting classifier[J]. IEEE Transactions on Image Processing, 2017, 26(10): 4648-4660.
[39] Wu D, Shao L. Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 724-731.
[40] Faria D R, Premebida C, Nunes U. A probabilistic approach for human everyday activities recognition using body motion from RGB-D images[C]//Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on. IEEE, 2014: 732-737.
[41] Swears E, Hoogs A, Ji Q, et al. Complex activity recognition using granger constrained dbn (gcdbn) in sports and surveillance video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 788-795.
[42] Yang S, Yuan C, Hu W, et al. A hierarchical model based on latent dirichlet allocation for action recognition[C]//Pattern Recognition (ICPR), 2014 22nd International Conference on. IEEE, 2014: 2613-2618.
[43] Lan Z, Lin M, Li X, et al. Beyond gaussian pyramid: Multi-skip feature stacking for action recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 204-212.
[44] Gaglio S, Re G L, Morana M. Human activity recognition process using 3-D posture data[J]. IEEE Transactions on Human-Machine Systems, 2015, 45(5): 586-597.
[45] Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using lstms[C]//International conference on machine learning. 2015: 843-852.
[46] Gan Z, Li C, Henao R, et al. Deep temporal sigmoid belief networks for sequence modeling[C]//Advances in Neural Information Processing Systems. 2015: 2467-2475.
[47] Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: A large-scale video classification benchmark[J]. arXiv preprint arXiv:1609.08675, 2016.
[48] Du K, Shi Y, Lei B, et al. A method of human action recognition based on spatio-temporal interest points and PLSA[C]//Industrial Informatics-Computing Technology, Intelligent Technology, Industrial Information Integration (ICIICII), 2016 International Conference on. IEEE, 2016: 69-72.
[49] Xiao Q, Si Y. Human action recognition using autoencoder[C]//Computer and Communications (ICCC), 2017 3rd IEEE International Conference on. IEEE, 2017: 1672-1675.
[50] Plaut E. From Principal Subspaces to Principal Components with Linear Autoencoders[J]. arXiv preprint arXiv:1804.10253, 2018.
[51] Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, et al. Beyond short snippets: Deep networks for video classification[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 4694-4702.