[1]福州大学信息管理研究所.基于高校信息碎片化的信息整合构建研究报告[J].信息化理论与实践,2018(01):130-168.

基于高校信息碎片化的信息整合构建研究报告

《信息化理论与实践》[ISSN:2520-5862/CN:]

卷:
期数: 2018年01
页码: 130-168
栏目:
出版日期: 2019-06-06

文章信息/Info

作者:
福州大学信息管理研究所
福州大学信息管理研究所 基于高校信息碎片化的信息整合构建研究
Author(s):
Research on Information Integration Based on University Information Fragmentation
关键词:
随机森林;碎片化;信息整合;特征选择
Keywords:
Random Forest; Fragmentation; Information Integration; Feature Selection
摘要:
随着高校信息化意识的逐步增强,各高校信息化建设水平都取得了很大程度的提升,建成了如人事管理系统、教务管理系统、学工管理系统、科研管理系统、财务管理系统等,为高校的各项业务提供了很大的支持。但随着网络技术的日新月异,大数据时代开始到来,而大数据时代的一个显著特征就是碎片化,也就是说,除了存储在各业务系统中的结构化数据外,伴随高校师生的行为活动,还产生了海量的非结构化数据。因此,有效利用这些海量碎片化信息为高校的人才培养、科学研究、校园活动、绩效评估等提供有效的支持,对当前高校信息化建设来说显得尤为重要。对高校海量碎片化信息进行有效整合并加以利用,不仅能实现多源异构数据的共享,还能充分挖掘其背后的价值,实现知识集成及创新,从而为用户管理决策提供支持与帮助。 为了解决高校海量数据无法得到有效利用的缺陷,本研究首先从高校用户需求出发,分析了知识碎片的概念及高校信息碎片化整合思想,整理了高校信息碎片化整合思路及流程,在此基础上构建了高校信息碎片化整合框架,并详细阐述了高校信息碎片化整合的关键技术。同时,指出高校信息碎片化整合的关键在于对高校信息碎片化整合特征进行有效选择。其次,通过比较各类数据挖掘及机器学习算法的优劣,选择将训练样本训练速度快、分类精度高、抗噪能力强的随机森林算法运用到整合特征选择过程中。通过对随机森林的概念及算法步骤的分析,构建了基于随机森林的整合特征选择模型,并定义了特征选择模型的评价指标用以衡量模型的精度。最后,通过贫困生认定这一案例对随机森林算法在高校信息碎片化整合特征选择中的准确性和有效性进行验证。 研究结果表明高校信息碎片化整合不仅具有很好的扩展性,还充分考虑到用户的自主性,为用户提供了个性化的决策支持。高校信息碎片化整合的核心在于最优整合特征集合的选择。随机森林良好的泛化性和鲁棒性、对噪声不敏感、能处理连续属性的特点,很适合用来建立高校信息整合特征选择模型。本研究利用随机森林算法构建了高校信息碎片化整合的特征选择模型,并通过高校贫困生认定这一实验对模型进行验证。实验结果表明随机森林算法在高校信息整合特征的选择上表现出较高的准确性和有效性,这也为高校信息整合提供了一种新的思路。
Abstract:
As universities have become increasingly aware of the value of information technology, their level of informatization has risen considerably. Systems for personnel management, academic affairs, student affairs, scientific research management and financial management now support most university operations. With the rapid development of network technology, however, the era of big data has arrived, and one of its defining features is fragmentation: in addition to the structured data stored in business systems, the activities of teachers and students generate massive amounts of unstructured data. Making effective use of this fragmented information to support talent training, scientific research, campus activities and performance evaluation is therefore particularly important for university informatization today. Integrating and exploiting it effectively not only enables the sharing of multi-source heterogeneous data but also uncovers the value behind the data and achieves knowledge integration and innovation, thereby supporting users' management decisions.
To address the problem that the large volume of university data cannot be used effectively, this study first starts from the needs of university users, analyzes the concept of knowledge fragments and the idea of integrating fragmented university information, sets out the approach and process of integration, builds an integration framework on that basis, and describes the key technologies in detail. It also points out that the key to integrating fragmented university information lies in the effective selection of integration features. Second, after comparing the strengths and weaknesses of various data mining and machine learning algorithms, the random forest algorithm, which trains quickly, classifies accurately and is robust to noise, is applied to integration feature selection. Based on an analysis of the concept and algorithmic steps of random forests, a random-forest-based integration feature selection model is constructed, and evaluation indices are defined to measure the accuracy of the model. Finally, the accuracy and effectiveness of the random forest algorithm for selecting integration features are verified through a case study on the identification of impoverished students.
The results show that the integration of fragmented university information not only scales well but also takes full account of user autonomy and provides users with personalized decision support. The core of the integration lies in selecting the optimal set of integration features. Random forests generalize well, are robust, are insensitive to noise and can handle continuous attributes, which makes them well suited to building a feature selection model for university information integration. In this study, the random forest algorithm is used to build such a model, which is validated with an experiment on the identification of impoverished students. The experimental results show that the random forest algorithm achieves high accuracy and effectiveness in selecting integration features, which offers a new approach to university information integration.
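The random-forest feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' model. It is a small pure-Python random forest (bootstrap samples, a random feature subset at each split, Gini impurity) that ranks features by out-of-bag permutation importance, one common importance measure for random forests. The synthetic data, the function names and all parameter values (`n_trees`, `n_try`, `depth`, the seeds) are assumptions made for illustration; the abstract gives no dataset or settings.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, n_try, depth):
    """Grow a tree; each node considers only n_try randomly chosen features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, n_try, depth - 1)
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Internal nodes are tuples; leaves are class labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=20, n_try=2, depth=4, seed=0):
    """Train trees on bootstrap samples and remember each tree's out-of-bag rows."""
    rng, n, forest = random.Random(seed), len(rows), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(idx))
        tree = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, n_try, depth)
        forest.append((tree, oob))
    return forest

def permutation_importance(forest, rows, labels, seed=1):
    """Mean drop in out-of-bag accuracy when one feature's values are shuffled."""
    rng, n_feat = random.Random(seed), len(rows[0])
    scores = [0.0] * n_feat
    for tree, oob in forest:
        if not oob:
            continue
        base = sum(predict_tree(tree, rows[i]) == labels[i] for i in oob) / len(oob)
        for j in range(n_feat):
            shuffled = [rows[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                row = list(rows[i])
                row[j] = v
                hits += predict_tree(tree, row) == labels[i]
            scores[j] += base - hits / len(oob)
    return [s / len(forest) for s in scores]

# Synthetic data (an assumption): the label depends only on feature 0;
# features 1 and 2 are pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(120)]
labels = [int(r[0] > 0.5) for r in rows]

forest = fit_forest(rows, labels)
importance = permutation_importance(forest, rows, labels)
ranked = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
selected = ranked[:1]  # keep the top-ranked feature as the selected subset
print("importance:", importance, "selected:", selected)
```

In a real setting the rows would be features extracted from integrated university data and the labels the outcome of interest, and the number of features kept would be chosen with an evaluation index such as classification accuracy rather than fixed at one.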

更新日期/Last Update: 2019-09-05