[1] Wu Weibin. Research on web page tampering detection based on OCR technology[J]. Information Theory and Practice, 2021, (01): 39-43.

Research on web page tampering detection based on OCR technology

Information Theory and Practice (《信息化理论与实践》) [ISSN: 2520-5862]

Issue: 2021, No. 01
Pages: 39-43
Publication date: 2021-12-31

Article Info

Title: Research on web page tampering detection based on OCR technology
Author: Wu Weibin (Network Center of Quanzhou Normal University, Quanzhou 362000, Fujian)
Keywords: OCR; webpage tampering detection; scene text recognition
Abstract:
[Objective] Threats to website security have not abated, and the number of website tampering incidents continues to grow. Besides strengthening prevention, detecting web page tampering is also an important part of the defense. [Methods] This paper proposes a web page tampering detection model based on OCR (Optical Character Recognition) technology. It applies natural scene text recognition to screenshots of web pages and then uses sensitive word detection to judge whether a page has been tampered with. [Results] In experiments, the model achieved high accuracy on the experimental data set. [Limitations] The model has two shortcomings: 1) it cannot identify implicit tampering; 2) it relies on sensitive words, and abnormal information other than text remains to be studied. [Conclusion] The OCR-based method can be used to detect tampering of web page text.
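The two stages named in the abstract (text recognition on a screenshot, then sensitive word detection) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the OCR stage is passed in as a hypothetical `recognize` callable so that any engine could be plugged in, and `fake_ocr`, the function names, and the word list are all assumptions made for the example.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: any OCR backend that maps a screenshot path
# to the list of text lines it recognized.
OcrFn = Callable[[str], List[str]]


def find_sensitive_words(lines: Iterable[str], sensitive_words: Iterable[str]) -> List[str]:
    """Return the sensitive words found in the recognized text, sorted."""
    # Case-insensitive substring matching over the recognized lines.
    text = "\n".join(lines).lower()
    return sorted({w for w in sensitive_words if w and w.lower() in text})


def is_tampered(screenshot_path: str, recognize: OcrFn, sensitive_words: Iterable[str]) -> bool:
    """Flag a page as tampered if its screenshot text contains any sensitive word."""
    return bool(find_sensitive_words(recognize(screenshot_path), sensitive_words))


def fake_ocr(path: str) -> List[str]:
    # Stand-in for a real OCR engine (assumption); returns fixed text.
    return ["Welcome to the campus portal", "Hacked by example"]


if __name__ == "__main__":
    words = ["hacked by", "casino"]  # illustrative word list only
    print(find_sensitive_words(fake_ocr("page.png"), words))
    print(is_tampered("page.png", fake_ocr, words))
```

Because the check only sees text that is visible in the screenshot, a sketch like this shares the limitations stated in the abstract: implicit tampering and non-text anomalies go undetected.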

Last Update: 2022-12-20