RECOGNITION SYSTEM AND RECOGNITION METHOD OF NON-BODY TEXT IN WEBPAGE, Application No. WO2013CN77102 - 传众专利搜索

Title of invention	RECOGNITION SYSTEM AND RECOGNITION METHOD OF NON-BODY TEXT IN WEBPAGE
Abstract	Disclosed are a system and a method for recognizing non-body text in a webpage, relating to the field of text extraction. The system comprises: a webpage grabber, used for grabbing webpage data from a target website; a DOM tree construction unit, used for constructing the DOM tree corresponding to each webpage in the target website; a DOM tree analysis unit, used for finding unit text sections in each webpage according to the DOM trees; a text statistics unit, used for counting the number of occurrences of each unit text section in the webpages of the target website; and a text recognition unit, used for recognizing a unit text section as non-body text when its number of occurrences is larger than a preset threshold value. The system and the method overcome the recognition lag for non-body text in existing methods and have high recognition accuracy.
Publication No.	WO2014000571(A1)	Publication date	2014.01.03
Application No.	WO2013CN77102	Filing date	2013.06.09
Applicant	BEIJING QIHOO TECHNOLOGY COMPANY LIMITED;QIZHI SOFTWARE (BEIJING) COMPANY LIMITED	Inventor	WANG, ZHIGANG
Classification	G06F17/30	Main classification	G06F17/30
Agency		Agent
Main claim
Address
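
The pipeline outlined in the abstract (parse each page, collect unit text sections, count them across the site, compare against a threshold) can be illustrated with a minimal Python sketch. This is not the patented implementation. It rests on several assumptions that the record does not state: a "unit text section" is taken to be one whitespace-normalized run of text between tags, the occurrence count is taken to be the number of pages containing the section, and pages are passed in as HTML strings rather than grabbed from a live website. The names `TextSectionCollector`, `unit_text_sections`, and `find_non_body_text` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TextSectionCollector(HTMLParser):
    """Collects whitespace-normalized text runs, ignoring script/style content.

    Assumption: each text run between tags stands in for one "unit text
    section"; the patent record does not define the term precisely.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.sections = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.sections.append(text)


def unit_text_sections(html):
    """Return the text sections found in one page, in document order."""
    parser = TextSectionCollector()
    parser.feed(html)
    parser.close()
    return parser.sections


def find_non_body_text(pages, threshold):
    """Return the sections that occur on more than `threshold` pages.

    Assumption: a section is counted once per page, so the count is the
    number of pages of the site on which it appears.
    """
    page_counts = Counter()
    for html in pages:
        page_counts.update(set(unit_text_sections(html)))
    return {text for text, count in page_counts.items() if count > threshold}
```

Called with the pages of a single site and a threshold chosen for that site, `find_non_body_text` returns candidates such as navigation bars and copyright footers, which repeat across pages, while text unique to one article stays below the threshold.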