Title: INFORMATION RETRIEVAL METHOD UTILIZING WEBPAGE VISUAL AND LANGUAGE FEATURES AND SYSTEM USING THEREOF
Abstract: An information retrieval method utilizing webpage visual and language features, and a system using the same, are disclosed. The system includes an analysis result database, a webpage template database, a webpage collecting module, and an analyzing module. The webpage template database stores the template feature arrays of the respective target websites. Each template feature array includes one or more template visual features and one or more template language features corresponding to template nodes of a DOM (document object model) tree. The webpage collecting module links the system to a target website to retrieve the webpage feature arrays of a target webpage of that website. The system then calculates an overall similarity between the webpage feature arrays and the template feature arrays of the same target website. Based on this similarity, the desired information content is determined and stored in the analysis result database.
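The abstract does not specify which features are extracted or how the overall similarity is computed. The Python sketch below shows one plausible reading, under stated assumptions: each feature array describes one DOM node, the visual features form a numeric vector (the layout `(x, y, width, height, font_size)` is hypothetical), the language features are a set of word tokens, and the overall similarity is a weighted blend of cosine similarity (visual) and the Jaccard index (language). `FeatureArray`, `overall_similarity`, and the 0.5 weight are illustrative choices, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class FeatureArray:
    """Hypothetical feature array of one DOM node.

    visual:   numeric visual features, e.g. (x, y, width, height, font_size)
    language: lower-cased word tokens taken from the node's text
    """
    visual: Tuple[float, ...]
    language: FrozenSet[str]


def visual_similarity(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    # Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def language_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    # Jaccard index of the two token sets; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def overall_similarity(page: FeatureArray, template: FeatureArray,
                       weight: float = 0.5) -> float:
    # Weighted blend: `weight` on the visual score, the remainder on the language score.
    return (weight * visual_similarity(page.visual, template.visual)
            + (1 - weight) * language_similarity(page.language, template.language))


# Same position and style as the template, but only one of two tokens in common.
template = FeatureArray((100.0, 200.0, 600.0, 40.0, 24.0), frozenset({"price", "usd"}))
page = FeatureArray((100.0, 200.0, 600.0, 40.0, 24.0), frozenset({"price", "eur"}))
print(round(overall_similarity(page, template), 3))  # prints 0.667
```

Here the visual score is 1.0 and the language score is 1/3, so the blend is about 0.667. Tuning `weight` shifts how much layout versus wording drives the match.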
Publication No.: US2017024472(A1) Publication Date: 2017.01.26
Application No.: US201514860984 Filing Date: 2015.09.22
Applicant: GREEN PRESTIGE PTE. LTD. Inventor: Peng Ting-Chun
Classification: G06F17/30 Main Classification: G06F17/30
Agency: (not listed) Agent: (not listed)
Main Claim: 1. An information retrieval system utilizing webpage visual and language features, comprising: an analysis result database; a webpage template database for storing at least one template feature array of at least one target website, each template feature array including at least one visual feature and at least one language feature of a template node in the document object model (DOM) data structure; a webpage collecting module that links with at least one target website to retrieve at least one visual feature and at least one language feature from at least one target webpage node of at least one target webpage of the target website and form a corresponding webpage feature array; and an analyzing module that calculates an overall similarity between the webpage feature array and the template feature array of the same target website, wherein, if the overall similarity is greater than a threshold value, the analysis result database stores the content of the corresponding target webpage node.
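The threshold step of the claim can be sketched as follows, as an illustration rather than the patented implementation. Assumed for this sketch: an in-memory `sqlite3` table named `results` stands in for the analysis result database, a plain dict keyed by website stands in for the webpage template database, each node is a hypothetical `(visual, language, content)` tuple, `similarity` is a deliberately simple stand-in (the mean of an inverse-L1-distance visual score and a Jaccard language score), and 0.6 is an arbitrary threshold.

```python
import sqlite3
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical node layout: (visual features, language tokens, text content).
Node = Tuple[Tuple[float, ...], FrozenSet[str], str]


def similarity(page: Node, template: Node) -> float:
    # Stand-in overall similarity: mean of a visual score and a language score.
    visual = 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(page[0], template[0])))
    union = page[1] | template[1]
    language = len(page[1] & template[1]) / len(union) if union else 1.0
    return (visual + language) / 2


def analyze(conn: sqlite3.Connection, website: str, page_nodes: List[Node],
            template_db: Dict[str, List[Node]], threshold: float = 0.6) -> int:
    """Store each page node whose best template match is greater than `threshold`."""
    templates = template_db.get(website, [])  # only templates of the same website
    stored = 0
    for node in page_nodes:
        best = max((similarity(node, t) for t in templates), default=0.0)
        if best > threshold:
            conn.execute("INSERT INTO results (website, content) VALUES (?, ?)",
                         (website, node[2]))
            stored += 1
    conn.commit()
    return stored


# Usage with made-up data: one template node for a hypothetical site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (website TEXT, content TEXT)")
template_db = {"example.com": [((0.0, 0.0, 24.0), frozenset({"price"}), "")]}
page_nodes = [
    ((0.0, 0.0, 24.0), frozenset({"price", "usd"}), "Price: 10 USD"),
    ((300.0, 900.0, 12.0), frozenset({"copyright"}), "(c) 2015"),
]
print(analyze(conn, "example.com", page_nodes, template_db))  # prints 1
```

Only the first node is stored: it scores 0.75 against the template, while the footer-like second node scores close to 0. Looking up `template_db[website]` mirrors the claim's requirement that the webpage and the template belong to the same target website.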
Address: Singapore SG