Title of invention: HIERARCHY EXTRACTION FROM THE WEBSITES
Abstract: The present invention provides methods and systems for building an object hierarchy. The method includes: obtaining a set of web pages from a website; conducting an inter-page analysis on the obtained web pages to extract a hierarchy of the web pages; conducting an intra-page analysis on each obtained web page to identify the semantic blocks within that page and extract a hierarchy of the semantic blocks for all the web pages; and fusing the hierarchy of the semantic blocks with the hierarchy of the web pages to generate a coordinated hierarchy. In one embodiment, the nodes of the generated coordinated hierarchy are then mapped to corresponding objects to generate a coordinated object hierarchy. Compared with the prior art, the object hierarchy building systems and methods according to the present invention can build the object hierarchy more accurately and efficiently by fusing the inter-page analysis result with the intra-page analysis result.
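The three steps named in the abstract (inter-page analysis, intra-page analysis, fusion) can be illustrated with a minimal sketch. The abstract does not say how either analysis is performed, so everything concrete here is an assumption made for illustration: the inter-page step uses URL path nesting as a stand-in heuristic, the intra-page step takes semantic blocks that are already segmented and given as hypothetical `(level, title)` pairs, and the fusion step simply attaches each page's top-level blocks beneath that page. The function names `page_hierarchy`, `block_hierarchy`, and `fuse` are hypothetical and do not come from the patent.

```python
from urllib.parse import urlparse


def page_hierarchy(urls):
    """Inter-page step (assumed heuristic): a page's parent is the nearest
    crawled page whose URL path is a proper prefix of its own path.
    Assumes all URLs belong to one website."""
    paths = {u: [p for p in urlparse(u).path.split("/") if p] for u in urls}
    by_path = {tuple(parts): u for u, parts in paths.items()}
    parent = {}
    for u, parts in paths.items():
        parent[u] = None
        # Walk from the longest proper prefix down to the site root.
        for k in range(len(parts) - 1, -1, -1):
            ancestor = by_path.get(tuple(parts[:k]))
            if ancestor is not None:
                parent[u] = ancestor
                break
    return parent


def block_hierarchy(page, blocks):
    """Intra-page step (assumed input): blocks is a list of (level, title)
    pairs in document order; a block's parent is the closest preceding
    block with a smaller level."""
    parent = {}
    stack = []  # (level, block_id) of currently open ancestor blocks
    for level, title in blocks:
        block_id = f"{page}#{title}"
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent[block_id] = stack[-1][1] if stack else None
        stack.append((level, block_id))
    return parent


def fuse(urls, blocks_by_page):
    """Fusion step (assumed rule): keep the page tree and hang each page's
    top-level blocks directly under that page."""
    tree = dict(page_hierarchy(urls))
    for page in urls:
        page_blocks = block_hierarchy(page, blocks_by_page.get(page, []))
        for block_id, block_parent in page_blocks.items():
            tree[block_id] = block_parent if block_parent is not None else page
    return tree


# Hypothetical site used only to exercise the sketch.
pages = [
    "http://example.com/",
    "http://example.com/products",
    "http://example.com/products/laptops",
]
blocks = {"http://example.com/products": [(1, "Overview"), (2, "Specs"), (1, "Support")]}
coordinated = fuse(pages, blocks)
```

The result is a single child-to-parent map covering both pages and blocks, which is one simple way to represent a coordinated hierarchy. Mapping its nodes to objects, as in the embodiment the abstract mentions, is not covered by this sketch.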
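Publication number: US2009327338(A1)  Publication date: 2009.12.31
Application number: US20090491573  Filing date: 2009.06.25
Applicant: NEC (CHINA) CO., LTD.  Inventors: ZHAO YU; LI JIANQIANG
Classification: G06F17/00  Main classification: G06F17/00
Agency:  Agent:
Main claim:
Address: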