Title of invention: Determination of member pages for a hyperlinked document with recursive page-level link analysis
Abstract: The present invention relates to a methodology for assembling a document from content spanning multiple web pages. Given a starting location, a page-level process analyzes a single page at a time to find candidate links. The candidate links are followed recursively, and each linked page is analyzed in the same way. A detailed set of heuristics determines what is or is not a candidate link. The candidate pages are then optionally fed to a document-level analyzer, which compares the attributes of each page against those of the others and looks for a document-like structure. Using another detailed set of heuristics, the document-level analyzer determines whether each page should be included in the document.
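The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not disclose the actual heuristics, so both rules here are assumptions chosen only for illustration. The page-level rule (`is_candidate_link`) accepts links on the same host and in the same directory as the starting page, and the document-level rule (`document_level_filter`) keeps pages whose `template` attribute matches the most common value among the candidates. The in-memory `SITE` dictionary, the `template` attribute, and all function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical in-memory "web": URL -> outgoing links and one page attribute.
SITE = {
    "http://example.com/doc/index.html": {
        "links": ["ch1.html", "ch2.html", "/about.html",
                  "http://other.example/x.html"],
        "template": "manual",
    },
    "http://example.com/doc/ch1.html": {
        "links": ["index.html", "ch2.html"], "template": "manual"},
    "http://example.com/doc/ch2.html": {
        "links": ["ch1.html", "ad.html"], "template": "manual"},
    "http://example.com/doc/ad.html": {"links": [], "template": "advert"},
    "http://example.com/about.html": {"links": [], "template": "corporate"},
}


def is_candidate_link(start_url, link_url):
    """Assumed page-level heuristic: same host and same directory as the start page."""
    start, link = urlparse(start_url), urlparse(link_url)
    if start.netloc != link.netloc:
        return False
    return posixpath.dirname(link.path) == posixpath.dirname(start.path)


def collect_candidates(start_url, get_links, max_depth=5):
    """Analyze one page at a time and recursively follow candidate links."""
    ordered, seen = [], set()

    def visit(url, depth):
        if url in seen or depth > max_depth:
            return
        seen.add(url)
        ordered.append(url)
        for href in get_links(url):
            target = urljoin(url, href)  # resolve relative links
            if is_candidate_link(start_url, target):
                visit(target, depth + 1)

    visit(start_url, 0)
    return ordered


def document_level_filter(candidates, get_attr):
    """Assumed document-level heuristic: keep pages sharing the most common attribute."""
    if not candidates:
        return []
    dominant = Counter(get_attr(u) for u in candidates).most_common(1)[0][0]
    return [u for u in candidates if get_attr(u) == dominant]


def get_links(url):
    return SITE.get(url, {}).get("links", [])


def get_template(url):
    return SITE.get(url, {}).get("template")


start = "http://example.com/doc/index.html"
candidates = collect_candidates(start, get_links)
members = document_level_filter(candidates, get_template)
```

In this toy setup the recursive pass gathers every page reachable through same-directory links, and the optional second pass drops candidates whose attributes do not match the rest. A real implementation would replace both rules with the richer heuristics the abstract refers to.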
Publication number: US2004237037(A1) Publication date: 2004.11.25
Application number: US20030608587 Filing date: 2003.06.27
Applicant: XEROX CORPORATION Inventors: SWEET JAMES M.; HARRINGTON STEVEN J.; JONES RHYS PRICE; SAVAKIS ANDREAS
Classification: G06F15/00; G06F17/30; (IPC1-7): G06F15/00 Main classification: G06F15/00
Agency: Agent:
Main claim:
Address: