Abstract |
PROBLEM TO BE SOLVED: To estimate the top page of a website appropriately, and to perform information retrieval on a per-website basis, suited to the retrieval purpose, starting from that top page. SOLUTION: The server name to which each page of a Web page set belongs is extracted (S1); the URL, server name, directory layer and meta-information of each page are extracted (S2); a classification likelihood for the page type is obtained for each page using a page classification tree (S3); for each server, a page whose directory layer is 0 and which has a file name located in that layer is presumed to be the top page (S4); if no top page is presumed, the directory layer in which a top page exists is determined from the top-page-type classification likelihood (S5); for every such directory layer, the page in that layer for which a file exists in a lower layer and whose sum of page-type classification likelihoods is maximal is determined to be the top page (S6); and if no top page is found, a page in the layer one level below whose top-page classification likelihood is equal to or greater than a threshold is determined to be the top page (S7). COPYRIGHT: (C)2003,JPO
|
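The flow of steps S1 to S7 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names `page_info` and `presume_top_pages`, the default threshold of 0.5, and the example URLs are all hypothetical. The classification tree of S3 is not implemented; each page's top-page likelihood is assumed to be supplied as input. Steps S5 to S7 are collapsed into a simplified scan from the shallowest layer downward, which omits the "file exists in a lower layer" check and the summation over page types.

```python
from collections import defaultdict
from urllib.parse import urlparse


def page_info(url):
    """S1/S2: return (server name, directory layer, file name) for a URL."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # A trailing slash (or an empty path) means a directory with no file name.
    if parsed.path.endswith("/") or not segments:
        return parsed.netloc, len(segments), ""
    return parsed.netloc, len(segments) - 1, segments[-1]


def presume_top_pages(pages, threshold=0.5):
    """Map each server name to its presumed top-page URL, or None.

    `pages` maps URL -> top-page likelihood (the S3 output, assumed given).
    """
    by_server = defaultdict(list)
    for url, likelihood in pages.items():
        server, layer, name = page_info(url)
        by_server[server].append((layer, name, likelihood, url))

    result = {}
    for server, entries in by_server.items():
        # S4: a page in directory layer 0 is presumed to be the top page.
        # Assumption: ties between several layer-0 pages go to the most likely.
        roots = [e for e in entries if e[0] == 0]
        if roots:
            result[server] = max(roots, key=lambda e: e[2])[3]
            continue
        # Simplified stand-in for S5-S7: scan layers from shallowest to
        # deepest and accept the most likely page once it meets the threshold.
        result[server] = None
        for layer in sorted({e[0] for e in entries}):
            best = max((e for e in entries if e[0] == layer), key=lambda e: e[2])
            if best[2] >= threshold:
                result[server] = best[3]
                break
    return result


# Hypothetical page set with made-up likelihoods.
example_pages = {
    "http://a.example/index.html": 0.9,
    "http://a.example/docs/guide.html": 0.1,
    "http://b.example/~user/": 0.8,
    "http://b.example/~user/pubs/list.html": 0.2,
    "http://c.example/x/y.html": 0.1,
}
top_pages = presume_top_pages(example_pages)
```

In this example, `a.example` is resolved by the layer-0 rule (S4), `b.example` falls through to the likelihood scan and yields the `~user/` page, and `c.example` has no page that reaches the threshold, so no top page is presumed for it.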