Title: XML: finding authoritative pages for mining communities based on page structure criteria
Abstract: A method of using link analysis to determine well-formed web pages that are authorities on a given topic. A root set of pages is first obtained by taking a given number of the highest-ranked pages returned from a text-based search and ranking system. Each page within the set is evaluated and given a structure score, which reflects how well-formed the page is. The structure score is determined by evaluating each page within the set against a set of parameters that characterize well-formed pages. For each parameter, the page is assigned a parameter score. These parameter scores are then weighted and summed to obtain the page's structure score. Each page within the set also has a hub weight and an authority weight, which are maintained and updated to determine the strongest authorities. The initial hub and authority weights of each page are set to that page's structure score. An iterative algorithm is then used to determine the strongest authorities. In each round of the algorithm, the authority weight of a page is updated by summing the hub weights of every page that points to it, while the hub weight of a page is updated by summing the authority weights of every page that it points to. After a series of iterations, the pages having the highest authority weights are identified as the strongest authorities, with the best structure, on the query topic.
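The procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the parameter names (`valid_markup`, `has_title`), their weights, the example link graph, and the iteration count are all hypothetical. The per-round normalization and the use of freshly updated authority weights in the hub update are borrowed from the standard hubs-and-authorities formulation; the abstract does not state either.

```python
from math import sqrt


def structure_score(param_scores, weights):
    """Weighted sum of the per-parameter scores of one page."""
    return sum(weights[name] * score for name, score in param_scores.items())


def _normalize(values):
    # Assumption: scale to unit Euclidean length so weights stay bounded.
    norm = sqrt(sum(v * v for v in values.values()))
    if norm == 0:
        return dict(values)
    return {page: v / norm for page, v in values.items()}


def rank_authorities(links, structure, iterations=20):
    """Return (pages sorted by authority weight, authority weights, hub weights).

    links:     dict mapping a page to the set of pages it points to
    structure: dict mapping each page in the root set to its structure score
    """
    pages = list(structure)
    # Initial hub and authority weights are the structure scores.
    hub = dict(structure)
    auth = dict(structure)
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to the page.
        new_auth = {page: 0.0 for page in pages}
        for src in pages:
            for dst in links.get(src, ()):
                if dst in new_auth:
                    new_auth[dst] += hub[src]
        # Hub weight: sum of authority weights of the pages the page points to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, ()) if dst in new_auth)
            for page in pages
        }
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    ranking = sorted(pages, key=lambda page: auth[page], reverse=True)
    return ranking, auth, hub


# Hypothetical parameters and weights, for illustration only.
example_weights = {"valid_markup": 0.6, "has_title": 0.4}
example_structure = {
    "a": structure_score({"valid_markup": 1.0, "has_title": 1.0}, example_weights),
    "b": structure_score({"valid_markup": 0.5, "has_title": 0.5}, example_weights),
    "c": structure_score({"valid_markup": 1.0, "has_title": 0.5}, example_weights),
}
# Hypothetical link graph: pages "a" and "b" both point to page "c".
example_links = {"a": {"c"}, "b": {"c"}, "c": set()}
example_ranking, example_auth, example_hub = rank_authorities(
    example_links, example_structure
)
```

In this toy graph the only page with inbound links is `c`, so it ends up with the highest authority weight. Because the structure scores seed both weights, a poorly formed hub contributes less to the pages it points to in the first round than a well-formed one does.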
Publication number: US2002169800(A1)  Publication date: 2002.11.14
Application number: US20010754257  Filing date: 2001.01.05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: SUNDARESAN NEELAKANTAN; HUANG ANITA WAI-LING
Classification: G06F17/30; (IPC1-7): G06F7/00  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address: