Title of invention: Method for automatic wrapper repair
Abstract: A method of information extraction from a Web page using an initial wrapper that has become partially inoperative, wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information, includes: using the initial set of rules to extract strings from the Web page parsed in the forward direction; analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper; assigning labels to those strings which satisfy the label rules; using the initial set of rules to extract strings from the Web page parsed in the backward (opposite) direction; analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and assigning labels to those unlabeled strings which satisfy the label rules.
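The two-pass idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule format, the names LEFT_CONTEXT, RIGHT_CONTEXT, tokenize, run_pass and extract, and the sample page are all hypothetical. The sketch assumes a toy wrapper in which each label is recognized by the tag immediately before the string (checked when parsing forward) or immediately after it (checked when parsing backward), so a string whose left context was broken by a page change can still be labeled on the backward pass.

```python
import re

# Hypothetical toy wrapper (not from the patent): each label is tied to the
# tag expected just before the value and the tag expected just after it.
LEFT_CONTEXT = {"title": r"<b>", "year": r"<i>"}
RIGHT_CONTEXT = {"title": r"</b>", "year": r"</i>"}


def tokenize(page):
    """Split a page into tags and text strings, dropping empty pieces."""
    return [t for t in re.split(r"(<[^>]+>)", page) if t.strip()]


def is_text(token):
    return not token.startswith("<")


def run_pass(tokens, rules, labels, backward=False):
    """Label still-unlabeled text tokens in one parsing direction.

    Forward: visit tokens left to right and test the preceding token.
    Backward: visit tokens right to left and test the following token.
    """
    n = len(tokens)
    order = range(n - 1, -1, -1) if backward else range(n)
    step = 1 if backward else -1  # neighbor already seen in this direction
    for i in order:
        if not is_text(tokens[i]) or i in labels:
            continue  # skip tags and strings labeled by an earlier pass
        j = i + step
        if not 0 <= j < n:
            continue
        for label, pattern in rules.items():
            if re.fullmatch(pattern, tokens[j]):
                labels[i] = label
                break
    return labels


def extract(page, use_backward=True):
    """Return (string, label) pairs; label is None if no rule applied."""
    tokens = tokenize(page)
    labels = {}
    run_pass(tokens, LEFT_CONTEXT, labels)  # forward direction
    if use_backward:
        run_pass(tokens, RIGHT_CONTEXT, labels, backward=True)
    return [(t, labels.get(i)) for i, t in enumerate(tokens) if is_text(t)]


# Hypothetical page after a layout change: an <img> tag now sits between
# <b> and the title, so the forward rule for "title" no longer applies.
CHANGED_PAGE = "<li><b><img src='new.gif'>Wrapper repair</b><i>2005</i></li>"

if __name__ == "__main__":
    print(extract(CHANGED_PAGE, use_backward=False))
    print(extract(CHANGED_PAGE))
```

In this sketch the forward pass alone leaves the title string unlabeled, and the backward pass only considers strings that are still unlabeled, which mirrors the last step of the abstract.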
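Publication number: US7440974(B2)  Publication date: 2008.10.21
Application number: US20050295367  Filing date: 2005.12.05
Applicant: XEROX CORPORATION  Inventor: CHIDLOVSKII BORIS
Classification number: G06F17/30  Main classification number: G06F17/30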
Agency:  Agent:
Principal claim:
Address: