Title of invention: Method for automatic wrapper repair
Abstract: A method for repairing a wrapper associated with an information source includes: defining a classifier based on content features of information extracted and labeled using the wrapper; using the classifier to extract content information from a file according to a set of classifier extraction rules; analyzing the extracted content information according to the content features and assigning a label to any extracted content information that satisfies the label's rules; and defining the repaired wrapper as the classifier together with those labels in the set that have been assigned to extracted content information. Additional content information and labels can be extracted by iteratively creating a classifier based on both the content features and the structure features of the extracted strings.
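The repair loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and everything concrete in it is an assumption: the content features (string length, digits, a `$` sign, an `@` sign, capitalization), the structure features (labels of the neighbouring strings), the frequency-profile scoring with a fixed `threshold` standing in for "the label's rules", and self-training on newly labeled strings as the way the classifier is "iteratively created". All function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def content_features(s):
    """Hypothetical content features: coarse descriptors of a string's shape."""
    feats = {"len<10" if len(s) < 10 else "len>=10"}
    if re.search(r"\d", s):
        feats.add("has_digit")
    if "$" in s:
        feats.add("currency")
    if "@" in s:
        feats.add("email_like")
    if s[:1].isupper():
        feats.add("capitalized")
    return feats


def structure_features(labels, i):
    """Hypothetical structure features: labels of the neighbouring strings."""
    def label_at(j):
        return labels[j] if 0 <= j < len(labels) else None
    return {f"prev={label_at(i - 1)}", f"next={label_at(i + 1)}"}


def train(feature_sets, labels):
    """Per-label feature frequencies; unlabeled strings (None) are skipped."""
    counts, totals = defaultdict(Counter), Counter()
    for feats, label in zip(feature_sets, labels):
        if label is not None:
            counts[label].update(feats)
            totals[label] += 1
    return {lab: {f: c / totals[lab] for f, c in cnt.items()}
            for lab, cnt in counts.items()}


def classify(profiles, feats, threshold):
    """Best-scoring label, or None if no label's profile is matched well enough."""
    def score(lab):
        return sum(profiles[lab].get(f, 0.0) for f in feats) / len(feats)
    best = max(profiles, key=score)
    return best if score(best) >= threshold else None


def repair_wrapper(old_strings, old_labels, new_strings, threshold=0.75, rounds=2):
    # 1. Classifier built from content features of what the old wrapper labeled.
    clf = train([content_features(s) for s in old_strings], old_labels)
    labels = [classify(clf, content_features(s), threshold) for s in new_strings]
    # 2. Iteratively rebuild the classifier on content + structure features,
    #    adding the newly labeled strings as training data (self-training).
    old_feats = [content_features(s) | structure_features(old_labels, i)
                 for i, s in enumerate(old_strings)]
    for _ in range(rounds):
        new_feats = [content_features(s) | structure_features(labels, i)
                     for i, s in enumerate(new_strings)]
        clf = train(old_feats + new_feats, old_labels + labels)
        labels = [classify(clf, f, threshold) for f in new_feats]
    # 3. Repaired wrapper = final classifier + the labels it actually assigned.
    assigned = {lab for lab in labels if lab is not None}
    return {"classifier": clf, "labels": assigned}, labels


if __name__ == "__main__":
    old = ["Title", "Acme Widget", "Price", "$12.50", "Contact", "sales@acme.com"]
    old_labels = [None, "name", None, "price", None, "email"]
    new = ["Product", "Blue Gadget", "Cost", "$8.99", "Mail", "info@gadget.org"]
    wrapper, labels = repair_wrapper(old, old_labels, new)
    print(labels)  # [None, 'name', None, 'price', None, 'email']
```

In this toy run, the page layout has changed (different captions), but the strings that look like a product name, a price, and an e-mail address are still recovered, while the caption strings fall below the threshold and stay unlabeled. Keeping the threshold explicit makes the trade-off visible: lowering it labels more strings at the cost of more false matches.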
Publication number: US2006074998(A1) Publication date: 2006.04.06
Application number: US20050294869 Filing date: 2005.12.05
Applicant: XEROX CORPORATION Inventor: CHIDLOVSKII BORIS
Classification: G06F17/30 Main classification: G06F17/30