Title of invention: Learning data prototypes for information extraction
Abstract: A method for determining statistically significant token sequences lends itself to the recognition of broken wrappers as well as the construction of new wrapper rules. When new wrapper rules are needed because the underlying wrapped data has changed, training examples are used to recognize data rule candidates, which are culled with a bias toward candidates that are likely to be more successful. The resulting rule candidate set is clustered according to feature characteristics, then compared to the training examples. Those rule candidates most similar to the training examples are used to create new wrapper rules.
Publication number: US6714941(B1)  Publication date: 2004.03.30
Application number: US20000620062  Filing date: 2000.07.19
Applicant: UNIVERSITY OF SOUTHERN CALIFORNIA  Inventors: LERMAN KRISTINA; MINTON STEVEN
Classification: G06F17/30; (IPC1-7): G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address: