Title of invention: Methods, apparatus and computer programs for evaluating and using a resilient data representation
Abstract: Provided are methods, apparatus and computer programs for evaluating the resilience, to structural changes in a data source, of a representative label representing a data element within the data source. Also disclosed are applications using a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source, such as an XML or HTML Web page. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to enable selection of the label with the highest resilience score among a set of representative labels. The validated or selected representative label may then be used for data extraction, remaining usable despite possible future changes to the structure of a Web page, or for template clustering/classification.
Publication number: US2006026157(A1) Publication date: 2006.02.02
Application number: US20040880141 Filing date: 2004.06.29
Applicant: GUPTA RAHUL; JOSHI SACHINDRA; KRISHNAPURAM RAGHURAM Inventor: GUPTA RAHUL; JOSHI SACHINDRA; KRISHNAPURAM RAGHURAM
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
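
The selection step described in the abstract (score each candidate label for resilience, then either check it against a required threshold or pick the highest-scoring one) can be illustrated with a minimal sketch. The abstract does not say how resilience is estimated, so everything below is an assumption made for illustration: the sample page, the three simulated structural changes, the scoring rule (the fraction of simulated changes after which a label still extracts the expected value), and the names `resilience` and `select_label` are all hypothetical. Labels are written as path expressions understood by Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the data element of interest is the price "42".
PAGE = (
    "<html><body>"
    "<div><span>ad</span></div>"
    "<div><span class='price'>42</span></div>"
    "</body></html>"
)

def extract(root, label):
    """Return the text of the first element matched by the label, or None."""
    node = root.find(label)
    return None if node is None else node.text

# Simulated structural changes (assumptions, not taken from the patent).
def insert_banner(root):
    # A new empty <div> appears at the top of <body>.
    root.find("body").insert(0, ET.Element("div"))

def wrap_body_children(root):
    # All children of <body> are moved inside a new <section>.
    body = root.find("body")
    section = ET.Element("section")
    for child in list(body):
        body.remove(child)
        section.append(child)
    body.append(section)

def drop_first_div(root):
    # The first <div> under <body> is removed.
    body = root.find("body")
    body.remove(body.find("div"))

PERTURBATIONS = [insert_banner, wrap_body_children, drop_first_div]

def resilience(label, page, expected):
    """Fraction of simulated changes after which the label still yields `expected`."""
    survived = 0
    for perturb in PERTURBATIONS:
        root = ET.fromstring(page)  # fresh copy of the page for each change
        perturb(root)
        if extract(root, label) == expected:
            survived += 1
    return survived / len(PERTURBATIONS)

def select_label(candidates, page, expected, min_score=0.0):
    """Score every candidate; return the best one meeting `min_score`, plus all scores."""
    scores = {label: resilience(label, page, expected) for label in candidates}
    eligible = {label: s for label, s in scores.items() if s >= min_score}
    if not eligible:
        return None, scores
    return max(eligible, key=eligible.get), scores

CANDIDATES = [
    "body/div[2]/span",               # purely positional label
    "body/div/span[@class='price']",  # fixed depth, attribute-based
    ".//span[@class='price']",        # attribute-based at any depth
]

best, scores = select_label(CANDIDATES, PAGE, expected="42", min_score=0.5)
```

In this toy setting the purely positional label breaks under every simulated change, while the attribute-based label that ignores depth survives all of them, so it is the one selected. A real evaluation would need a change model derived from how the target pages actually evolve.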