Title of invention: Parser generation based on example document
Abstract: The process generates a parser to extract records from a set of documents. The process operates on a sample document from the set. The sample document is an XML document or is converted to an XML document. Simple XPaths of the XML document are identified. Complex extensions of the simple XPaths are clustered according to common substructures. The complex XPath clusters are scored according to content in instances or differences in content among instances. Candidate parsers are created. Each candidate consists of a single record XPath and one or more field value XPaths that are descendants of the record XPath. The candidate parsers are ranked using the XPath scores.
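The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: it covers only simple (index-free) paths, omits the clustering of complex XPath extensions entirely, and uses a made-up score (the count of distinct non-empty text values among a path's instances) as a stand-in for the content-based scoring the abstract mentions. The sample document and the names `simple_xpaths`, `score`, `candidate_parsers`, and `extract` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document standing in for one member of the document set.
SAMPLE = """<catalog>
  <book><title>Dune</title><price>9.99</price></book>
  <book><title>Emma</title><price>7.50</price></book>
  <book><title>Ulysses</title><price>12.00</price></book>
</catalog>"""


def simple_xpaths(root):
    """Map each index-free path (e.g. /catalog/book/title) to its element instances."""
    paths = defaultdict(list)

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        paths[path].append(elem)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def score(instances):
    """Assumed score: number of distinct non-empty text values among the instances."""
    texts = {(e.text or "").strip() for e in instances}
    texts.discard("")
    return len(texts)


def candidate_parsers(paths):
    """Pair each repeating path (record) with its text-bearing descendant paths (fields)."""
    candidates = []
    for record_path, records in paths.items():
        if len(records) < 2:
            continue  # a record path is assumed to occur more than once
        prefix = record_path + "/"
        fields = sorted(p for p in paths if p.startswith(prefix) and score(paths[p]) > 0)
        if fields:
            total = sum(score(paths[p]) for p in fields)
            candidates.append((total, record_path, fields))
    # Rank candidates by their summed field scores, best first.
    return sorted(candidates, key=lambda c: c[0], reverse=True)


def extract(paths, record_path, field_paths):
    """Apply one candidate parser: one dict per record, keyed by relative field path."""
    rows = []
    for record in paths[record_path]:
        row = {}
        for field in field_paths:
            rel = field[len(record_path) + 1:]  # path relative to the record element
            row[rel] = record.findtext(rel)
        rows.append(row)
    return rows


paths = simple_xpaths(ET.fromstring(SAMPLE))
ranked = candidate_parsers(paths)
best_score, record_path, field_paths = ranked[0]
records = extract(paths, record_path, field_paths)
print(record_path, field_paths)
print(records)
```

In this sketch the only surviving candidate uses `/catalog/book` as the record path with `title` and `price` as fields, because it is the only repeating path that has text-bearing descendants. A fuller treatment would also need predicates and positional steps to express the complex XPaths the abstract describes.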
Publication number: US2003221169(A1); Publication date: 2003.11.27
Application number: US20030365747; Filing date: 2003.02.14
Applicant: SWETT IAN DOUGLAS; Inventor: SWETT IAN DOUGLAS
Classification: G06F9/45;(IPC1-7):G06F9/45; Main classification: G06F9/45
Agency: ; Agent:
Main claim:
Address: