Title of invention Apparatus, method, and program that performs syntax parsing on a structured document in the form of electronic data
Abstract Statistical information about instance documents and schema information are used to integrate multiple state transitions that enable sectioning of a structured document, thereby generating an optimum automaton. When state transitions are integrated, consecutively matching state transitions are held in an ID list, which is then used to count the number of consecutive state transitions. Furthermore, patterns in the number of occurrences of repetitive elements, including nested elements, are obtained statistically. Variations in whitespace in XML are handled by a statistical method. Schema information is used to build an automaton beforehand, thereby reducing the initialization overhead of the syntax parsing apparatus.
Publication number US8181105(B2) Publication date 2012.05.15
Application number US20080061747 Filing date 2008.04.03
Applicant SUZUMURA TOYOTARO;TATSUBORI MICHIAKI;URAMOTO NAOHIKO;INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor SUZUMURA TOYOTARO;TATSUBORI MICHIAKI;URAMOTO NAOHIKO
Classification number G06F17/00 Main classification number G06F17/00
Agency Agent
Principal claim
Address