Title of invention: Document type definition generating method and apparatus
Abstract: There is disclosed a document type definition generating method for a structured document in which each document element is provided with a tag having an element name. The method judges the physical structure of each document element from indentation, blank lines, and the positional relation between tags, analyzes the words and phrases in each document element, and judges the semantic structure of the document element based on the connections between words and phrases and on word types. When the physical and semantic structures of document elements having tags with different element names are similar, the elements are regarded as being of the same type, and one element name is excluded from the list used for generating the document type definition. When the physical and semantic structures of document elements having tags with the same element name are different, the elements are regarded as being of different types, and one element name is changed. Furthermore, the words and phrases between a start tag and an end tag with the same title are analyzed, and the information to be included between the tags is obtained to generate the document type definition. As a result, tag meaning is handled correctly, and a document type definition with tag redundancy removed is generated.
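The merge and rename rules in the abstract can be illustrated with a minimal sketch. It rests on several assumptions not found in the record: each element's physical and semantic structure is already reduced to a plain label, "similar" is simplified to exact equality of those labels, renaming appends a numeric suffix, and every declaration uses a placeholder `(#PCDATA)` content model. The names `Element`, `normalize_names`, and `dtd_lines` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str      # element name taken from the tag
    physical: str  # hypothetical label for the judged physical structure
    semantic: str  # hypothetical label for the judged semantic structure


def normalize_names(elements):
    """Map each element to one name per (physical, semantic) structure.

    Assumption: two structures are "similar" only if their labels are equal.
    """
    name_for_structure = {}  # (physical, semantic) -> chosen element name
    used_names = set()
    result = []
    for el in elements:
        key = (el.physical, el.semantic)
        if key not in name_for_structure:
            # Same name but a different structure: change the name
            # (assumed scheme: append a numeric suffix).
            name = el.name
            suffix = 2
            while name in used_names:
                name = f"{el.name}{suffix}"
                suffix += 1
            name_for_structure[key] = name
            used_names.add(name)
        # Different name but the same structure: reuse the first name seen,
        # which drops the redundant name from the list.
        result.append(name_for_structure[key])
    return result


def dtd_lines(names):
    """Emit one placeholder element declaration per distinct name."""
    unique = list(dict.fromkeys(names))  # de-duplicate, keep first-seen order
    return [f"<!ELEMENT {name} (#PCDATA)>" for name in unique]


elements = [
    Element("title", "single-line", "noun-phrase"),
    Element("heading", "single-line", "noun-phrase"),
    Element("note", "indented-block", "sentence"),
    Element("note", "list", "noun-phrase"),
]
names = normalize_names(elements)
print(names)
print("\n".join(dtd_lines(names)))
```

In this sketch `heading` is folded into `title` because both share one structure, while the second `note` is renamed because its structure differs from the first. A real implementation would need a graded similarity measure and a content model derived from the analyzed words and phrases, neither of which is specified in this record.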
Publication number: EP1004968(A2) Publication date: 2000.05.31
Application number: EP19990309415 Filing date: 1999.11.25
Applicant: CANON KABUSHIKI KAISHA Inventor: MIZUNO, TAKAFUMI
Classification: G06F17/22;G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/22
Agency: Agent:
Principal claim:
Address: