Title of invention: Automatic wrapper grammar generation
Abstract: A method for generating a wrapper grammar for files whose structure follows a particular format includes providing (50) at least one sample file of that format, the format comprising a plurality of string tokens. Each sample file includes a plurality of tokens (data strings), each of which may be actual data from the document, an HTML tag, or some other grammatical separator. The sample file is then processed by annotating attributable tokens (52) with a user-defined attribute, such as Author or Title, selected from a set of attributes, to form an annotated sample set. The annotated sample set is then evaluated (54) to determine whether wrapper grammar generation is possible; if it is, a wrapper grammar for files having the structure of the particular format is generated (70). Preferably, the annotated sample set is evaluated by determining whether all attributes in it are distinguishable from one another.
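The evaluate-then-generate flow can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes, purely for illustration, that an attribute is identified by the single token immediately preceding it (its left context), that the annotated sample set is "distinguishable" when no left context is shared by two attributes or also precedes an unannotated token, and that the resulting wrapper grammar is simply a map from context token to attribute. The names `generate_wrapper`, `extract` and `START`, and the sample tokens, are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# An annotated token: (text, attribute). The attribute is None for tokens
# the user did not annotate (HTML tags, separators, unlabeled data).
Token = Tuple[str, Optional[str]]

# Hypothetical sentinel used as the left context of a file's first token.
START = "<START>"


def generate_wrapper(samples: List[List[Token]]) -> Optional[Dict[str, str]]:
    """Build a context -> attribute map, or return None when the annotated
    sample set is not distinguishable under the one-token-context assumption."""
    grammar: Dict[str, str] = {}
    unlabeled: Set[str] = set()  # contexts that precede unannotated tokens
    for sample in samples:
        prev = START
        for text, attr in sample:
            if attr is None:
                unlabeled.add(prev)
            elif grammar.setdefault(prev, attr) != attr:
                return None  # two different attributes share one left context
            prev = text
    if any(ctx in grammar for ctx in unlabeled):
        return None  # a context also precedes tokens that must not be extracted
    return grammar


def extract(tokens: List[str], grammar: Dict[str, str]) -> List[Tuple[str, str]]:
    """Apply the generated wrapper to a new, unannotated token stream."""
    records: List[Tuple[str, str]] = []
    prev = START
    for text in tokens:
        if prev in grammar:
            records.append((grammar[prev], text))
        prev = text
    return records


if __name__ == "__main__":
    # Two hypothetical annotated sample files of the same format.
    samples = [
        [("<b>", None), ("Knuth", "Author"), ("</b>", None),
         ("<i>", None), ("TAOCP", "Title"), ("</i>", None)],
        [("<b>", None), ("Aho", "Author"), ("</b>", None),
         ("<i>", None), ("Compilers", "Title"), ("</i>", None)],
    ]
    wrapper = generate_wrapper(samples)
    print(wrapper)  # {'<b>': 'Author', '<i>': 'Title'}
    if wrapper is not None:
        print(extract(["<b>", "Ullman", "</b>", "<i>", "Databases", "</i>"], wrapper))
```

A sample in which `<b>` precedes both an Author token and a Title token makes `generate_wrapper` return `None`, which plays the role of the "generation not possible" outcome of the evaluation step. One-token contexts keep the distinguishability check easy to follow but are far weaker than what a practical wrapper generator would need.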
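Application publication number: EP1072985 (A2)
Publication date: 2001.01.31
Application number: EP20000306268
Filing date: 2000.07.24
Applicant: XEROX CORPORATION
Inventor: CHIDLOVSKII, BORIS
Classification: G06F9/44; G06F12/00; G06F17/30; (IPC1-7): G06F17/30
Main classification: G06F9/44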