Abstract |
PROBLEM TO BE SOLVED: To access semistructured information held at various web sites and give it structure. SOLUTION: This method comprises the steps of: identifying patterns of interest by examining the semistructured information, including text information, for repetitive patterns using lexical analysis; cataloging the patterns by name and position in a nested structure; examining the patterns in the nested structure to identify attributes that correspond to fields of a relational schema of a relational database (S306); examining the patterns in the nested structure to identify patterns and decomposing them for cataloging in the nested structure; examining the patterns in the nested structure to identify links to other semistructured information (S308); and cataloging the patterns of interest in the nested structure. These steps are repeated until all of the nested information is cataloged, yielding a definition of the semistructured information, including regular expressions, that can be used by a dedicated program translator. COPYRIGHT: (C)2008, JPO&INPIT
|
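The cataloging steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. It assumes the repetitive pattern is an HTML list item, and every name in it (`Pattern`, `ROW`, `ATTRS`, `LINK`, `catalog`, `to_records`) is hypothetical and chosen only for illustration. The sketch finds repeated rows with a regular expression, records each sub-pattern by name and position in a nested structure, and maps the attribute patterns onto fields of a flat, relational-style record.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """One cataloged pattern: its name, its regex, and where it was found."""
    name: str
    regex: str
    position: int                       # character offset in the source text
    children: list = field(default_factory=list)


# Hypothetical pattern definitions, assumed for this example only.
ROW = r"<li>(.*?)</li>"                 # the repetitive pattern of interest
ATTRS = {                               # attributes that map to schema fields
    "title": r"<b>(.*?)</b>",
    "price": r"\$(\d+(?:\.\d+)?)",
}
LINK = r'href="([^"]+)"'                # links to other semistructured pages


def catalog(text):
    """Build a nested catalog: page -> rows -> attributes and links."""
    root = Pattern("page", ".*", 0)
    for m in re.finditer(ROW, text, re.S):
        row = Pattern("row", ROW, m.start())
        body = m.group(1)
        for name, rx in {**ATTRS, "link": LINK}.items():
            hit = re.search(rx, body)
            if hit:
                # Offset is relative to the whole text, not just the row body.
                row.children.append(Pattern(name, rx, m.start(1) + hit.start()))
        root.children.append(row)
    return root


def to_records(text):
    """Map each repeated row onto the fields of a relational-style record."""
    records = []
    for m in re.finditer(ROW, text, re.S):
        rec = {}
        for name, rx in ATTRS.items():
            hit = re.search(rx, m.group(1))
            rec[name] = hit.group(1) if hit else None
        records.append(rec)
    return records
```

Calling `catalog(page_text)` returns the nested structure with the regular expression for each pattern, and `to_records(page_text)` returns one dictionary per repeated row, keyed by the assumed schema fields. A real definition would need further decomposition passes and link following, which this sketch leaves out.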