Title of invention: METHOD AND DEVICE FOR DOCUMENT STRUCTURE RECOGNITION AND STORAGE MEDIUM STORING DOCUMENT STRUCTURE RECOGNIZING PROGRAM
Abstract: PROBLEM TO BE SOLVED: To provide a document structure recognition method and device that can analyze the structure of text containing arbitrary, field-independent itemized articles by attending to specific character string patterns in a document and to the length from head to tail of each line containing such a pattern, and by considering the correlation between the length of the whole line and the length of the specific character string pattern; and to provide a storage medium storing the document structure recognizing program. SOLUTION: A document to be recognized is input, and each line is pattern-matched against itemized article patterns stored in advance (S2) to generate itemized article candidates that match those patterns. The length of the character string from the head to the tail of each line of the document in which characters are present is measured (S3). When the generated itemized article candidates include blanks, one candidate is selected from them by using the length of the character string (S4). The blanks are deleted from the selected candidate to obtain information on the article label and contents (S5), the information on the contents of the label is attached as a tag to the determined itemized article, and the article is output.
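The steps S2 to S5 described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not state the stored patterns, how the line length is used to choose a candidate, or the tag format. The colon separators, the `MAX_LABEL_RATIO` threshold, the "shortest label wins" rule, the `<item label="...">` output and all function names are assumptions introduced for this example only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a label is separated from its contents by a colon
# (ASCII or full-width). The real stored patterns are not given.
SEPARATORS = (":", "：")
# Assumption: a label may occupy at most this fraction of the line length.
MAX_LABEL_RATIO = 0.5


@dataclass
class Candidate:
    label: str    # raw text before the separator, may contain blanks
    content: str  # raw text after the separator


def generate_candidates(line: str) -> List[Candidate]:
    """S2: every separator position yields one label/content candidate."""
    candidates = []
    for i, ch in enumerate(line):
        if ch in SEPARATORS:
            label, content = line[:i], line[i + 1:]
            if label.strip() and content.strip():
                candidates.append(Candidate(label, content))
    return candidates


def line_length(line: str) -> int:
    """S3: length from the first to the last non-blank character."""
    return len(line.strip())


def remove_blanks(text: str) -> str:
    """Delete all whitespace, including the full-width space U+3000."""
    return re.sub(r"\s+", "", text)


def select_candidate(candidates: List[Candidate], length: int) -> Optional[Candidate]:
    """S4 (assumed rule): keep candidates whose blank-free label is short
    relative to the whole line, then prefer the shortest label."""
    plausible = [
        c for c in candidates
        if len(remove_blanks(c.label)) <= MAX_LABEL_RATIO * length
    ]
    if not plausible:
        return None
    return min(plausible, key=lambda c: len(remove_blanks(c.label)))


def recognize_line(line: str) -> Optional[str]:
    """Run S2 to S5 on one line; return a tagged item or None."""
    chosen = select_candidate(generate_candidates(line), line_length(line))
    if chosen is None:
        return None
    label = remove_blanks(chosen.label)  # S5: blanks deleted from the label
    content = chosen.content.strip()     # assumption: contents are only trimmed
    # Hypothetical tag format; the abstract does not specify one.
    return f'<item label="{label}">{content}</item>'
```

Under these assumptions, `recognize_line("D a t e : 10:30 Monday")` returns `<item label="Date">10:30 Monday</item>`: the second colon also produces a candidate, but its label is longer, so the first one is selected. A line such as `This is a long ordinary sentence: ok` is rejected because the would-be label takes up most of the line.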
Publication number: JP2000148752(A) Publication date: 2000.05.30
Application number: JP19980317948 Filing date: 1998.11.09
Applicant: NIPPON TELEGR & TELEPH CORP <NTT> Inventors: HASEGAWA TAKAAKI;TAKAGI SHINICHIRO
Classification: G06F17/21;G06F17/27;G06F17/30 Main classification: G06F17/21
Agency: Agent:
Main claim:
Address: