Abstract |
PURPOSE: To shorten dictionary generation time by finding a plurality of code value ranges, on the basis of the number of categories, from a code value appearance frequency distribution obtained by encoding the feature values of macrofeatures, and by generating divided dictionaries corresponding to those code value ranges. CONSTITUTION: When encoding of the characters on a character sample slip is completed, a dictionary generation part 9 uses the code value strings in a code storage part 8 to generate, for each category, the code value appearance frequency distribution based upon macrofeatures H. From these frequency distributions it determines a plurality of code value ranges, i.e. it divides the code values so that the number of categories falling within each range is reduced. Then, for each code value range, a divided dictionary is generated from those code value strings in the code storage part 8 whose code values corresponding to the features H lie within that range, and is stored in a dictionary part. As long as the code value strings of one category do not contain a code value string of another category, the code values are combined feature by feature, and a lower-limit code and an upper-limit code are found to obtain the range of code values; each dictionary element of a divided dictionary thus consists of a category name and the code value ranges of the respective features. |
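The procedure summarized above can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the patented implementation. All function names, the `max_categories` threshold, the greedy splitting rule, and the sample data are assumptions introduced for illustration; the abstract does not state how the range boundaries are chosen, nor the exact merge order. Each sample is a pair of a category name and a code value string, and index 0 is assumed to hold the code value of the macrofeature H.

```python
from collections import defaultdict


def h_histogram(samples, h_index=0):
    """Per-category appearance frequency of each code value of macrofeature H."""
    hist = defaultdict(lambda: defaultdict(int))
    for category, codes in samples:
        hist[codes[h_index]][category] += 1
    return hist


def split_ranges(hist, max_categories):
    """Cut the H code values into contiguous ranges (assumed greedy rule).

    A new range starts whenever adding the next code value would put more
    than max_categories categories into the current range.
    """
    ranges, lo, hi, cats = [], None, None, set()
    for value in sorted(hist):
        new_cats = cats | set(hist[value])
        if lo is not None and len(new_cats) > max_categories:
            ranges.append((lo, hi))
            lo, cats = value, set(hist[value])
        else:
            if lo is None:
                lo = value
            cats = new_cats
        hi = value
    if lo is not None:
        ranges.append((lo, hi))
    return ranges


def contains(box, codes):
    """True if every code value lies inside the box's per-feature range."""
    return all(lo <= c <= hi for (lo, hi), c in zip(box, codes))


def build_elements(samples):
    """Merge same-category code strings into per-feature (lower, upper) ranges.

    A merge is refused if the grown ranges would enclose a code string
    belonging to another category; a new element is started instead.
    """
    elements = []  # list of (category name, [(lower, upper) per feature])
    for category, codes in samples:
        others = [c for cat, c in samples if cat != category]
        for i, (cat, box) in enumerate(elements):
            if cat != category:
                continue
            grown = [(min(lo, c), max(hi, c)) for (lo, hi), c in zip(box, codes)]
            if not any(contains(grown, o) for o in others):
                elements[i] = (cat, grown)
                break
        else:
            elements.append((category, [(c, c) for c in codes]))
    return elements


def build_divided_dictionaries(samples, max_categories=2, h_index=0):
    """One divided dictionary per H code value range."""
    ranges = split_ranges(h_histogram(samples, h_index), max_categories)
    return {
        (lo, hi): build_elements([s for s in samples if lo <= s[1][h_index] <= hi])
        for lo, hi in ranges
    }


# Hypothetical encoded samples: (category, (H code, second feature code)).
samples = [
    ("A", (1, 5)), ("A", (2, 7)), ("B", (2, 1)),
    ("C", (8, 3)), ("C", (9, 4)), ("D", (9, 9)),
]
dictionaries = build_divided_dictionaries(samples, max_categories=2)
```

Under these assumptions, each divided dictionary is built only from the samples whose H code falls in its range, so fewer categories take part in each merge check. This is the effect the abstract credits for the shorter generation time.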