Title of invention: System for categorizing character strings using acceptability and category information contained in ending substrings
Abstract: A data storage medium stores string data that can be used in character recognition and instructions for accessing the string data. The string data includes data units that can be accessed by a processor in executing the instructions. The processor can use character data indicating characters of a string to access a sequence of the data units that ends with an ending subsequence. The ending subsequence includes acceptance information indicating whether a string whose sequence of data units ends with the ending subsequence is an acceptable string. If so, the ending subsequence also includes category set information indicating a set of categories for strings whose sequences end with the ending subsequence. The categories can include words, numbers, compound words, and so forth. The acceptance information can include a bit in a character label data unit that includes information indicating the character type of an ending character. The acceptance information can also include an acceptance data unit whose value indicates an acceptable string ending. The acceptance data unit can be followed by category data units, each with a value indicating a category. The category data units can be used to obtain a bit vector for a string, each bit of which indicates whether the string is in one of the categories. For compactness, all or part of an ending subsequence can be shared by plural acceptable strings. Looping can be used to represent a category with a potentially infinite number of strings, such as numbers.
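The lookup described in the abstract can be illustrated with a minimal sketch. It is not the patented encoding: the abstract describes packed data units, while this sketch uses ordinary Python objects. The names `State`, `CATEGORIES`, `build_example`, and `categorize`, the category numbering, and the sample strings are all assumptions made for illustration. Each state stands in for a position in the sequence of data units; an accepting state carries the acceptance information and the category values, from which a bit vector is derived.

```python
from typing import Dict, Optional, Tuple

# Hypothetical category numbering (an assumption); the abstract names
# words, numbers, and compound words as possible categories.
CATEGORIES = ("word", "number", "compound")


class State:
    """One position in the stored string data (illustrative stand-in)."""

    def __init__(self, accept: bool = False, categories: Tuple[int, ...] = ()):
        self.next: Dict[str, "State"] = {}
        self.accept = accept          # acceptance information
        self.categories = categories  # category values, read only if accept


def build_example() -> State:
    start = State()

    # "cat" and "bat" share the ending "at": one ending serves plural strings.
    shared_a = State()
    shared_t = State(accept=True, categories=(0,))  # category 0 = "word"
    shared_a.next["t"] = shared_t
    first = State()
    first.next["a"] = shared_a
    start.next["c"] = first
    start.next["b"] = first

    # Numbers: a state that loops on digits represents infinitely many strings.
    number = State(accept=True, categories=(1,))  # category 1 = "number"
    for digit in "0123456789":
        start.next[digit] = number
        number.next[digit] = number
    return start


def categorize(start: State, text: str) -> Optional[int]:
    """Return a category bit vector, or None if the string is not acceptable."""
    state = start
    for ch in text:
        nxt = state.next.get(ch)
        if nxt is None:
            return None
        state = nxt
    if not state.accept:
        return None
    bits = 0
    for category in state.categories:
        bits |= 1 << category  # bit i is set when the string is in category i
    return bits


START = build_example()
```

Here `categorize(START, "cat")` and `categorize(START, "bat")` reach the same shared ending state, and any non-empty digit string ends in the looping number state; a prefix such as "ca" reaches a state without acceptance information and yields `None`.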
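Publication number: US5488719(A) Publication date: 1996.01.30
Application number: US19910814552 Filing date: 1991.12.30
Applicant: XEROX CORPORATION Inventors: KAPLAN, RONALD M.; SHUCHATOWITZ, ROBERT; MULLINS, ATTY T.
Classification: G06K9/68;(IPC1-7):G06F17/30 Main classification: G06K9/68
Agency: Agent:
Principal claim:
Address: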