Title of invention: METHOD OF COMPRESSING INFORMATION AND AN APPARATUS FOR COMPRESSING ENGLISH TEXT
Abstract: A method is described for storing alphanumeric text in a random access memory or disk file. The text is received as a string of ASCII-coded characters, which are separated into tokens; each token is a group of alphabetic characters, numeric characters, or punctuation characters. Each alpha token is encoded by comparing it with a table of words stored in a global dictionary. If the token is in the dictionary, it is stored in the memory as two or three four-bit nibbles that identify its location in the dictionary. If it is not in the dictionary, the characters at the front of the token are compared with a list of word beginnings (prefixes). If a match is found, two nibbles identifying the prefix are stored in the random access memory, the prefix characters are stripped from the token, and the process is repeated. When no more prefixes are found, the end of the word is matched against a stored group of word endings (suffixes). For each match, two nibbles identifying the suffix are stored in the random access memory and the suffix characters are stripped from the token; suffix matching is then repeated on the remaining ending characters of the token. When there are no more identifiable suffixes, the number of characters remaining in the stem of the token is determined. After all suffixes have been identified and their letters removed, the remaining stem is encoded as one or two nibbles for each letter plus a nibble identifying the length of the stem. If no suffix was identified, either a nibble is stored indicating that the stem is four or five characters in length, or the actual length of the stem is stored as an additional nibble. After the stem length has been encoded, the individual letters of the stem are encoded by one or two nibbles each, as determined from tables of individual characters.
Publication number: DE3277556(D1)  Publication date: 1987.12.03
Application number: DE19823277556  Filing date: 1982.09.25
Applicant: SYSTEM DEVELOPMENT CORPORATION  Inventor: SNOW, CRAIG ADAM
Classification: H04L23/00;G06F5/00;G06F17/22;G06F17/27;H03M7/30;H03M7/42;(IPC1-7):G06F15/20  Main classification: H04L23/00
Patent agency:  Agent:
Principal claim:
Address:
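
The encoding steps described in the abstract can be sketched roughly as follows. This is an illustrative reading only, not the patented implementation: the word, prefix, suffix and letter tables (`DICTIONARY`, `PREFIXES`, `SUFFIXES`, `COMMON`, `RARE`) are hypothetical toy data, the function names are invented here, and several details the abstract leaves open are filled with assumptions (whitespace is dropped, case is folded, "the process is repeated" is read as restarting from the dictionary lookup, at least one stem letter is always kept, and the stem length is always stored explicitly). Items are returned as labelled tuples rather than packed nibbles so the structure stays visible.

```python
import re

# Hypothetical toy tables; the real tables are not given in the abstract.
DICTIONARY = ["the", "and", "of", "text", "memory"]
PREFIXES = ["un", "re", "pre"]
SUFFIXES = ["ing", "ed", "s", "ly"]
# Assumed letter tables: 15 frequent letters take one nibble (0-14);
# nibble 15 is an escape followed by a second nibble for the rarer letters.
COMMON = "etaoinshrdlucmf"
RARE = "wypvbgkjqxz"
ESCAPE = 15


def tokenize(text):
    """Split text into alpha, numeric and punctuation tokens.

    Assumption: whitespace is simply discarded here.
    """
    return re.findall(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]+", text)


def encode_letters(stem):
    """Encode each stem letter as one nibble, or as ESCAPE plus one nibble."""
    nibbles = []
    for ch in stem:
        if ch in COMMON:
            nibbles.append(COMMON.index(ch))
        else:
            nibbles.extend([ESCAPE, RARE.index(ch)])
    return nibbles


def first_match(word, table, test):
    """Return the first table entry that matches and leaves a non-empty stem."""
    for entry in table:
        if len(word) > len(entry) and test(word, entry):
            return entry
    return None


def encode_alpha(token):
    """Encode one alpha token as a list of labelled items."""
    word = token.lower()  # assumption: case handling is not described
    out = []
    # Dictionary lookup, then prefix stripping, repeated until neither applies.
    while True:
        if word in DICTIONARY:
            out.append(("dict", DICTIONARY.index(word)))
            return out
        prefix = first_match(word, PREFIXES, str.startswith)
        if prefix is None:
            break
        out.append(("prefix", PREFIXES.index(prefix)))
        word = word[len(prefix):]
    # Suffix stripping, repeated on whatever ending remains.
    while True:
        suffix = first_match(word, SUFFIXES, str.endswith)
        if suffix is None:
            break
        out.append(("suffix", SUFFIXES.index(suffix)))
        word = word[:-len(suffix)]
    # Stem length, then the stem letters themselves.
    out.append(("stem_len", len(word)))
    out.append(("stem", encode_letters(word)))
    return out
```

For example, `encode_alpha("retesting")` strips the prefix `re` and the suffix `ing`, then spells out the four-letter stem `test` letter by letter, while `encode_alpha("the")` yields a single dictionary reference. Packing the labelled items into actual four-bit nibbles would require the code assignments from the full specification, which the abstract does not give.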