Abstract |
Apparatus and method for evaluating the likelihood of an event (such as a word) following a string of known events, based on event sequence counts derived from sparse sample data. Event sequences, or m-grams, each include a key and a subsequent event. For each m-gram that was counted in the sample data, there is stored a discounted probability generated by applying, for example, a modified Turing's estimate to a count-based probability. For each key occurring in the sample data there is stored a normalization constant alpha which (a) adjusts the discounted probabilities for multiple counting, if any, and (b) includes a freed probability mass allocated to m-grams which do not occur in the sample data. To determine the likelihood of a selected event following a string of known events, a "backing off" scheme is employed in which successively shorter included keys (of known events), each followed by the selected event (together representing m-grams), are searched (302, 308) until an m-gram is found having a discounted probability stored therefor. The normalization constants (306, 312) of the longer searched keys, for which the corresponding m-grams have no stored discounted probability, are combined with the found discounted probability to produce (304, 310, 314) the likelihood of the selected event being next. |
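
The backing-off lookup described above can be illustrated with a minimal Python sketch. It is not the claimed apparatus: the function name `backoff_likelihood`, the dictionary layout, and the toy probability values are all hypothetical, and the sketch assumes that "combined" means multiplied and that a key absent from the sample data contributes a constant of 1.

```python
from typing import Dict, Sequence, Tuple

MGram = Tuple[str, ...]


def backoff_likelihood(
    history: Sequence[str],
    event: str,
    discounted: Dict[MGram, float],
    alpha: Dict[MGram, float],
) -> float:
    """Likelihood of `event` following `history` under a back-off lookup.

    discounted maps each m-gram seen in the sample (key + event) to its
    stored discounted probability; alpha maps each seen key to its
    normalization constant.
    """
    weight = 1.0
    key = tuple(history)
    while True:
        mgram = key + (event,)
        if mgram in discounted:
            # Found a stored discounted probability: scale it by the
            # constants of every longer key that had to be skipped.
            return weight * discounted[mgram]
        if not key:
            # Assumption: an event unseen even on its own gets 0.0.
            return 0.0
        # Assumption: a key never seen in the sample contributes 1.0.
        weight *= alpha.get(key, 1.0)
        key = key[1:]  # drop the oldest known event, i.e. shorten the key


# Toy tables with made-up values, for illustration only.
discounted = {
    ("the", "cat", "sat"): 0.5,
    ("cat", "ran"): 0.2,
    ("dog",): 0.03,
}
alpha = {("the", "cat"): 0.4, ("cat",): 0.3}

for candidate in ("sat", "ran", "dog", "zebra"):
    print(candidate, backoff_likelihood(("the", "cat"), candidate, discounted, alpha))
```

With these toy tables, "sat" is found at the full key, "ran" requires one back-off step (so the constant for the key "the cat" is applied), and "dog" requires two steps (so the constants for both "the cat" and "cat" are applied).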