摘要 |
A document preprocessor preprocess a document to enhance the statistical features of the document. The system preprocesses the document by matching a prefix and a trailing context in the document with one or more matching prefixes in a transformation database, where the prefix is a first string of one or more tokens in the first document and the trailing context is a second string of one or more tokens in the first document that trail the prefix. Alternatively, the system preprocesses the document by computing cyclic permutations of the document, sorting these permutations and taking the last token from each of the sorted permutations.
|