Abstract
Documents are assigned to one or more indexes in a document indexing system on the basis of document properties such as the total number of tokens in the document, the number of numeric tokens, the number of alphabetic tokens, the size of the document, and metadata associated with the document. Different indexes can be defined based on the statistical distributions of these properties over a large number of documents, and a document router can then direct a particular document to one index or another according to that document's properties. In some implementations, certain document properties may be used to identify a nonrelevant, or garbage, document so that it is either not indexed or is assigned to an index dedicated to such documents.
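The routing idea can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the abstract. Tokenization is plain whitespace splitting. There are four hypothetical indexes (`garbage`, `numeric`, `small`, `large`). The corpus median token count stands in for "statistical distributions of document properties". A document is treated as garbage when fewer than half of its tokens are numeric or alphabetic. Metadata-based routing is omitted. The thresholds and index names are illustrative only, not the system's actual method.

```python
import re
import statistics
import string
from dataclasses import dataclass

# Hypothetical index names; the abstract does not name any indexes.
GARBAGE, NUMERIC, SMALL, LARGE = "garbage", "numeric", "small", "large"

# A numeric token: digits optionally grouped by "." or "," (e.g. 1,200 or 3.14).
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


@dataclass
class DocProperties:
    total_tokens: int
    numeric_tokens: int
    alphabetic_tokens: int
    size_bytes: int


def extract_properties(text: str) -> DocProperties:
    """Compute simple per-document properties from whitespace tokens."""
    tokens = text.split()
    numeric = alphabetic = 0
    for tok in tokens:
        core = tok.strip(string.punctuation)  # ignore leading/trailing punctuation
        if core and _NUMBER.fullmatch(core):
            numeric += 1
        elif core.isalpha():
            alphabetic += 1
    return DocProperties(len(tokens), numeric, alphabetic, len(text.encode("utf-8")))


def derive_token_cutoff(corpus: list[DocProperties]) -> float:
    """Assumed split point between SMALL and LARGE: the corpus median token count."""
    return statistics.median(p.total_tokens for p in corpus)


def route(p: DocProperties, token_cutoff: float,
          min_clean_ratio: float = 0.5, numeric_ratio: float = 0.5) -> str:
    """Return the name of the (hypothetical) index a document is sent to."""
    if p.total_tokens == 0:
        return GARBAGE
    clean = (p.numeric_tokens + p.alphabetic_tokens) / p.total_tokens
    if clean < min_clean_ratio:  # mostly symbol noise: treat as garbage
        return GARBAGE
    if p.numeric_tokens / p.total_tokens > numeric_ratio:
        return NUMERIC
    return SMALL if p.total_tokens <= token_cutoff else LARGE


# Usage: derive the cutoff from a tiny corpus, then route each document.
docs = [
    "The quick brown fox jumps over the lazy dog.",
    "Revenue 2021: 1,200 3,400 5,600 7,800",
    "#$%^ @@@ !!! &&&",
    "Short note.",
]
props = [extract_properties(d) for d in docs]
cutoff = derive_token_cutoff(props)
for d, p in zip(docs, props):
    print(route(p, cutoff), "<-", d)
```

Here the token counts are 9, 6, 4 and 2, so the median cutoff is 5.0. The four documents are routed to `large`, `numeric`, `garbage` and `small` respectively. Checking for garbage first means a noisy document is never placed in a regular index, which matches the abstract's option of either skipping such documents or giving them a dedicated index.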