Abstract |
Disclosed is a method of indexing a database of documents, comprising: providing a vocabulary of n terms; indexing the database in the form of a non-negative n×m index matrix V, wherein each of its m columns represents a jth document and has n entries, the ith entry containing a function of the number of occurrences of the ith term of said vocabulary in said jth document; and factoring V into non-negative matrix factors T and D such that V≈TD, wherein T is an n×r term matrix, D is an r×m document matrix, and r<nm/(n+m). The index so generated is useful in two-pass information retrieval systems.
|
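The following is a minimal illustrative sketch of the indexing steps, not the disclosed implementation. The abstract does not say how the factors T and D are computed, so the sketch assumes multiplicative update rules that reduce the squared error between V and TD; it also assumes raw term counts as the "function of the number of occurrences". The helper names (`build_index_matrix`, `max_rank`, `factor_nonnegative`) and the sample documents are hypothetical. The bound r<nm/(n+m) is equivalent to r(n+m)<nm, meaning the two factors together hold fewer entries than V itself.

```python
import numpy as np

def build_index_matrix(docs, vocabulary):
    """Return the n x m matrix V of raw term counts (rows: terms, columns: documents)."""
    term_row = {term: i for i, term in enumerate(vocabulary)}
    V = np.zeros((len(vocabulary), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            i = term_row.get(word)
            if i is not None:          # words outside the vocabulary are ignored
                V[i, j] += 1.0
    return V

def max_rank(n, m):
    """Largest integer r satisfying r < n*m/(n+m), i.e. r*(n+m) < n*m."""
    return (n * m - 1) // (n + m)

def factor_nonnegative(V, r, iterations=200, seed=0, eps=1e-9):
    """Approximate V by T @ D with non-negative T (n x r) and D (r x m).

    Assumption: multiplicative updates are used here for illustration only.
    """
    n, m = V.shape
    if not r * (n + m) < n * m:
        raise ValueError("r must satisfy r < n*m/(n+m)")
    rng = np.random.default_rng(seed)
    T = rng.random((n, r)) + eps       # strictly positive starting point
    D = rng.random((r, m)) + eps
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so T and D stay non-negative.
        D *= (T.T @ V) / (T.T @ T @ D + eps)
        T *= (V @ D.T) / (T @ D @ D.T + eps)
    return T, D

# Hypothetical toy corpus: n = 6 vocabulary terms, m = 4 documents.
vocabulary = ["matrix", "factor", "term", "document", "index", "retrieval"]
docs = [
    "matrix factor matrix",
    "term document index",
    "index matrix term",
    "document retrieval",
]
V = build_index_matrix(docs, vocabulary)
r = max_rank(*V.shape)
T, D = factor_nonnegative(V, r)
print("stored entries:", T.size + D.size, "instead of", V.size)
print("reconstruction error:", np.linalg.norm(V - T @ D))
```

In a two-pass retrieval setting, one plausible use is to score queries against the small factors first and consult the full documents only for the top candidates; the abstract does not detail that procedure, so it is not sketched here.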