Abstract |
A document search and retrieval system and method stores documents in groups based on content. The documents are self-organized into a hierarchy of conceptual clusters, and branches of the hierarchy are stored separately in distinct physical stores, each having an index. In response to a query, the system finds the concepts (clusters) that best match the search criteria and returns the documents from those content categories. The indexing, clustering, and searching are performed using document themes and/or summaries. Themes are automatically developed by stemming and scoring phrases from the sentences in each document, and clustering the sentences containing the highest-scoring stems. A set of phrases (themes) is taken from each cluster. A document summary is created by taking a text segment from each cluster of sentences within the document and stringing the segments together.
|