Independent claim |
1. A system comprising:
one or more servers, each server having a processor and a memory, the one or more servers comprising:
a collector module configured to acquire a plurality of eigenvectors, each having a corresponding eigenvalue, wherein
the plurality of eigenvectors are based on a plurality of tokenized electronic documents having unstructured text, the plurality of tokenized electronic documents forming a data matrix, and
the unstructured text includes background terms and nonbackground terms; and
a dimensional reduction module configured to:
classify the plurality of eigenvectors and their corresponding eigenvalues into one or more background eigenvectors and background eigenvalues, and one or more nonbackground eigenvectors and nonbackground eigenvalues, wherein the background eigenvectors correspond to the background terms and the nonbackground eigenvectors correspond to nonbackground terms,
acquire a threshold,
compare the nonbackground eigenvalues with the threshold, and
provide the nonbackground eigenvectors whose corresponding nonbackground eigenvalues exceed the threshold, wherein the provided nonbackground eigenvectors are used for clustering the plurality of documents.
|
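The dimensional reduction steps recited in the claim (classify, acquire a threshold, compare, provide) can be illustrated with a minimal Python sketch. The claim does not say how an eigenvector is classified as background, so the rule used here is purely an assumption for illustration: an eigenvector counts as background when at least half of its squared weight falls on a supplied set of background terms. The names `background_weight`, `select_nonbackground`, `background_cutoff`, and the sample vocabulary and eigenpairs are hypothetical and do not come from the claim.

```python
from typing import List, Sequence, Set, Tuple

EigenPair = Tuple[float, Sequence[float]]  # (eigenvalue, eigenvector over the vocabulary)


def background_weight(vec: Sequence[float], vocab: Sequence[str],
                      background_terms: Set[str]) -> float:
    """Fraction of the vector's squared weight that sits on background terms."""
    total = sum(x * x for x in vec)
    if total == 0.0:
        return 0.0
    on_background = sum(x * x for x, term in zip(vec, vocab)
                        if term in background_terms)
    return on_background / total


def select_nonbackground(eigenpairs: Sequence[EigenPair], vocab: Sequence[str],
                         background_terms: Set[str], threshold: float,
                         background_cutoff: float = 0.5) -> List[EigenPair]:
    """Drop background eigenvectors, then keep nonbackground ones whose
    eigenvalue exceeds the threshold (classification rule is an assumption)."""
    kept: List[EigenPair] = []
    for value, vec in eigenpairs:
        # Assumed classification step: mostly-background weight => background.
        if background_weight(vec, vocab, background_terms) >= background_cutoff:
            continue
        # Comparison step: strict "exceeds the threshold", as in the claim.
        if value > threshold:
            kept.append((value, vec))
    return kept


# Hypothetical toy data: four vocabulary terms, three eigenpairs.
vocab = ["the", "of", "engine", "valve"]
background_terms = {"the", "of"}
eigenpairs = [
    (9.0, [0.7, 0.7, 0.1, 0.1]),   # weight mostly on background terms
    (4.0, [0.1, 0.0, 0.8, 0.6]),   # nonbackground, eigenvalue above threshold
    (0.5, [0.0, 0.1, 0.6, -0.8]),  # nonbackground, eigenvalue below threshold
]
selected = select_nonbackground(eigenpairs, vocab, background_terms, threshold=1.0)
```

In this toy run only the second eigenpair survives both steps; the retained eigenvectors would then serve as the reduced basis handed to whatever clustering routine is in use.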