Abstract |
A method for extracting subjects and sorting documents in a search engine, and a computer-readable recording medium storing a program for the method, are provided. The method lets a user reach desired information quickly and conveniently by selecting varied, atypical subjects that have not been classified manually, classifying the target documents under each subject, and determining whether a retrieved document suits the subject. For the keywords included in the target documents, a relation degree indicating how often the respective keywords are selected together is measured. A convergence relation degree is measured between the word set associated with a given keyword and the word sets associated with other keywords. The keyword is selected as a subject when its convergence relation degree exceeds a specific value. A naive Bayesian probability is calculated by performing naive Bayesian training on the training documents and on each keyword included in the target documents. A vector size is calculated for each keyword included in the training documents and the target documents. A distance between the vector sizes of each keyword in the training documents and the target documents is calculated. The similarity of each keyword is calculated by multiplying the naive Bayesian probability by the distance. A ranking value is calculated by processing the similarities of the keywords included in the target document. |
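
The subject-selection steps above can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: the relation degree is taken to be the Jaccard co-occurrence of two keywords across documents, a keyword's word set is taken to be the keywords whose relation degree with it reaches `min_relation`, and the convergence relation degree is taken to be the average Jaccard overlap between that word set and the word sets of its related keywords. The function names and both thresholds are hypothetical and are not taken from the source.

```python
from itertools import combinations


def relation_degrees(docs):
    """Return (vocab, rel): rel[(a, b)] is the Jaccard co-occurrence of a and b.

    Assumption: the source only says keywords are "selected at the same time";
    Jaccard co-occurrence over documents is used here as a stand-in measure.
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    containing = {w: {i for i, d in enumerate(doc_sets) if w in d} for w in vocab}
    rel = {}
    for a, b in combinations(vocab, 2):  # a < b because vocab is sorted
        both = containing[a] & containing[b]
        either = containing[a] | containing[b]
        rel[(a, b)] = len(both) / len(either)
    return vocab, rel


def related_set(word, vocab, rel, min_relation):
    """Keywords whose relation degree with `word` is at least min_relation."""
    out = set()
    for other in vocab:
        if other == word:
            continue
        key = (word, other) if word < other else (other, word)
        if rel[key] >= min_relation:
            out.add(other)
    return out


def convergence_degree(word, vocab, rel, min_relation):
    """Average overlap between the word set of `word` and those of its neighbours.

    Assumption: this averaging rule is illustrative, not the patented measure.
    """
    mine = related_set(word, vocab, rel, min_relation) | {word}
    neighbours = mine - {word}
    if not neighbours:
        return 0.0
    overlaps = []
    for other in neighbours:
        theirs = related_set(other, vocab, rel, min_relation) | {other}
        overlaps.append(len(mine & theirs) / len(mine | theirs))
    return sum(overlaps) / len(overlaps)


def extract_subjects(docs, min_relation=0.3, min_convergence=0.8):
    """Select keywords whose convergence degree exceeds min_convergence."""
    vocab, rel = relation_degrees(docs)
    return [
        w for w in vocab
        if convergence_degree(w, vocab, rel, min_relation) > min_convergence
    ]


if __name__ == "__main__":
    # Each document is represented by its list of keywords (toy data).
    docs = [
        ["python", "numpy", "pandas"],
        ["python", "numpy", "pandas"],
        ["python", "numpy", "weather"],
        ["soccer", "weather"],
        ["travel", "weather"],
    ]
    print(extract_subjects(docs))  # ['numpy', 'pandas', 'python']
```

In this toy data the three programming keywords share the same word set and are selected, while "weather" links two otherwise unrelated keywords, so its convergence degree of 2/3 stays below the threshold. The ranking steps (naive Bayesian probability multiplied by a distance between vector sizes) are not sketched because the abstract does not define the vector size or the distance.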