Abstract
A keyword to be categorized is received, together with a category dictionary, in which each category has associated registered keywords, and a text corpus. Registered keywords in the category dictionary whose degree of similarity to the keyword to be categorized is equal to or greater than a predetermined value are identified, and the categories associated with them are extracted. Registered keywords that co-occur with the keyword to be categorized in the text corpus are also identified, and the categories associated with them are extracted. A degree of importance is determined for each extracted category based on a function of the identified similar registered keywords and/or a function of the identified co-occurring registered keywords. The extracted categories are output, each with at least an indication of its relative importance, as category candidates for the keyword to be categorized.
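The two candidate-extraction routes in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The category dictionary is assumed to be a plain mapping. String similarity is measured with `difflib.SequenceMatcher.ratio()`, and the "predetermined value" is a hypothetical threshold of 0.8. The corpus is assumed to be already segmented into one set of terms per document, with co-occurrence taken to mean that two terms appear in the same document. `CATEGORY_DICT`, `similar_categories`, and `cooccurring_categories` are hypothetical names.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical category dictionary: category -> registered keywords.
CATEGORY_DICT: dict[str, list[str]] = {
    "beverage": ["green tea", "coffee", "black tea"],
    "snack": ["potato chips", "cookie"],
}


def similar_categories(keyword, category_dict, threshold=0.8):
    """Return category -> [(registered keyword, similarity)] for every
    registered keyword whose similarity to `keyword` is >= threshold.
    SequenceMatcher.ratio() is an assumed similarity measure."""
    hits = defaultdict(list)
    for category, registered in category_dict.items():
        for reg in registered:
            score = SequenceMatcher(None, keyword, reg).ratio()
            if score >= threshold:
                hits[category].append((reg, score))
    return dict(hits)


def cooccurring_categories(keyword, category_dict, corpus):
    """Return category -> [co-occurring registered keywords], adding one
    entry per document in which both terms appear. Each corpus document
    is assumed to be a pre-segmented set of terms."""
    hits = defaultdict(list)
    for doc_terms in corpus:
        if keyword not in doc_terms:
            continue  # only documents containing the keyword matter
        for category, registered in category_dict.items():
            for reg in registered:
                if reg != keyword and reg in doc_terms:
                    hits[category].append(reg)
    return dict(hits)


# Usage with a tiny hypothetical corpus.
corpus = [{"green teas", "cookie"}, {"green teas", "coffee"}, {"coffee", "potato chips"}]
print(similar_categories("green teas", CATEGORY_DICT))
print(cooccurring_categories("green teas", CATEGORY_DICT, corpus))
```

The similarity route finds "beverage" through the near-spelling "green tea". The co-occurrence route adds "snack" through "cookie", a category that string similarity alone would miss.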
Main Claim
1. A method for categorizing keywords, the method comprising:
receiving, by a computer, a keyword to be categorized;
receiving, by the computer, a category dictionary including categories having associated respective pluralities of registered keywords;
receiving, by the computer, a text corpus;
identifying, by the computer, one or more registered keywords in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and extracting the categories associated with the identified registered keywords;
identifying, by the computer, one or more registered keywords co-occurring in the text corpus with the keyword to be categorized, and extracting the categories associated with the identified co-occurring registered keywords;
determining, by the computer, a degree of importance of each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords; and
outputting, by the computer, the extracted categories, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.
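The importance-scoring and output steps of claim 1 might look like the following sketch. The claim only requires that importance be some function of the similar and/or co-occurring registered keywords. The weighted sum below, with hypothetical weights `alpha` and `beta`, is one assumed choice, and `rank_categories` is a hypothetical name. The inputs follow an assumed shape: per-category lists of (registered keyword, similarity) pairs, and per-category lists of co-occurring registered keywords.

```python
from collections import Counter


def rank_categories(sim_hits, cooc_hits, alpha=1.0, beta=0.5):
    """Combine both evidence sources into one importance score per category.

    sim_hits:  category -> [(registered_keyword, similarity), ...]
    cooc_hits: category -> [co-occurring registered keyword, ...]
    alpha, beta: hypothetical weights for the two sources.
    Returns [(category, score, share_of_total)] sorted by descending score.
    """
    scores = Counter()
    # Similarity evidence: sum of similarity values, weighted by alpha.
    for category, pairs in sim_hits.items():
        scores[category] += alpha * sum(sim for _, sim in pairs)
    # Co-occurrence evidence: number of co-occurrences, weighted by beta.
    for category, regs in cooc_hits.items():
        scores[category] += beta * len(regs)
    total = sum(scores.values())
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, score, score / total if total else 0.0) for cat, score in ranked]


# Usage with hypothetical extraction results.
candidates = rank_categories(
    {"beverage": [("green tea", 0.95)]},
    {"beverage": ["coffee"], "snack": ["cookie"]},
)
for category, score, share in candidates:
    print(f"{category}: score={score:.2f}, share={share:.0%}")
```

Reporting each category's share of the total score is one simple way to give the "indication of relative importance" the claim calls for. The sorted order on its own already conveys relative rank.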