Title: CATEGORIZING KEYWORDS
Abstract: A keyword to be categorized is received. A category dictionary, including categories having associated registered keywords, and a text corpus are received. Registered keywords in the category dictionary whose degree of similarity to the keyword to be categorized is equal to or greater than a predetermined value are identified, and the categories associated with the identified registered keywords are extracted. Registered keywords that co-occur with the keyword to be categorized in the text corpus are identified, and the categories associated with the identified co-occurring registered keywords are extracted. A degree of importance is determined for each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords. The extracted categories are output, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.
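The dictionary-similarity step in the abstract can be illustrated with a minimal Python sketch. The application does not specify a similarity measure here, so this sketch assumes difflib's character-level `SequenceMatcher.ratio()` and a threshold of 0.8 as the "predetermined value"; the sample dictionary and the function names `similarity` and `categories_by_similarity` are hypothetical illustrations, not the method disclosed in the application.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical category dictionary: category -> registered keywords.
CATEGORY_DICTIONARY = {
    "Engine": ["engine", "motor", "engine oil"],
    "Brake": ["brake", "brake pad", "brake fluid"],
    "Tire": ["tire", "tyre", "tire pressure"],
}


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib's character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def categories_by_similarity(keyword: str, dictionary: dict, threshold: float = 0.8) -> dict:
    """Map each category to its registered keywords whose similarity to
    `keyword` is equal to or greater than `threshold` (the predetermined value)."""
    hits = defaultdict(list)
    for category, registered in dictionary.items():
        for reg in registered:
            if similarity(keyword, reg) >= threshold:
                hits[category].append(reg)
    return dict(hits)


print(categories_by_similarity("brake pads", CATEGORY_DICTIONARY))
# → {'Brake': ['brake pad']}
```

Keeping the matching registered keywords (not just the category names) leaves them available for any importance function that depends on which or how many keywords matched.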
Publication Number: US2015227620(A1)  Publication Date: 2015.08.13
Application Number: US201514609474  Filing Date: 2015.01.30
Applicant: International Business Machines Corporation  Inventors: Takeuchi Emiko; Takuma Daisuke; Toyoshima Hirobumi
Classification: G06F17/30; G06F17/27  Main Classification: G06F17/30
Main Claim: 1. A method for categorizing keywords, the method comprising:
receiving, by a computer, a keyword to be categorized;
receiving, by the computer, a category dictionary including categories having associated respective pluralities of registered keywords;
receiving, by the computer, a text corpus;
identifying, by the computer, one or more registered keywords in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and extracting the categories associated with the identified registered keywords;
identifying, by the computer, one or more registered keywords co-occurring in the text corpus with the keyword to be categorized, and extracting the categories associated with the identified co-occurring registered keywords;
determining, by the computer, a degree of importance of each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords; and
outputting, by the computer, the extracted categories, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.
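The co-occurrence and importance steps of claim 1 can be sketched in Python as follows. The claim does not fix how co-occurrence is detected or how the degree of importance is computed, so this sketch assumes sentence-level co-occurrence with whole-word matching, and an importance score equal to a weighted sum of similarity hits and co-occurring sentences, normalized so the scores sum to 1. The weights `w_sim` and `w_co`, the sample corpus and dictionary, the similarity hits passed in, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical category dictionary and text corpus for illustration.
CATEGORY_DICTIONARY = {
    "Engine": ["engine", "motor", "engine oil"],
    "Brake": ["brake", "brake pad", "brake fluid"],
    "Tire": ["tire", "tyre", "tire pressure"],
}
CORPUS = ("The brake pads squeal when the brake fluid is low. "
          "Check the engine and the brake pads. Tire pressure was fine.")


def contains_word(text: str, phrase: str) -> bool:
    """Whole-word (or whole-phrase) match, so 'tire' does not match 'entire'."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def cooccurring_categories(keyword: str, corpus: str, dictionary: dict) -> Counter:
    """Count, per category, the sentences in which `keyword` and at least one of
    the category's registered keywords both appear (assumed co-occurrence unit)."""
    kw = keyword.lower()
    counts = Counter()
    for sentence in re.split(r"[.!?]\s*", corpus.lower()):
        if not contains_word(sentence, kw):
            continue
        # Mask the keyword itself so its own sub-phrases ("brake") are not counted.
        rest = re.sub(rf"\b{re.escape(kw)}\b", " ", sentence)
        for category, registered in dictionary.items():
            if any(contains_word(rest, reg) for reg in registered):
                counts[category] += 1
    return counts


def rank_candidates(similarity_hits: dict, cooccurrences: Counter,
                    w_sim: float = 1.0, w_co: float = 0.5) -> list:
    """Assumed degree of importance: weighted sum of similar registered keywords
    and co-occurring sentences, normalized so the scores sum to 1."""
    scores = Counter()
    for category, keywords in similarity_hits.items():
        scores[category] += w_sim * len(keywords)
    for category, n in cooccurrences.items():
        scores[category] += w_co * n
    total = sum(scores.values()) or 1.0
    return [(category, score / total) for category, score in scores.most_common()]


co = cooccurring_categories("brake pads", CORPUS, CATEGORY_DICTIONARY)
# Similarity hits as a dictionary-similarity step might return them (hypothetical).
print(rank_candidates({"Brake": ["brake pad"]}, co))
# → [('Brake', 0.75), ('Engine', 0.25)]
```

Normalizing the scores gives each candidate's relative importance directly, which is one simple way to satisfy the claim's requirement to output "at least an indication of each category's relative importance"; other weightings or co-occurrence windows would fit the claim language equally well.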
Address: Armonk, NY, US