发明名称 |
CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA |
摘要 |
A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.
|
申请公布号 |
US2010174528(A1) |
申请公布日期 |
2010.07.08 |
申请号 |
US20100651509 |
申请日期 |
2010.01.04 |
申请人 |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
发明人 |
OYA HIROKI;TAKUMA DAISUKE;TOYOSHIMA HIROBUMI |
分类号 |
G06F17/21 |
主分类号 |
G06F17/21 |
代理机构 |
|
代理人 |
|
主权项 |
|
地址 |
|