Title of invention CREATING A TERMS DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA
Abstract A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis on the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.
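The flow of units described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the method claimed in the application: the function names, the `UNCATEGORIZED` label, the regex-based tokenizer standing in for morphological analysis, and the forms chosen for the two comparison rules (a regular expression for single words, a list of category labels for token sequences) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Pattern, Tuple

UNCATEGORIZED = "UNCATEGORIZED"  # hypothetical label for tokens absent from the dictionary
Tagged = Tuple[str, str]         # (surface form, category)


def analyze(text: str) -> List[str]:
    """Stand-in for morphological analysis: split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def categorize(tokens: List[str], category_dict: Dict[str, str]) -> List[Tagged]:
    """Tag each token with its dictionary category, or UNCATEGORIZED if unknown."""
    return [(t, category_dict.get(t.lower(), UNCATEGORIZED)) for t in tokens]


def match_words(tagged: List[Tagged], word_rule: Pattern[str]) -> List[str]:
    """Uncategorized words whose surface form fully matches the word rule."""
    return [s for s, c in tagged if c == UNCATEGORIZED and word_rule.fullmatch(s)]


def match_sequences(tagged: List[Tagged], seq_rule: List[str]) -> List[str]:
    """Consecutive tokens whose categories equal the sequence rule, joined by spaces."""
    n = len(seq_rule)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(cat == want for (_, cat), want in zip(window, seq_rule)):
            hits.append(" ".join(surface for surface, _ in window))
    return hits


def register(candidates: List[str], category_dict: Dict[str, str],
             category: str, approve: Callable[[str], bool]) -> List[str]:
    """Add only the candidates the user approves; return the accepted ones."""
    accepted = []
    for cand in candidates:
        if approve(cand):
            category_dict[cand.lower()] = category
            accepted.append(cand)
    return accepted
```

In this sketch the `approve` callback plays the role of the permission unit: nothing enters the category dictionary unless the callback returns true for that candidate. In an interactive setting it could wrap a prompt to the user.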
Publication number US2010174528(A1) Publication date 2010.07.08
Application number US20100651509 Filing date 2010.01.04
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors OYA HIROKI;TAKUMA DAISUKE;TOYOSHIMA HIROBUMI
Classification G06F17/21 Main classification G06F17/21
Agency Agent
Main claim
Address