Title of invention |
Method for extracting multi-word technical terms from text |
Abstract |
A method and apparatus for extracting multi-word technical terms from a text file in a computer system. Word strings are selected from the text that contain at least two words and at most a specified maximum number of words, include none of a special set of selected tokens, and consist only of selected characters. Word strings that occur fewer than a specified minimum number of times in the text file are deleted. The remaining strings form a set of word strings very likely to be multi-word technical terms. The quality of this set can be improved further by deleting word strings that do not satisfy certain grammatical constraints.
|
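The selection and frequency steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `extract_candidate_terms`, the default limits, the contents of `stop_tokens`, and the choice of "letters and internal hyphens" as the permitted characters are all assumptions made for illustration. The optional grammatical filter is not shown.

```python
import re
from collections import Counter

# Hypothetical "special set of selected tokens"; the abstract does not list them.
DEFAULT_STOP_TOKENS = frozenset(
    {"the", "a", "an", "of", "and", "or", "is", "are", "in", "on", "to", "for"}
)

# Assumed definition of "selected characters": letters with internal hyphens.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def extract_candidate_terms(text, max_words=4, min_count=2,
                            stop_tokens=DEFAULT_STOP_TOKENS):
    """Return {word string: count} for strings that pass the stated filters."""
    # Words become one token each; any other non-space character becomes
    # its own token, so punctuation and digits break candidate strings.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*|\S", text.lower())

    def acceptable(token):
        return WORD_RE.fullmatch(token) is not None and token not in stop_tokens

    counts = Counter()
    for start in range(len(tokens)):
        # Consider strings of 2..max_words consecutive tokens.
        for length in range(2, max_words + 1):
            window = tokens[start:start + length]
            if len(window) < length:
                break
            if all(acceptable(t) for t in window):
                counts[" ".join(window)] += 1

    # Delete strings that occur fewer than min_count times.
    return {s: n for s, n in counts.items() if n >= min_count}


sample = ("The central processing unit executes code. "
          "A central processing unit is fast. The unit is small.")
terms = extract_candidate_terms(sample)
```

On the sample text, only strings that repeat and contain no stop token or punctuation survive, such as "central processing unit" and its two-word substrings. A grammatical filter of the kind the abstract mentions would then be applied to this set.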
Publication number |
US5423032(A) |
Publication date |
1995.06.06 |
Application number |
US19920816908 |
Filing date |
1992.01.03 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
BYRD, ROY J.;JUSTESON, JOHN S.;KATZ, SLAVA M. |
Classification |
G06F17/30;(IPC1-7):G06F17/30 |
Main classification |
G06F17/30 |
Patent agency |
|
Patent agent |
|
Independent claim |
|
Address |
|