Title of invention: Method for extracting multi-word technical terms from text
Abstract: A method and apparatus for extracting multi-word technical terms from a text file in a computer system. Word strings are selected from the text that have at least two words, have at most a specified maximum number of words, include none of a special set of selected tokens, and include only selected characters. Word strings that occur fewer than a specified minimum number of times in the text file are deleted. The remaining strings form a set of word strings very likely to be multi-word technical terms. The quality of this set can be improved by deleting word strings that do not satisfy certain grammatical constraints.
Publication number: US5423032(A) Publication date: 1995.06.06
Application number: US19920816908 Filing date: 1992.01.03
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: BYRD, ROY J.; JUSTESON, JOHN S.; KATZ, SLAVA M.
Classification: G06F17/30; (IPC1-7): G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address:
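
The selection and frequency steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `extract_candidate_terms`, the excluded-token set, the allowed-character pattern, the tokenizer, and the default thresholds are all assumptions chosen for illustration, since the abstract does not specify them. The optional grammatical filtering step is not shown.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the abstract does not list the actual excluded
# tokens or the permitted characters.
EXCLUDED_TOKENS = frozenset({
    "the", "a", "an", "of", "and", "or", "is", "are",
    "in", "to", "for", "from", "that", "this", "it",
})
# A usable token consists only of letters, optionally joined by hyphens.
ALLOWED_TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")
# Split text into word-like tokens and single punctuation characters.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def is_usable(token):
    """True if the token is not excluded and contains only allowed characters."""
    return token not in EXCLUDED_TOKENS and ALLOWED_TOKEN.fullmatch(token) is not None


def extract_candidate_terms(text, max_words=4, min_count=2):
    """Return {word string: count} for strings of 2..max_words usable tokens
    that occur at least min_count times in the text."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter()
    for start in range(len(tokens)):
        if not is_usable(tokens[start]):
            continue
        # Extend the string one token at a time, up to max_words tokens.
        for end in range(start + 1, min(start + max_words, len(tokens))):
            if not is_usable(tokens[end]):
                break  # every longer string from this start contains the bad token
            counts[" ".join(tokens[start:end + 1])] += 1
    # Delete word strings that occur fewer than min_count times.
    return {term: n for term, n in counts.items() if n >= min_count}


sample = (
    "Term extraction finds noun phrases. Term extraction is useful. "
    "The noun phrases of a text are candidates."
)
print(extract_candidate_terms(sample))
```

Because punctuation marks are emitted as separate tokens that fail the allowed-character check, candidate strings in this sketch never span a sentence boundary. Raising `min_count` or lowering `max_words` tightens the candidate set.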