Title of invention |
Method for extracting name entities and jargon terms using a suffix tree data structure |
Abstract |
A method for entity name and jargon term recognition and extraction. An embodiment of the present invention uses a suffix tree data structure to determine frequently occurring phrases. In one embodiment, text to be analyzed is preprocessed. The text is then separated into clauses and a suffix tree is created for the text. The suffix tree is used to determine repetitious segments. Unrecognized text fragments that occur with high frequency have a comparably high probability of being a name entity or jargon term. The set of repetitious segments is then filtered to obtain a set of possible entity names and jargon terms.
|
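The pipeline described in the abstract (split into clauses, index the suffixes, collect repetitious segments, filter) can be illustrated with a minimal Python sketch. It is not the patented implementation: it substitutes an uncompressed, word-level suffix trie with a length cap for a true suffix tree, and the function names, the `max_len`, `min_count`, and `min_len` parameters, the punctuation-based clause splitting, and the `known_words` lexicon filter are all assumptions introduced for illustration.

```python
import re


def split_clauses(text):
    """Split text into clauses on punctuation; each clause is a list of lowercase words."""
    clauses = re.split(r"[.,;:!?]+", text.lower())
    return [c.split() for c in clauses if c.split()]


def build_suffix_trie(clauses, max_len=4):
    """Insert every suffix of every clause (capped at max_len words) into a trie.

    Each node counts how often the word sequence leading to it occurs.
    This is a simplified stand-in for a compressed suffix tree.
    """
    def new_node():
        return {"count": 0, "children": {}}

    root = new_node()
    for words in clauses:
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node["children"].setdefault(word, new_node())
                node["count"] += 1
    return root


def repeated_segments(root, min_count=2, min_len=2):
    """Return {phrase: count} for phrases of at least min_len words seen at least min_count times."""
    found = {}
    stack = [(root, [])]
    while stack:
        node, path = stack.pop()
        for word, child in node["children"].items():
            # A child's count never exceeds its parent's, so rare branches can be pruned.
            if child["count"] >= min_count:
                new_path = path + [word]
                if len(new_path) >= min_len:
                    found[" ".join(new_path)] = child["count"]
                stack.append((child, new_path))
    return found


def filter_candidates(segments, known_words):
    """Keep segments that contain at least one word missing from the (hypothetical) lexicon."""
    return {
        phrase: count
        for phrase, count in segments.items()
        if any(word not in known_words for word in phrase.split())
    }


text = "The suffix tree is fast. We built a suffix tree, and the suffix tree worked."
segments = repeated_segments(build_suffix_trie(split_clauses(text)))
candidates = filter_candidates(segments, known_words={"the", "tree"})
```

Here `candidates` holds the repeated phrases that include the out-of-lexicon word "suffix". A production system would likely use a linear-time suffix tree construction and a richer filter than a word list.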
Publication number |
US2003083862(A1) |
Publication date |
2003.05.01 |
Application number |
US20010017408 |
Filing date |
2001.10.30 |
Applicant |
HU ZENGJIAN;ZHANG YIMIN;ZHOU JOE F. |
Inventor |
HU ZENGJIAN;ZHANG YIMIN;ZHOU JOE F. |
Classification |
G06F17/27;(IPC1-7):G06F17/27 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|