Title of Invention |
Method and apparatus for generating a language independent document abstract |
Abstract |
A method of extracting significant phrases from one or more documents stored in a computer-readable medium. A sequence of words is read from the one or more documents, and a score is determined for each word in the sequence based on the length of the word. The score for each word in the sequence is compared against a threshold score. The sequence of words is indicated to be a significant phrase if the number of words in the sequence that have a score greater than the threshold score equals or exceeds a predetermined number. If the sequence of words is a significant phrase, a sentence containing the sequence is retrieved from the document. An abstract of the document is then searched to determine whether the sentence has previously been included in the abstract. If not, the sentence is added to the abstract.
|
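The abstract-building loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the word score is taken to be the character count of the word, a "sequence" is assumed to be a fixed-size sliding window of words within a sentence, and the sentence splitter, `window`, `threshold` and `min_count` values, and all function names (`word_score`, `is_significant`, `split_sentences`, `build_abstract`) are hypothetical, since the record does not specify them.

```python
import re
from typing import List


def word_score(word: str) -> int:
    # Assumption: a word's score is simply its length in characters.
    return len(word)


def is_significant(words: List[str], threshold: int, min_count: int) -> bool:
    # A sequence is significant if at least `min_count` of its words
    # score strictly above `threshold` (the "predetermined number").
    return sum(1 for w in words if word_score(w) > threshold) >= min_count


def split_sentences(text: str) -> List[str]:
    # Hypothetical sentence splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_abstract(text: str, window: int = 4, threshold: int = 6,
                   min_count: int = 3) -> List[str]:
    abstract: List[str] = []
    for sentence in split_sentences(text):
        words = re.findall(r"\w+", sentence)
        # Assumed sequence definition: slide a fixed-size window over the words.
        for i in range(max(1, len(words) - window + 1)):
            seq = words[i:i + window]
            if is_significant(seq, threshold, min_count):
                # Search the abstract first; add the containing sentence
                # only if it has not already been included.
                if sentence not in abstract:
                    abstract.append(sentence)
    return abstract
```

Because the score depends only on word length, the sketch needs no dictionary or grammar for any particular language, which is consistent with the "language independent" framing in the title. Note, however, that the `\w+` tokenizer assumed here relies on spaces between words, so scripts written without word separators would need a different tokenization step.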
Application Publication No. |
US2005119873(A1) |
Publication Date |
2005.06.02 |
Application No. |
US20040018045 |
Filing Date |
2004.12.21 |
Applicant |
INTESOFT SYSTEMS LLC |
Inventors |
CHANEY GARNET R.;RICHARDSON ROBERT F.;RUBINSTEIN SEYMOUR I. |
Classification |
G06F17/30;(IPC1-7):G06F17/20 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Main Claim |
|
Address |
|