Title of invention |
System and method for interpreting document contents |
Abstract |
A sequence of word filters is used to eliminate terms in the database that do not discriminate document content, resulting in a filtered word set and a topic word set whose members are highly predictive of content. These two word sets are then formed into a two-dimensional matrix whose entries are calculated as the conditional probability that a document will contain the word in a row given that it contains the word in a column. The matrix representation allows the resultant vectors to be used to interpret document contents.
|
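The matrix described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `conditional_probability_matrix`, the toy documents, and the choice of filtered words as rows and topic words as columns are all assumptions made for illustration, and the word-filtering stage is taken as already done.

```python
def conditional_probability_matrix(docs, row_words, col_words):
    """Entry [i][j] estimates P(doc contains row_words[i] | doc contains col_words[j])."""
    doc_sets = [set(d) for d in docs]  # only word presence matters, not frequency
    matrix = []
    for r in row_words:
        row = []
        for c in col_words:
            # Documents that contain the column word form the conditioning set.
            with_c = [s for s in doc_sets if c in s]
            both = sum(1 for s in with_c if r in s)
            # Use 0.0 when no document contains the column word (an assumed convention).
            row.append(both / len(with_c) if with_c else 0.0)
        matrix.append(row)
    return matrix


# Hypothetical toy corpus and word sets (assumed to be the output of the filters).
docs = [
    ["reactor", "fuel", "safety"],
    ["reactor", "fuel"],
    ["market", "fuel", "price"],
    ["market", "price"],
]
filtered_words = ["reactor", "fuel", "market", "price"]  # rows (assumption)
topic_words = ["reactor", "market"]                      # columns (assumption)

m = conditional_probability_matrix(docs, filtered_words, topic_words)
for word, row in zip(filtered_words, m):
    print(word, row)
```

Each row of `m` is then a vector that places one filtered word relative to the topic words, which is the kind of vector the abstract says can be used to interpret document contents.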
Publication number |
US6772170(B2) |
Publication date |
2004.08.03 |
Application number |
US20020298361 |
Filing date |
2002.11.16 |
Applicant |
BATTELLE MEMORIAL INSTITUTE |
Inventors |
PENNOCK KELLY A.;MILLER NANCY E. |
Classification |
G06F17/30;(IPC1-7):G06F17/00 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|