Title of invention: System and method for interpreting document contents
Abstract: A sequence of word filters is used to eliminate terms in the database that do not discriminate document content, resulting in a filtered word set and a topic word set whose members are highly predictive of content. These two word sets are then formed into a two-dimensional matrix, with each matrix entry calculated as the conditional probability that a document will contain the word in a row given that it contains the word in a column. The matrix representation allows the resultant vectors to be utilized to interpret document contents.
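The matrix described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the word-filtering steps are skipped, the two word sets are simply given as inputs, and the choice of which set indexes the rows and which the columns is an assumption, since the abstract does not say. The function name `association_matrix` and the toy documents are hypothetical.

```python
from typing import List, Sequence, Set


def association_matrix(
    docs: Sequence[Set[str]],
    row_words: Sequence[str],
    col_words: Sequence[str],
) -> List[List[float]]:
    """Estimate M[i][j] = P(row_words[i] in doc | col_words[j] in doc).

    Probabilities are estimated by simple document counts. Assumption:
    if no document contains the column word, the entry is set to 0.0.
    """
    matrix = []
    for r in row_words:
        row = []
        for c in col_words:
            # Documents that contain the column (conditioning) word.
            with_c = [d for d in docs if c in d]
            if not with_c:
                row.append(0.0)
                continue
            # Of those, the fraction that also contain the row word.
            both = sum(1 for d in with_c if r in d)
            row.append(both / len(with_c))
        matrix.append(row)
    return matrix


# Toy corpus (hypothetical): each document is reduced to its set of words.
docs = [
    {"reactor", "fuel", "energy"},
    {"reactor", "energy", "policy"},
    {"fuel", "engine"},
    {"policy", "energy"},
]
row_words = ["reactor", "fuel", "energy", "policy"]  # assumed filtered word set
col_words = ["energy", "fuel"]                       # assumed topic word set
matrix = association_matrix(docs, row_words, col_words)
```

Each row of `matrix` is then a vector describing one word in terms of its co-occurrence with the column words, which is the kind of vector the abstract says can be used to interpret document contents.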
Publication number: US6772170(B2) Publication date: 2004.08.03
Application number: US20020298361 Filing date: 2002.11.16
Applicant: BATTELLE MEMORIAL INSTITUTE Inventors: PENNOCK KELLY A.; MILLER NANCY E.
Classification: G06F17/30;(IPC1-7):G06F17/00 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: