Title of invention: Topic indexing method
Abstract: A method for improving the association of articles of information, or stories, with topics tied to specific subjects (subject topics) and with a general topic covering words that are not associated with any subject. The method is trained using Hidden Markov Models (HMMs): each story is represented by an HMM in which each state represents a topic. A standard Expectation-Maximization (EM) algorithm, as known in the art, can be used to maximize the expected likelihood of the model that relates the words associated with each topic to that topic. In the method, the probability that each word in a story is related to a subject topic is determined and evaluated, and the subject topics with the lowest probabilities are discarded. The remaining subject topics are then evaluated, and the subset of subject topics with the highest probabilities over all the words in the story is taken to be the "correct" subject topic set. The method uses only positive information: words related to other topics are not taken as negative evidence against the topic being evaluated. The technique applies in particular to text derived from speech via a speech recognizer, or via any other technique that produces a text file. The use of a general topic category improves the results, since most words in any story are not keywords associated with any given subject topic. Removing the general words reduces the number of words considered as keywords for any given subject topic. With fewer words to process, the method can discriminate better among the remaining words as they relate to the subject topics. The topics can range from general, for example "the U.S. economy", to very specific, for example "the relationship of the yen to the dollar in the U.S. economy."
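The two-pass topic selection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the word-given-topic probabilities are already available (the toy table `p_word` is invented for illustration, where the abstract would have them trained by EM), it replaces the HMM with a per-word mixture using equal topic weights, and every name in it (`GENERAL`, `score_topics`, `index_story`, `keep`, `top`, `floor`) is hypothetical.

```python
GENERAL = "<general>"  # hypothetical label for the general (non-subject) topic


def topic_word_posteriors(words, topics, p_word, floor=1e-6):
    """For each word, return P(topic | word) over `topics`.

    Assumption: all topics are equally weighted, and a word unseen under a
    topic gets the small probability `floor`.
    """
    posteriors = []
    for w in words:
        likes = {t: p_word[t].get(w, floor) for t in topics}
        total = sum(likes.values())
        posteriors.append({t: like / total for t, like in likes.items()})
    return posteriors


def score_topics(words, subject_topics, p_word):
    """Sum, over all words, the posterior mass each subject topic receives
    when it competes with the general topic and the other candidates.

    Only positive evidence is accumulated: a word explained by another
    topic adds (almost) nothing here, but it is never subtracted.
    """
    topics = [GENERAL] + list(subject_topics)
    posteriors = topic_word_posteriors(words, topics, p_word)
    return {t: sum(p[t] for p in posteriors) for t in subject_topics}


def index_story(words, p_word, keep=3, top=1):
    """Two-pass selection: prune low-scoring topics, then rescore jointly."""
    candidates = [t for t in p_word if t != GENERAL]
    # Pass 1: score each subject topic alone against the general topic
    # and discard all but the `keep` best.
    first = {t: score_topics(words, [t], p_word)[t] for t in candidates}
    survivors = sorted(candidates, key=first.get, reverse=True)[:keep]
    # Pass 2: rescore the survivors together and return the `top` best.
    second = score_topics(words, survivors, p_word)
    return sorted(survivors, key=second.get, reverse=True)[:top]


# Toy, hand-written probabilities (illustrative only, not trained).
p_word = {
    GENERAL: {"the": 0.3, "of": 0.2, "said": 0.1,
              "yen": 0.001, "dollar": 0.001, "goal": 0.001, "match": 0.001},
    "economy": {"yen": 0.2, "dollar": 0.2, "trade": 0.1},
    "sports": {"goal": 0.2, "match": 0.2},
    "weather": {"rain": 0.3},
}
story = "the yen fell against the dollar said traders".split()
selected = index_story(story, p_word, keep=2, top=1)
print(selected)
```

In this sketch the general topic absorbs function words such as "the" and "said", so they contribute almost nothing to any subject topic's score. That is the effect the abstract attributes to the general topic category.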
Publication number: US6185531(B1) Publication date: 2001.02.06
Application number: US19980005960 Filing date: 1998.01.09
Applicant: GTE INTERNETWORKING INCORPORATED Inventors: SCHWARTZ RICHARD M.;IMAI TORU
Classification: G06F17/27;G10L15/18;(IPC1-7):G10L5/06;G10L9/00 Main classification: G06F17/27
Agency: Agent:
Main claim:
Address: