Title of invention Topic extraction apparatus and program
Abstract According to one embodiment, a topic extracting apparatus extracts each term from a target document set, and calculates an appearance frequency of each term and a document frequency indicating the number of documents in which each term appears. The topic extracting apparatus acquires a document set of appearance documents with respect to each extracted term, calculates a topic degree, extracts each term whose topic degree is not lower than a predetermined value as a topic word, and calculates freshness of the extracted topic word based on an appearance date and time. The topic extracting apparatus presents the extracted topic words in order of the freshness and also presents the number of appearance documents of each presented topic word per unit span.
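The term statistics and per-unit-span counts described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Document` record, the function names, the whitespace tokenization, and the choice of one calendar day as the unit span are all hypothetical.

```python
from collections import Counter, namedtuple
from datetime import datetime

# Hypothetical document record: text plus date and time information.
Document = namedtuple("Document", ["text", "timestamp"])


def term_statistics(docs):
    """Return (appearance frequency, document frequency) counters per term.

    Assumption: a term is a lowercase whitespace-separated token.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for doc in docs:
        tokens = doc.text.lower().split()
        term_freq.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))   # each document counts once per term
    return term_freq, doc_freq


def docs_per_day(docs, term):
    """Count documents containing `term` per calendar day (assumed unit span)."""
    counts = Counter()
    for doc in docs:
        if term in doc.text.lower().split():
            counts[doc.timestamp.date()] += 1
    return dict(sorted(counts.items()))


docs = [
    Document("server outage outage reported", datetime(2013, 9, 1, 9, 0)),
    Document("outage resolved", datetime(2013, 9, 1, 17, 0)),
    Document("new release shipped", datetime(2013, 9, 2, 10, 0)),
]
tf, df = term_statistics(docs)
outage_by_day = docs_per_day(docs, "outage")
```

Here `tf` counts every occurrence of a term while `df` counts each document at most once, which is the distinction the abstract draws between appearance frequency and document frequency.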
Publication number US9449051(B2) Publication date 2016.09.20
Application number US201314023108 Filing date 2013.09.10
Applicant KABUSHIKI KAISHA TOSHIBA;TOSHIBA SOLUTIONS CORPORATION Inventors Iwasaki Hideki;Goto Kazuyuki;Matsumoto Shigeru;Miyabe Yasunari;Kobayashi Mikito
Classification G06F17/30;G06F7/00;G06F17/27 Main classification G06F17/30
Agency Oblon, McClelland, Maier & Neustadt, L.L.P. Agent Oblon, McClelland, Maier & Neustadt, L.L.P.
Main claim 1. A topic extraction apparatus comprising: a document storing device which stores a target document set comprising documents each having text information and date and time information; a span designating device which accepts designation of a target span which is a target of topic extraction; a topic extracting device which extracts a topic word which is a term representing a topic in the designated target span from the target document set stored in the document storing device, and calculates freshness as a scale representing topicality of each topic word; and a topic presenting device which presents the topic words extracted by the topic extracting device in order of the freshness, and also presents the number of documents in which each presented topic word appears per unit span, wherein the topic extracting device comprises: a term extracting device which extracts each term from the target document set stored in the document storing device, and calculates each of an appearance frequency of each term and a document frequency indicative of the number of documents in which each term appears; and a topic word extracting device which acquires a document set of appearance documents in which each term appears during the target span with respect to each term extracted by the term extracting device, calculates a topic degree which is a scale representing topic word identity based on a value representing significance of the appearance frequency of each appearance document and a weighted value of each term based on the appearance frequency of the term and the document frequency, extracts each term whose topic degree is not lower than a predetermined value as a topic word, and calculates freshness of the extracted topic word based on an appearance date and time during the target span.
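The thresholding and freshness ordering recited in the claim can be illustrated with a short sketch. The claim text does not give the formulas, so everything numeric here is an assumption chosen for illustration: the weighted value is a TF-IDF-style product, the significance value is the share of a term's documents that fall inside the target span, the topic degree is their product, and freshness is the mean relative position of the appearance times within the span (0 at the start, 1 at the end). The function name `topic_words` and the `(text, timestamp)` input shape are likewise hypothetical.

```python
import math
from datetime import datetime


def topic_words(docs, span_start, span_end, threshold):
    """Return [(term, topic_degree, freshness)] ordered by freshness, newest first.

    docs: list of (text, timestamp) pairs. Assumes span_end > span_start.
    All formulas below are illustrative assumptions, not the patent's own.
    """
    n_docs = len(docs)
    term_freq, doc_freq, in_span = {}, {}, {}
    for text, ts in docs:
        tokens = text.lower().split()
        for tok in tokens:
            term_freq[tok] = term_freq.get(tok, 0) + 1
        for tok in set(tokens):
            doc_freq[tok] = doc_freq.get(tok, 0) + 1
            if span_start <= ts <= span_end:
                # Appearance documents of this term during the target span.
                in_span.setdefault(tok, []).append(ts)

    span_seconds = (span_end - span_start).total_seconds()
    results = []
    for term, stamps in in_span.items():
        # Assumed weighted value: TF-IDF-style weight from tf and df.
        weight = term_freq[term] * math.log(1 + n_docs / doc_freq[term])
        # Assumed significance: share of the term's documents inside the span.
        significance = len(stamps) / doc_freq[term]
        degree = significance * weight
        if degree >= threshold:  # "not lower than a predetermined value"
            offsets = [(ts - span_start).total_seconds() / span_seconds
                       for ts in stamps]
            freshness = sum(offsets) / len(offsets)
            results.append((term, degree, freshness))
    return sorted(results, key=lambda r: r[2], reverse=True)


docs = [
    ("outage drill", datetime(2013, 8, 1)),
    ("release shipped", datetime(2013, 9, 2)),
    ("outage outage reported", datetime(2013, 9, 8)),
    ("outage resolved", datetime(2013, 9, 9)),
    ("release notes", datetime(2013, 9, 10)),
]
ranked = topic_words(docs, datetime(2013, 9, 1), datetime(2013, 9, 11), threshold=2.0)
```

With this sample data, terms that occur only once fall below the threshold, and the remaining topic words are ordered by how late in the span they tend to appear. A real system would substitute its own term extraction and scoring functions.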
Address Minato-ku JP