Title of Invention: Supporting acquisition of information
Abstract: An apparatus supports acquisition of information from a document including a plurality of words. An acquisition hardware unit acquires first information that shows a degree to which the document belongs to each of a plurality of clusters based on a concept included in the document. Second information shows a degree to which a single word among the plurality of words appears in each of the plurality of clusters based on a concept of the single word. A generation hardware unit, based on the first and second information, generates third information that shows a degree of overlap between the concept included in the document and the concept of the single word. A determination hardware unit determines whether or not the third information shows a degree of overlap that is lower than a predetermined criterion, and an output hardware unit outputs a result of this determination.
Publication No.: US9626433(B2)  Publication Date: 2017.04.18
Application No.: US201414246404  Filing Date: 2014.04.07
Applicant: International Business Machines Corporation  Inventor: Inagaki Takeshi
Classification: G06F17/30  Main Classification: G06F17/30
Agency: Law Office of Jim Boice  Agent: Law Office of Jim Boice
Principal Claim: 1. An apparatus for supporting acquisition of information from a document including a plurality of words, the apparatus comprising: an acquisition hardware unit for acquiring first information that shows a degree to which the document belongs to each of a plurality of clusters based on a concept included in the document, and second information that shows a degree to which a single word among the plurality of words appears in each of the plurality of clusters based on a concept of the single word; a generation hardware unit for, based on the first information and the second information, generating third information that shows a degree of overlap between the concept included in the document and the concept of the single word; a determination hardware unit for determining whether or not the third information shows a degree of overlap that is lower than a predetermined criterion; an output hardware unit for, in response to determining that the third information shows the degree of overlap is lower than the predetermined criterion, outputting information indicating that the single word is a unique word in the document, wherein the unique word is a word that is found only in the document that belongs to one or more of the plurality of clusters, and wherein the document is a single document; at least one processor for retrieving the document that contains the unique word; and an extraction hardware unit for extracting words from each document of the plurality of documents, wherein, with respect to the plurality of documents, by performing clustering using words extracted from each document, the acquisition hardware unit acquires a first probability that is a probability that the document belongs to the plurality of clusters, respectively, as the first information, and acquires a second probability that is a probability that the single word belongs to the plurality of clusters, respectively, as the second information; wherein by calculating a sum total of products of the first probability with respect to a single cluster among the plurality of clusters and the second probability with respect to the single cluster for all clusters of the plurality of clusters, the generation hardware unit generates the sum total as the third information.
Address: Armonk, NY, US
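
The final limitation of the principal claim defines the third information as a sum, over all clusters, of the product of the document's cluster probability and the word's cluster probability, and the determination step compares that sum with a predetermined criterion. The following is a minimal illustrative sketch of that arithmetic only, not the patented implementation. The function names `overlap_score` and `is_unique_word`, the example probabilities, and the threshold value of 0.2 are all hypothetical; the record does not state how the clustering is performed or what the criterion is, so the per-cluster probabilities are assumed to be supplied by some earlier clustering step.

```python
from typing import Sequence


def overlap_score(doc_probs: Sequence[float], word_probs: Sequence[float]) -> float:
    """Sum over clusters of (document probability) * (word probability).

    doc_probs[k]  : assumed probability that the document belongs to cluster k
    word_probs[k] : assumed probability that the word belongs to cluster k
    """
    if len(doc_probs) != len(word_probs):
        raise ValueError("both inputs must cover the same set of clusters")
    return sum(d * w for d, w in zip(doc_probs, word_probs))


def is_unique_word(
    doc_probs: Sequence[float],
    word_probs: Sequence[float],
    threshold: float = 0.2,  # hypothetical criterion; the record gives no value
) -> bool:
    """Flag the word when its overlap with the document is below the criterion."""
    return overlap_score(doc_probs, word_probs) < threshold


# Hypothetical probabilities over three clusters.
doc = [0.7, 0.2, 0.1]
typical_word = [0.8, 0.1, 0.1]    # concentrated in the document's main cluster
unusual_word = [0.0, 0.05, 0.95]  # concentrated in a cluster the document barely touches

print(is_unique_word(doc, typical_word))
print(is_unique_word(doc, unusual_word))
```

Under these assumed numbers, the word whose cluster distribution follows the document's produces a large sum and is not flagged, while the word concentrated in a cluster the document barely belongs to produces a small sum and is flagged as a unique word.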