Title of invention PROBABILISTIC MODEL FOR TERM CO-OCCURRENCE SCORES
Abstract Apparatus for calculating term co-occurrence scores for use in a natural language processing method, where a term is a word or a group of consecutive words, in which apparatus at least one text document is analysed and pairs of terms, taken from the terms which occur in the document, are ascribed respective co-occurrence scores indicating the extent of the association between them. The apparatus comprises sentence sequence processing means (280) and co-occurrence score set calculation means (230). The sentence sequence processing means (280) are operable to: for each of all possible sequences of sentences in a document, where the minimum number of sentences in a sequence is one and the maximum number of sentences in a sequence has a predetermined value, determine a weighting value w which is a decreasing function of the number of sentences in the sentence sequence; determine a sentence sequence count value, based on the sum of all the determined weighting values; obtain, for each term, a document term count value, which is the sum of the sentence sequence term count values determined for all the sentence sequences, each sentence sequence term count value indicating the frequency with which the term occurs in a sentence sequence and being based on the weighting value for that sentence sequence; and, for each of all possible different term pairs in all sentence sequences, where a term pair consists of a term in a sentence sequence paired with another term in the same sentence sequence, obtain a term pair count value which is the sum of the weighting values for all sentence sequences in which the term pair occurs. The co-occurrence score set calculation means (230) are operable to obtain a term co-occurrence score for each term pair using the document term count values for the terms in the pair, the term pair count value for the term pair and the sentence sequence count value. Apparatus for processing sentence pairs is also disclosed.
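The counting steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function, the exact form of the sentence sequence term count, or the score formula, so the following are assumptions made only for illustration: sentence sequences are taken to be runs of consecutive sentences, w = 1/n for a sequence of n sentences, a term contributes w to its document term count once per sequence in which it appears, and the score is a pointwise-mutual-information-style log ratio. The function name `cooccurrence_scores` and its parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_scores(sentences, max_len=2, weight=lambda n: 1.0 / n):
    """Sketch of the weighted counting described in the abstract.

    sentences: list of sentences, each a list of terms.
    max_len:   predetermined maximum number of sentences in a sequence.
    weight:    assumed decreasing function of sequence length (here 1/n).
    """
    total = 0.0                      # sentence sequence count value
    term_count = defaultdict(float)  # document term count values
    pair_count = defaultdict(float)  # term pair count values

    for start in range(len(sentences)):
        for n in range(1, max_len + 1):
            if start + n > len(sentences):
                break
            w = weight(n)
            total += w
            # Distinct terms occurring anywhere in this sentence sequence.
            terms = set()
            for sentence in sentences[start:start + n]:
                terms.update(sentence)
            # Assumption: a term adds w once per sequence it occurs in.
            for term in terms:
                term_count[term] += w
            # Each unordered pair of distinct terms adds w.
            for a, b in combinations(sorted(terms), 2):
                pair_count[(a, b)] += w

    # Assumption: PMI-style score from the three kinds of count value.
    scores = {
        (a, b): math.log(c * total / (term_count[a] * term_count[b]))
        for (a, b), c in pair_count.items()
    }
    return scores, total, dict(term_count), dict(pair_count)


if __name__ == "__main__":
    doc = [["a", "b"], ["c"]]
    scores, total, term_count, pair_count = cooccurrence_scores(doc)
    print(total, pair_count, scores)
```

Under these assumptions, terms that share a sentence receive a higher score than terms that only share a longer sentence sequence, because the longer sequence carries a smaller weight.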
Publication number EP3091444(A2) Publication date 2016.11.09
Application number EP20160157897 Filing date 2016.02.29
Applicant FUJITSU LIMITED Inventors MITSUISHI, YUTAKA;NOVÁCEK, VIT
Classification G06F17/27;G06F17/30 Main classification G06F17/27