Abstract |
The invention provides a new probabilistic model for calculating term co-occurrence scores in natural language processing. A co-occurrence score operation unit 20 is configured to analyse one or more text documents and to assign to each pair of terms occurring in a document a score indicating the strength of the association between the two terms. The unit comprises a document set input unit 21, a document set processing unit 22 and a co-occurrence set calculation unit 23. The unit operates on a set of associated tables: a document table 30, a paragraph table 31, a sentence table 32, a term table 33, a sentence pair count table 34, a term count table 35, a term pair count table 36, a term probability table 37, a term pair probability table 38 and a co-occurrence score table 39. |
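
The flow suggested by the tables above (term counts and term pair counts, then probabilities, then scores) can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the sketch substitutes pointwise mutual information (PMI) as an assumed stand-in, and it assumes that co-occurrence is counted at sentence level. The function name `cooccurrence_scores` and the dictionary layout are hypothetical and are not taken from the invention.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(sentences):
    """Score term pairs by PMI over sentences (an assumed stand-in score).

    `sentences` is a list of token lists. Returns a dict mapping an
    alphabetically ordered term pair to its PMI in bits.
    """
    n = len(sentences)
    term_count = Counter()  # number of sentences containing each term
    pair_count = Counter()  # number of sentences containing both terms
    for tokens in sentences:
        terms = sorted(set(tokens))  # count each term once per sentence
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    # Relative frequencies play the role of the probability tables.
    term_prob = {t: c / n for t, c in term_count.items()}
    pair_prob = {p: c / n for p, c in pair_count.items()}

    # PMI: log of the joint probability over the product of the marginals.
    return {
        (a, b): math.log2(p / (term_prob[a] * term_prob[b]))
        for (a, b), p in pair_prob.items()
    }


if __name__ == "__main__":
    docs = [["cat", "sat"], ["cat", "sat"], ["dog", "ran"], ["dog", "sat"]]
    for pair, score in sorted(cooccurrence_scores(docs).items()):
        print(pair, round(score, 3))
```

Under this stand-in score, a positive value means the two terms share a sentence more often than their individual frequencies would predict, and a negative value means less often. Pairs that never share a sentence receive no entry.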