Systems and methods for classifying documents by the topic to which they pertain are described herein. In one implementation, the method comprises computing a probability of a document being topical based on the number of its constituent elements that are topical and the total number of constituent elements, and computing a probability of the document being anti-topical based on the number of constituent elements that are anti-topical and the total number of constituent elements. The method further comprises determining whether the probability of the document being topical is greater than the probability of the document being anti-topical. The method then classifies the document as topical upon determining that the probability of the document being topical is greater than the probability of the document being anti-topical.
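The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state how each probability is derived from the counts, so the sketch assumes a plain ratio of matching elements to total elements. The function names, the tie handling (a tie is treated as not topical), and the input validation are hypothetical choices made for illustration and are not taken from the application.

```python
def element_ratio(matching: int, total: int) -> float:
    """Assumed probability estimate: fraction of constituent elements that match."""
    if total <= 0:
        raise ValueError("total number of constituent elements must be positive")
    if matching < 0 or matching > total:
        raise ValueError("matching count must lie between 0 and total")
    return matching / total


def is_topical(n_topical: int, n_anti_topical: int, n_total: int) -> bool:
    """Classify a document as topical when P(topical) > P(anti-topical).

    Assumption: both probabilities are simple ratios over the same total.
    Assumption: a tie is reported as not topical, since the abstract only
    covers the strictly-greater case.
    """
    p_topical = element_ratio(n_topical, n_total)
    p_anti_topical = element_ratio(n_anti_topical, n_total)
    return p_topical > p_anti_topical


# Usage: a document with 10 elements, 6 topical and 2 anti-topical.
print(is_topical(6, 2, 10))
```

Because both ratios share the same denominator in this sketch, the decision reduces to comparing the two counts. An implementation that weights elements or smooths the estimates would not reduce this way.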
Application publication number
WO2014203264(A1)
Application publication date
2014.12.24
Application number
WO2013IN00390
Filing date
2013.06.21
Applicant
HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;ANANTHARANGACHAR, RAGHU;CHOURASIYA, PRADEEP;VISWANATHAN, KAPALEESWARAN;DIXIT, SUDHIR