Title: System and method for probabilistic relational clustering
Abstract: Relational clustering has attracted growing attention because of its impact on important applications that involve multi-type interrelated data objects, such as Web mining, search marketing, bioinformatics, citation analysis, and epidemiology. A probabilistic model is presented for relational clustering, which also provides a principled framework to unify various important clustering tasks, including traditional attribute-based clustering, semi-supervised clustering, co-clustering, and graph clustering. The model seeks to identify cluster structures for each type of data object and interaction patterns between different types of objects. Under this model, parametric hard and soft relational clustering algorithms are provided for a large number of exponential family distributions. The algorithms are applicable to relational data of various structures and at the same time unify a number of state-of-the-art clustering algorithms: co-clustering algorithms, k-partite graph clustering, and semi-supervised clustering based on hidden Markov random fields.
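The hard relational clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the patented algorithm: it assumes a single object type, one attribute matrix `F`, one homogeneous relation matrix `S`, and squared-error loss (the loss that corresponds to a Gaussian assumption). The function name `hard_relational_clustering`, the weight `alpha`, and the round-robin initialization are all hypothetical choices made for illustration.

```python
def hard_relational_clustering(F, S, k, alpha=1.0, n_iter=20):
    """Illustrative hard clustering over attributes F (n x d) and relations S (n x n).

    Assumptions (not from the source): squared-error loss, one object type,
    round-robin initial labels, and `alpha` as the weight of the relation term.
    """
    n, d = len(F), len(F[0])
    labels = [i % k for i in range(n)]            # hypothetical initialization
    centroids = [list(F[c % n]) for c in range(k)]
    for _ in range(n_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Attribute centroids: mean attribute vector of each non-empty cluster.
        for c in range(k):
            if members[c]:
                centroids[c] = [sum(F[i][j] for i in members[c]) / len(members[c])
                                for j in range(d)]
        # Interaction pattern: mean relation strength between each cluster pair.
        B = [[0.0] * k for _ in range(k)]
        for c in range(k):
            for e in range(k):
                vals = [S[i][j] for i in members[c] for j in members[e] if i != j]
                if vals:
                    B[c][e] = sum(vals) / len(vals)
        # Reassign each object to the cluster with the lowest combined cost,
        # using the labels from the previous pass for the relation term.
        new_labels = []
        for i in range(n):
            best, best_cost = 0, float("inf")
            for c in range(k):
                attr = sum((F[i][j] - centroids[c][j]) ** 2 for j in range(d))
                rel = sum((S[i][j] - B[c][labels[j]]) ** 2
                          for j in range(n) if j != i)
                cost = attr + alpha * rel
                if cost < best_cost:
                    best, best_cost = c, cost
            new_labels.append(best)
        if new_labels == labels:                  # assignments stopped changing
            break
        labels = new_labels
    return labels


# Toy data: two groups that differ in both attributes and link structure.
F = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
S = [[1 if (i < 3) == (j < 3) and i != j else 0 for j in range(6)]
     for i in range(6)]
print(hard_relational_clustering(F, S, k=2))      # prints [0, 0, 0, 1, 1, 1]
```

The sketch alternates between estimating cluster parameters (attribute centroids and the cluster-to-cluster relation matrix `B`) and reassigning objects, which mirrors the two things the abstract says the model identifies: cluster structures and interaction patterns. A soft (mixed membership) variant would replace the hard reassignment with posterior membership probabilities.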
Publication number: US9372915(B2)  Publication date: 2016.06.21
Application number: US201514672430  Filing date: 2015.03.30
Applicant: The Research Foundation for The State University of New York  Inventors: Long Bo; Zhang Zhongfei Mark
Classification: G06F7/00; G06F17/30; G06N7/00  Main classification: G06F7/00
Agency: Ostrolenk Faber LLP  Agents: Hoffberg Steve M.; Ostrolenk Faber LLP
Principal claim: 1. A method of detection of a community in a network, comprising: automatically optimizing an unsupervised mixed membership relational clustering model based on at least respective relationships between a plurality of interrelated data objects, dependent on different latent classes having respective latent class membership parameters, by maximizing a likelihood function to estimate unknown parameters of a joint probability distribution over latent indicators of the plurality of interrelated data objects having at least one type of data associated with different latent classes, having at least one of respective data object attributes, homogeneous relations between the respective data object and data objects having the same type, and heterogeneous relations between the respective data object and data objects having different types, and observations of the plurality of data object attributes; clustering the interrelated plurality of data objects according to the optimized unsupervised mixed membership relational clustering model; wherein the plurality of interrelated data objects comprise a set of web documents, wherein the respective data object attributes comprise a web document text and the relations between respective data objects comprise link information; and responding to a web search query based on the clustering.
Address: Binghamton NY US