Title of invention: Relevant persons identification leveraging both textual data and social context
Abstract: A set of documents is annotated by metadata specifying persons associated with documents and their social roles in the documents. The annotated documents define a group of representation modes including at least one content type and at least one social role. An electronic processing device computes a relevance score for a person of interest using a set of queries each having a target social role by performing a sequence of operations that includes the following operations: computing similarities between documents and queries with respect to at least one similarity mode of the group of representation modes; enriching queries or documents to identify and aggregate nearest neighbor documents that are most similar with respect to at least one enrichment mode of the group of representation modes; aggregating over documents; aggregating over queries; and aggregating over at least one of (i) enrichment modes, (ii) similarity modes, and (iii) target social roles.
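The operations named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a single textual mode, cosine similarity as the comparison, the mean as the aggregation operator, a single query, and the names `cosine`, `enrich`, and `relevance`. The record itself does not fix any of these choices.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def enrich(query, doc_vectors, k=1):
    """Hypothetical enrichment: average the query with its k nearest documents."""
    nearest = sorted(doc_vectors, key=lambda d: cosine(query, d), reverse=True)[:k]
    group = [query] + nearest
    terms = set().union(*group)
    return {t: sum(g.get(t, 0.0) for g in group) / len(group) for t in terms}

def relevance(person, query, target_role, docs, k=1):
    """Enrich the query, compare it with each document in which the person
    holds the target role, then aggregate (mean) over those documents."""
    enriched = enrich(query, [d["text"] for d in docs], k)
    sims = [cosine(d["text"], enriched)
            for d in docs if d["roles"].get(person) == target_role]
    return sum(sims) / len(sims) if sims else 0.0

# Toy annotated collection: each document carries person -> social role metadata.
docs = [
    {"text": {"retrieval": 1.0, "ranking": 1.0}, "roles": {"alice": "author"}},
    {"text": {"ranking": 1.0, "graphs": 1.0},
     "roles": {"bob": "author", "alice": "recipient"}},
    {"text": {"cooking": 1.0}, "roles": {"carol": "author"}},
]
query = {"retrieval": 1.0}

for person in ("alice", "bob", "carol"):
    print(person, round(relevance(person, query, "author", docs), 3))
```

In this toy data the raw query shares no term with the second document, so without enrichment its author would score zero; the enriched query picks up "ranking" from the nearest neighbor and gives that author a small non-zero score.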
Publication number: US8812496(B2) Publication date: 2014.08.19
Application number: US201213422189 Filing date: 2012.03.16
Applicant: Xerox Corporation Inventors: Renders Jean-Michel;Mantrach Amin
Classification: G06F17/30 Main classification: G06F17/30
Agency: Fay Sharpe LLP Agent: Fay Sharpe LLP
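Main claim: 1. A system operative on a set of documents annotated by metadata specifying persons associated with documents and their social roles in the documents wherein the annotated documents define a group of representation modes including at least one content type and at least one social role, the system comprising: an electronic processing device configured to perform the operation of: computing a relevance score for a person of interest using a set of queries wherein each query has a target social role by performing one of the relevance scoring sequences $R_w(p,Q)$ listed in the tables:

w | QE | DE | $m'$ | $m''$ | k | d | Sequence
5 | 1 | 0 | N/A | B | B | B | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{d}^{r'} x_d, \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
6 | 1 | 0 | N/A | B | B | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'} C^{(m)}(x_d, \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
7 | 1 | 0 | N/A | B | A | B | $\oplus_{r'}\oplus_{m}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'} x_d, \oplus_{m''} E^{(m'')}(q_k))$
8 | 1 | 0 | N/A | B | A | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(x_d, \oplus_{m''} E^{(m'')}(q_k))$
9 | 1 | 0 | N/A | A | B | B | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'} x_d, \oplus_{k}^{r'} E^{(m'')}(q_k))$
10 | 1 | 0 | N/A | A | B | A | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(x_d, \oplus_{k}^{r'} E^{(m'')}(q_k))$
11 | 1 | 0 | N/A | A | A | B | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'} x_d, E^{(m'')}(q_k))$
12 | 1 | 0 | N/A | A | A | A | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(x_d, E^{(m'')}(q_k))$
13 | 0 | 1 | B | N/A | B | B | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'} q_k)$
14 | 0 | 1 | B | N/A | B | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'} q_k)$
15 | 0 | 1 | B | N/A | A | B | $\oplus_{r'}\oplus_{m}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), q_k)$
16 | 0 | 1 | B | N/A | A | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), q_k)$
17 | 0 | 1 | A | N/A | B | B | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), \oplus_{k}^{r'} q_k)$
18 | 0 | 1 | A | N/A | B | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{d}^{r'} C^{(m)}(E^{(m')}(x_d), \oplus_{k}^{r'} q_k)$
19 | 0 | 1 | A | N/A | A | B | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), q_k)$
20 | 0 | 1 | A | N/A | A | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(x_d), q_k)$

w | QE | DE | $m'$ | $m''$ | k | d | Sequence
21 | 1 | 1 | B | B | B | B | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
22 | 1 | 1 | B | B | B | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
23 | 1 | 1 | B | B | A | B | $\oplus_{r'}\oplus_{m}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(q_k))$
24 | 1 | 1 | B | B | A | A | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(q_k))$
25 | 1 | 1 | B | A | B | B | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
26 | 1 | 1 | B | A | B | A | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
27 | 1 | 1 | B | A | A | B | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), E^{(m'')}(q_k))$
28 | 1 | 1 | B | A | A | A | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), E^{(m'')}(q_k))$
29 | 1 | 1 | A | B | B | B | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
30 | 1 | 1 | A | B | B | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{d}^{r'} C^{(m)}(E^{(m')}(x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
31 | 1 | 1 | A | B | A | B | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(q_k))$
32 | 1 | 1 | A | B | A | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(q_k))$
33 | 1 | 1 | A | A | B | B | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
34 | 1 | 1 | A | A | B | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(E^{(m')}(x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
35 | 1 | 1 | A | A | A | B | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''}\oplus_{k}^{r'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), E^{(m'')}(q_k))$
36 | 1 | 1 | A | A | A | A | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''}\oplus_{d}^{r'}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(x_d), E^{(m'')}(q_k))$

and

w | Sequence
5bis | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{d}^{r'} x_d, \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
6bis | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'} C^{(m)}(x_d, \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
9bis | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'} x_d, E^{(m'')}(\oplus_{k}^{r'} q_k))$
10bis | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(x_d, E^{(m'')}(\oplus_{k}^{r'} q_k))$
13bis | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'} q_k)$
15bis | $\oplus_{r'}\oplus_{m}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), q_k)$
17bis | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'} q_k)$
19bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), q_k)$
21bis | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
21ter | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
21quarter | $\oplus_{r'}\oplus_{m} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
22bis | $\oplus_{r'}\oplus_{m}\oplus_{d}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
23bis | $\oplus_{r'}\oplus_{m}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{m''} E^{(m'')}(q_k))$
25bis | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
25ter | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'}\oplus_{m'} E^{(m')}(x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
25quarter | $\oplus_{r'}\oplus_{m}\oplus_{m''} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
26bis | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
27bis | $\oplus_{r'}\oplus_{m}\oplus_{m''}\oplus_{k}^{r'} C^{(m)}(\oplus_{m'} E^{(m')}(\oplus_{d}^{r'} x_d), E^{(m'')}(q_k))$
29bis | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'}\oplus_{m''} E^{(m'')}(q_k))$
29ter | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
29quarter | $\oplus_{r'}\oplus_{m}\oplus_{m'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
30bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{d}^{r'} C^{(m)}(E^{(m')}(x_d), \oplus_{m''} E^{(m'')}(\oplus_{k}^{r'} q_k))$
31bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{m''} E^{(m'')}(q_k))$
33bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), \oplus_{k}^{r'} E^{(m'')}(q_k))$
33ter | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''} C^{(m)}(\oplus_{d}^{r'} E^{(m')}(x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
33quarter | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
34bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''}\oplus_{d}^{r'} C^{(m)}(E^{(m')}(x_d), E^{(m'')}(\oplus_{k}^{r'} q_k))$
35bis | $\oplus_{r'}\oplus_{m}\oplus_{m'}\oplus_{m''}\oplus_{k}^{r'} C^{(m)}(E^{(m')}(\oplus_{d}^{r'} x_d), E^{(m'')}(q_k))$

wherein $p$ denotes the person of interest, $Q$ denotes the set of queries, $k$ indexes the queries of the set of queries and $r'_k$ denotes the target social role of query $q_k$, $x_d$ denotes a document where $d$ indexes the documents, $C(\cdots)$ denotes comparison, $E(\cdots)$ denotes enrichment, $\oplus_i$ denotes aggregation over indices $i$, $m$ indexes the at least one similarity mode, at least one of $m'$ and $m''$ index the at least one enrichment mode where $m'$ denotes document enrichment mode and $m''$ denotes query enrichment mode and in the table columns headed $m'$ and $m''$ the symbol "B" denotes the enriching is performed before the computing of similarities, the symbol "A" denotes the enriching is performed after the computing of similarities, and the symbol "N/A" denotes the enriching is not performed; repeating the computing for each person of a set of persons of interest to compute a relevance score for each person of the set of persons of interest, and at least one of (i) ranking the persons of the set of persons of interest respective to relevance score and (ii) selecting a sub-set of persons from the set of persons of interest who have the highest relevance scores.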
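The difference between a "B" and an "A" entry in the d column, and the final ranking and selection step, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: one textual mode, no enrichment, a single query, cosine similarity standing in for $C$, the mean standing in for $\oplus$, and hypothetical helper names (`mean_vector`, `score`, `rank`).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Assumed aggregation operator over documents: the term-wise mean."""
    terms = set().union(*vectors)
    return {t: sum(v.get(t, 0.0) for v in vectors) / len(vectors) for t in terms}

def score(person, query, role, docs, aggregate_docs="A"):
    """Score a person for one query with a target social role.
    "B": aggregate the person's documents before comparing with the query.
    "A": compare each document with the query, then aggregate the similarities."""
    mine = [d["text"] for d in docs if d["roles"].get(person) == role]
    if not mine:
        return 0.0
    if aggregate_docs == "B":
        return cosine(mean_vector(mine), query)
    return sum(cosine(x, query) for x in mine) / len(mine)

def rank(persons, query, role, docs, aggregate_docs="A", top_n=None):
    """Rank persons by relevance score; keep only the top_n if given."""
    ranked = sorted(persons,
                    key=lambda p: score(p, query, role, docs, aggregate_docs),
                    reverse=True)
    return ranked[:top_n]

# Toy annotated collection (hypothetical data).
docs = [
    {"text": {"retrieval": 1.0, "ranking": 1.0}, "roles": {"alice": "author"}},
    {"text": {"retrieval": 1.0}, "roles": {"alice": "author", "bob": "recipient"}},
    {"text": {"graphs": 1.0, "ranking": 1.0}, "roles": {"bob": "author"}},
    {"text": {"cooking": 1.0}, "roles": {"carol": "author"}},
]
query = {"retrieval": 1.0}

print(rank(["carol", "bob", "alice"], query, "author", docs, top_n=1))
print(score("alice", query, "author", docs, "A"),
      score("alice", query, "author", docs, "B"))
```

The two orderings generally give different scores for the same person, which is why the claim enumerates them as separate sequences.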
Address: Norwalk CT US