Title of invention: Method and system for extracting shadow entities from emails
Abstract: One embodiment provides a system for extracting shadow entities from emails. During operation, the system receives a number of document corpora. The system then calculates word-collocation statistics associated with different n-gram sizes for the document corpora. Next, the system receives an email and identifies shadow entities in the email based on the calculated word-collocation statistics for the document corpora.
Publication number: US8983826(B2)  Publication date: 2015.03.17
Application number: US201113173698  Filing date: 2011.06.30
Applicant: Palo Alto Research Center Incorporated  Inventors: Brdiczka Oliver; Hizalev Petro
Classification: G06F17/27  Main classification: G06F17/27
Agency: Park, Vaughan, Fleming & Dowler LLP  Agent: Yao Shun; Park, Vaughan, Fleming & Dowler LLP
Principal claim: 1. A computer-executable method for extracting shadow entities from emails, the method comprising: loading, by a computing device, a shadow-entity extraction application from storage into memory; executing, by the computing device, the shadow-entity extraction application to perform: receiving a number of document corpora; calculating word-collocation statistics associated with different n-gram sizes for the document corpora; receiving an email; identifying noun phrases in the email; filtering the noun phrases in the email based on the calculated word-collocation statistics of different n-gram sizes for the document corpora, wherein filtering the noun phrases comprises determining a filtering threshold of mutual dependency (MD), point-wise mutual information (PMI), or log-frequency biased mutual dependency (LFMD) value based on an average MD, PMI, or LFMD value of the noun phrases in the email against the MD, PMI, or LFMD value for the document corpora; and identifying shadow entities in the email, based on the calculated word-collocation statistics for the document corpora and the filtered noun phrases in the email.
Address: Palo Alto, CA, US
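
Illustrative sketch (not part of the patent record): the filtering step in the principal claim can be pictured with a small Python example. It is a simplified reading, not the patented implementation. It covers only bigrams and only PMI, computed as log2(P(x,y) / (P(x)P(y))). It assumes noun phrases have already been extracted as word pairs, and it assumes the threshold is simply the average PMI of the email's candidate phrases as scored against the corpus. The function names `ngram_counts`, `bigram_pmi`, and `filter_phrases` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bigram_pmi(corpus_tokens):
    """Return {(x, y): PMI} for every bigram seen in the corpus.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated as relative frequencies (no smoothing).
    """
    unigrams = Counter(corpus_tokens)
    bigrams = ngram_counts(corpus_tokens, 2)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


def filter_phrases(phrases, pmi, default=0.0):
    """Keep phrases whose corpus PMI is at least the average over all candidates.

    Assumption: the threshold is the mean PMI of the email's candidate
    phrases; phrases unseen in the corpus get the `default` score.
    """
    if not phrases:
        return []
    scores = {p: pmi.get(p, default) for p in phrases}
    threshold = sum(scores.values()) / len(scores)
    return [p for p in phrases if scores[p] >= threshold]
```

A call such as `filter_phrases(candidates, bigram_pmi(corpus_tokens))` would then return the candidate word pairs that collocate more strongly than the email's average. Extending the sketch to MD, LFMD, or longer n-grams would mean swapping the scoring function while keeping the same thresholding step.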