Title of invention System and method for identifying compounds through iterative analysis
Abstract A system and method for identifying compounds through iterative analysis of a measure of association are disclosed. A limit on the number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. The number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, each of which is provided in a vocabulary for the text corpus. Based on the number of occurrences, at least one n-gram including a number of tokens equal to the limit is identified. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token, and the limit is adjusted.
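The iterative procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not name the measure of association, how the limit is adjusted, or how compound tokens are written, so the sketch assumes a PMI-style log ratio, a limit that decreases by one per pass, and compounds joined with underscores. The names `find_compounds`, `association`, `merge`, `min_count`, and `min_assoc` are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(corpus, n):
    """Count every run of n consecutive tokens within each sentence."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts


def association(ngram, ngram_count, unigram_counts, total_tokens):
    """Assumed stand-in measure: log of observed n-gram probability over the
    probability expected if its tokens were independent (PMI-style).
    Both probabilities are normalized by the token total for simplicity."""
    p_ngram = ngram_count / total_tokens
    p_indep = math.prod(unigram_counts[t] / total_tokens for t in ngram)
    return math.log(p_ngram / p_indep)


def merge(sentence, accepted, n):
    """Replace each accepted n-gram in a sentence with one compound token."""
    out, i = [], 0
    while i < len(sentence):
        window = tuple(sentence[i:i + n])
        if window in accepted:
            out.append("_".join(window))  # assumed compound spelling
            i += n
        else:
            out.append(sentence[i])
            i += 1
    return out


def find_compounds(corpus, limit, min_count=2, min_assoc=2.0):
    """Return compound tokens found, longest n-grams first."""
    vocabulary = {t for s in corpus for t in s}
    found = []
    while limit >= 2:
        unigram_counts = Counter(t for s in corpus for t in s)
        total_tokens = sum(unigram_counts.values())
        ngram_counts = count_ngrams(corpus, limit)
        # Keep n-grams of exactly `limit` tokens that occur often enough
        # and whose tokens are sufficiently associated.
        accepted = {
            g for g, c in ngram_counts.items()
            if c >= min_count
            and association(g, c, unigram_counts, total_tokens) >= min_assoc
        }
        for g in sorted(accepted):
            compound = "_".join(g)
            vocabulary.add(compound)  # compound becomes a vocabulary token
            found.append(compound)
        corpus = [merge(s, accepted, limit) for s in corpus]
        limit -= 1  # assumed adjustment: shrink the limit by one
    return found


corpus = [
    ["i", "love", "new", "york", "city"],
    ["new", "york", "city", "is", "big"],
    ["she", "left", "new", "york", "city"],
    ["i", "love", "big", "dogs"],
]
print(find_compounds(corpus, limit=3))  # ['new_york_city']
```

Starting from the longest n-grams means "new york city" is merged into one token before the two-token pass runs, so a shorter candidate such as "new york" is no longer counted separately. The thresholds here are arbitrary and would need tuning on a real corpus.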
Publication number US7555428(B1) Publication date 2009.06.30
Application number US20030647203 Filing date 2003.08.21
Applicant GOOGLE INC. Inventors FRANZ ALEXANDER;MILCH BRIAN
Classification G06F17/21;G06F17/27;G06F17/28 Main classification G06F17/21
Agency Agent
Main claim
Address