Title: Caching of deep structures for efficient parsing
Abstract: A parsing method and system. The method includes generating an n-gram model of a domain and computing a tf-idf frequency associated with n-grams of the n-gram model. A list including a frequently occurring group of n-grams based on the tf-idf frequency is generated. The frequently occurring group of n-grams is transmitted to a deep parser component and a deep parse output from the deep parser component is generated. The deep parse output is stored within a cache and a processor verifies if a specified text word sequence of the deep parse output is available in the cache.
Publication number: US9275064(B2) Publication date: 2016.03.01
Application number: US201514674000 Filing date: 2015.03.31
Applicant: International Business Machines Corporation Inventors: Boudreau Michael; Moore Brad; Mousaad Ahmed; Trim Craig M.
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Schmeiser, Olsen & Watts Agent: Schmeiser, Olsen & Watts; Pivnichny John
Main claim: 1. A method comprising: computing, by a computer processor of a computing system, a term frequency-inverse document frequency (tf-idf) associated with n-grams of an n-gram model of a domain; determining, by said computer processor based on said tf-idf, a frequently occurring group of n-grams of said n-grams; generating, by said computer processor executing a deep parser component of said computing system with respect to said frequently occurring group of n-grams, a deep parse output comprising results of said executing said deep parser component with respect to said frequently occurring group of n-grams; storing, by said computer processor in a database cache, said deep parse output; indexing, by said computer processor executing said frequently occurring group of n-grams in said database cache, said deep parse output; and verifying, by said computer processor, if a pre-computed specified text word sequence of said deep parse output is available in said database cache, wherein said verifying comprises: retrieving from said deep parse output, a plurality of tokens of said deep parser output, wherein said plurality of tokens are associated with a portion of said pre-computed specified text word sequence, wherein said plurality of tokens comprise suffixes associated with structures of said deep parser output, and wherein said plurality of tokens comprise a version token; and determining based on said plurality of tokens, variations associated with said pre-computed specified text word sequence.
Address: Armonk NY US
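
Illustrative sketch: the flow described in the abstract (n-gram model, tf-idf scoring, selection of frequent n-grams, deep parsing, caching, lookup) might look roughly like the following. This is a minimal sketch under stated assumptions, not the patented implementation. All function names are hypothetical, the smoothed idf formula is one common choice and is not taken from the patent, `placeholder_deep_parse` merely stands in for a real deep parser component, and an in-memory dict stands in for the database cache. The suffix tokens and version token mentioned in the claim are not modelled here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_scores(docs, n=2):
    """Score each n-gram by tf-idf summed over the domain corpus.

    Assumption: idf is smoothed as log((1 + N) / (1 + df)) + 1 so that
    n-grams present in every document still get a positive score.
    """
    per_doc = [Counter(ngrams(d.lower().split(), n)) for d in docs]
    doc_freq = Counter(g for counts in per_doc for g in counts)
    total_docs = len(docs)
    scores = Counter()
    for counts in per_doc:
        for gram, tf in counts.items():
            idf = math.log((1 + total_docs) / (1 + doc_freq[gram])) + 1.0
            scores[gram] += tf * idf
    return scores


def placeholder_deep_parse(gram):
    """Hypothetical stand-in for the expensive deep parser component."""
    return {"tokens": list(gram), "structure": "(" + " ".join(gram) + ")"}


def build_cache(docs, n=2, top_k=3):
    """Parse only the top_k highest-scoring n-grams, indexed by n-gram."""
    scores = tfidf_scores(docs, n)
    frequent = [gram for gram, _ in scores.most_common(top_k)]
    return {gram: placeholder_deep_parse(gram) for gram in frequent}


def lookup(cache, text):
    """Return the cached parse for a word sequence, or None on a miss."""
    return cache.get(tuple(text.lower().split()))
```

A caller would build the cache once per domain with `build_cache(corpus)` and then call `lookup(cache, "some phrase")` before invoking the deep parser; a `None` result signals that the phrase was not pre-computed and has to be parsed in full.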