发明名称 Efficient parsing with structured prediction cascades
摘要 A dependency parsing method can include determining an index set of possible head-modifier dependencies for a sentence. The index set can include inner arcs and outer arcs, inners arcs representing possible dependency between words in the sentence separated by a distance less than or equal to a threshold and outer arcs representing possible dependency between words in the sentence separated by a distance greater than the threshold. The index set can be pruned to include: (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs when a likelihood that there exists any possible outer arc that is appropriate is greater than the first threshold. The method can include further pruning the pruned index set based on a second parsing algorithm, and determining a most-likely parse for the sentence from the pruned index set.
申请公布号 US8914279(B1) 申请公布日期 2014.12.16
申请号 US201213624280 申请日期 2012.09.21
申请人 Google Inc. 发明人 Petrov Slav;Rush Alexander
分类号 G06F17/27;G06F17/20;G06F17/21 主分类号 G06F17/27
代理机构 Remarck Law Group PLC 代理人 Remarck Law Group PLC
主权项 1. A computer-implemented method, comprising: receiving, at a computing device having one or more processors, a sentence including one or more words; determining, at the computing device, an index set of possible head-modifier dependencies for the sentence, the index set including inner arcs and outer arcs, the inners arcs representing possible head-modifier dependency between words in the sentence separated by a distance less than or equal to a first distance threshold and outer arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than the first distance threshold; pruning, at the computing device, the outer arcs to exclude arcs representing possible head-modifier dependency between words in the sentence separated by a distance greater than a second distance threshold to obtain a first pruned index set, the second distance threshold being based on a determination of a longest head-modifier dependency distance observed in training data; pruning, at the computing device, the first pruned index set based on an augmented vine parsing algorithm to obtain a second pruned index set, the second pruned index set including: (i) each specific inner arc when a likelihood that the specific inner arc is appropriate is greater than a first threshold, and (ii) the outer arcs in the first pruned index set when a likelihood that there exists a possible outer arc that is appropriate is greater than the first threshold, wherein each specific inner arc corresponds to a specific index and wherein the likelihood that the specific inner arc is appropriate is determined based on a max-marginal value of its corresponding specific index; pruning, at the computing device, the second pruned index set based on a second parsing algorithm to obtain a third pruned index set, the second parsing algorithm being a first-order parsing model; pruning, at the computing device, the third pruned index set based on a third parsing algorithm to obtain a fourth pruned index set, the third parsing algorithm being a second-order parsing model; pruning, at the computing device, the fourth pruned index set based on a fourth parsing algorithm to obtain a fifth pruned index set, the fourth parsing algorithm being a third-order parsing model; determining, at the computing device, a most-likely parse for the sentence from the fifth pruned index set; and outputting, from the computing device, the most-likely parse.
地址 Mountain View CA US