Title: Methods and apparatus related to sentence compression
Abstract: Methods and apparatus related to sentence compression. Some implementations are generally directed toward generating a corpus of extractive compressions and associated sentences based on a set of headline, sentence pairs from documents. Some implementations are generally directed toward utilizing a corpus of sentences and associated sentence compressions in training a supervised compression system. Some implementations are generally directed toward determining a compression of a sentence based on edge weights for edges of the sentence that are determined based on weights of features associated with the edges.
Publication number: US9336186(B1) Publication date: 2016.05.10
Application number: US201314050863 Filing date: 2013.10.10
Applicant: Google Inc. Inventors: Filippova Ekaterina; Altun Yasemin
Classification: G06F17/28; G06F17/21 Main classification: G06F17/28
Agency: Middleton Reutlinger Agent: Middleton Reutlinger
Principal claim: 1. A computer implemented method, comprising: determining, by one or more computing systems, a set of headline, sentence pairs from documents; each of the headline, sentence pairs having: a headline from a respective document of the documents, the headline having a plurality of headline terms, and a sentence from the respective document, the sentence having a plurality of sentence terms; generating, by one or more of the computing systems for each of the headline, sentence pairs of the set, an extractive compression of the sentence, wherein generating the extractive compression for a given pair of the headline, sentence pairs includes: matching, by one or more of the computing systems, headline open-class terms of the headline terms with sentence open-class terms of the sentence terms in one or more nodes of the sentence, determining, by one or more of the computing systems, a minimum subtree of the sentence that includes the nodes of the sentence having the sentence open-class terms matching the headline open-class terms, and determining, by one or more of the computing systems, the extractive compression of the sentence based on the minimum subtree; storing, by one or more of the computing systems in one or more databases: a plurality of the generated extractive compressions, and for each stored of the generated extractive compressions, an association to a respective said sentence; and training a supervised compression computing system utilizing the stored of the generated extractive compressions and the associated sentences, wherein training the supervised compression system comprises: iteratively learning, by the supervised compression computing system, weights for each of a plurality of coordinates of a feature vector, the coordinates representing features for edges between sentence nodes, and the iteratively learning comprising updating the weights of the feature vector by the supervised compression computing system during a plurality of iterations of the learning based on the generated extractive compressions and the associated sentences.
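The matching and minimum-subtree steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the sentence is already parsed into a dependency tree given as a list of head indices (`-1` for the root), that open-class terms are flagged in advance, and that terms match by case-insensitive string equality. The function names (`path_to_root`, `minimum_subtree`, `compress`) and the example sentence are hypothetical.

```python
def path_to_root(node, heads):
    """Return the node indices from `node` up to the root, inclusive."""
    path = [node]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def minimum_subtree(matched, heads):
    """Smallest connected set of nodes that contains every matched node."""
    if not matched:
        return set()
    paths = [path_to_root(n, heads) for n in matched]
    # Nodes shared by every root path; the lowest of them is the common ancestor.
    common = set(paths[0]).intersection(*paths[1:])
    lca = next(n for n in paths[0] if n in common)
    keep = set()
    for p in paths:
        # Keep each matched node and its ancestors up to the common ancestor.
        keep.update(p[: p.index(lca) + 1])
    return keep


def compress(tokens, heads, open_class, headline_open_terms):
    """Extractive compression: tokens of the minimum subtree, in sentence order."""
    wanted = {t.lower() for t in headline_open_terms}
    matched = [
        i for i, t in enumerate(tokens)
        if open_class[i] and t.lower() in wanted  # assumed matching rule
    ]
    keep = minimum_subtree(matched, heads)
    return [tokens[i] for i in sorted(keep)]


# Hypothetical example: "Police arrested a local man on Tuesday"
tokens = ["Police", "arrested", "a", "local", "man", "on", "Tuesday"]
heads = [1, -1, 4, 4, 1, 1, 5]  # index of each token's parent
open_class = [True, True, False, True, True, False, True]
compression = compress(tokens, heads, open_class, ["Police", "man"])
print(" ".join(compression))
```

In this example only "Police" and "man" match the headline terms, but the verb "arrested" is retained because it is the node that connects them in the tree. The resulting sentence, compression pairs are the kind of data the claim then stores and uses to train the supervised system.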
Address: Mountain View CA US