Independent claim |
1. A computer implemented method, comprising:
determining, by one or more computing systems, a set of headline, sentence pairs from documents; each of the headline, sentence pairs having:
a headline from a respective document of the documents, the headline having a plurality of headline terms, and
a sentence from the respective document, the sentence having a plurality of sentence terms;
generating, by one or more of the computing systems for each of the headline, sentence pairs of the set, an extractive compression of the sentence, wherein generating the extractive compression for a given pair of the headline, sentence pairs includes:
matching, by one or more of the computing systems, headline open-class terms of the headline terms with sentence open-class terms of the sentence terms in one or more nodes of the sentence,
determining, by one or more of the computing systems, a minimum subtree of the sentence that includes the nodes of the sentence having the sentence open-class terms matching the headline open-class terms, and
determining, by one or more of the computing systems, the extractive compression of the sentence based on the minimum subtree;
storing, by one or more of the computing systems in one or more databases:
a plurality of the generated extractive compressions, and
for each stored of the generated extractive compressions, an association to a respective said sentence; and
training a supervised compression computing system utilizing the stored of the generated extractive compressions and the associated sentences, wherein training the supervised compression system comprises:
iteratively learning, by the supervised compression computing system, weights for each of a plurality of coordinates of a feature vector, the coordinates representing features for edges between sentence nodes, and the iteratively learning comprising updating the weights of the feature vector by the supervised compression computing system during a plurality of iterations of the learning based on the generated extractive compressions and the associated sentences. |
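The matching and minimum-subtree steps recited in the claim can be illustrated with a minimal sketch. It is not the claimed implementation and rests on several assumptions: the sentence is represented as a dependency tree given by a parent-index array, open-class terms are approximated by filtering a small hypothetical stop list (CLOSED_CLASS), and terms match by exact lowercase comparison. All function and variable names are illustrative.

```python
from typing import List, Set

# Hypothetical closed-class (function word) list. This is an assumption:
# a real system would more likely decide open vs. closed class from POS tags.
CLOSED_CLASS = {"a", "an", "the", "of", "in", "on", "to", "and", "by", "for", "at"}


def open_class(terms: List[str]) -> Set[str]:
    """Lowercased terms that are not in the closed-class list."""
    return {t.lower() for t in terms if t.lower() not in CLOSED_CLASS}


def path_to_root(node: int, parent: List[int]) -> List[int]:
    """Node indices from `node` up to the root (the root has parent -1)."""
    path = [node]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path


def minimum_subtree(matched: List[int], parent: List[int]) -> Set[int]:
    """Smallest connected subtree that contains every matched node."""
    if not matched:
        return set()
    paths = [path_to_root(n, parent) for n in matched]
    # Ancestors shared by all matched nodes; the first one met while
    # walking upward from any matched node is the lowest common ancestor.
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    lca = next(n for n in paths[0] if n in common)
    keep: Set[int] = set()
    for p in paths:
        keep.update(p[: p.index(lca) + 1])  # matched node up to the LCA
    return keep


def compress(headline: List[str], sentence: List[str], parent: List[int]) -> List[str]:
    """Extractive compression: subtree tokens in original sentence order."""
    targets = open_class(headline)
    matched = [i for i, t in enumerate(sentence) if t.lower() in targets]
    return [sentence[i] for i in sorted(minimum_subtree(matched, parent))]


# Toy sentence with a hand-written (assumed) dependency parse.
sentence = ["The", "mayor", "of", "Springfield", "announced",
            "a", "new", "budget", "on", "Monday"]
parent = [1, 4, 1, 2, -1, 7, 7, 4, 4, 8]

result = compress(["Springfield", "budget", "announced"], sentence, parent)
print(" ".join(result))
```

In this toy parse, "mayor" and "of" do not appear in the headline but are kept because they lie on the path connecting "Springfield" to the rest of the subtree. The resulting (sentence, compression) pairs are what the claim stores and later uses as training data.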