Title: Category-sensitive ranking for text
Abstract: Provided are methods, systems, and apparatus, including computer program products, for generating topic models for text summarization. In one aspect, a method includes receiving a first document of text that is associated with one or more category labels and that includes one or more sequences of one or more words; determining a category label that represents a first category associated with the first document; sampling the one or more sequences to determine a topic and a co-occurrence relationship between the topic and the category label, where a topic represents a subdivision within a category; sampling the one or more sequences to determine a co-occurrence relationship between a sequence in the first document and the topic; and generating a category-topic model that represents the co-occurrence relationships.
Publication number: US9092422(B2) Publication date: 2015.07.28
Application number: US200913520012 Filing date: 2009.12.30
Applicant: Google Inc. Inventors: Wang Yi; Tao Bo; Liu Zhiyuan
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Fish & Richardson P.C. Agent: Fish & Richardson P.C.
Main claim: 1. A method comprising: receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words; determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label; performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents: sampling a category label for the sequence from the category labels associated with the document that includes the sequence; sampling a topic for the sequence; and updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence.
Address: Mountain View CA US
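The sampling iterations recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not say how the category label and topic are drawn, so the count-based weighting below (in the style of collapsed Gibbs sampling), the smoothing parameters `alpha` and `beta`, the fixed `topics_per_category`, and every function and variable name are assumptions made for illustration only; this is not the patented implementation.

```python
import random
from collections import defaultdict


def train_category_topic_model(docs, topics_per_category=2, iterations=20,
                               alpha=0.1, beta=0.01, seed=0):
    """docs is a list of (category_labels, sequences) pairs.

    Hypothetical sketch: a topic is represented as (category, k), so each
    topic is a subdivision of exactly one category label.
    """
    rng = random.Random(seed)
    vocab_size = len({seq for _, seqs in docs for seq in seqs})

    cat_topic = defaultdict(int)    # (category, topic) -> co-occurrence count
    topic_seq = defaultdict(int)    # (topic, sequence) -> co-occurrence count
    topic_total = defaultdict(int)  # topic -> number of sequences assigned
    assignment = {}                 # (doc index, position) -> (category, topic)

    def weight(cat, topic, seq):
        # Assumed smoothed weight; alpha and beta keep every weight positive.
        return ((cat_topic[(cat, topic)] + alpha)
                * (topic_seq[(topic, seq)] + beta)
                / (topic_total[topic] + beta * vocab_size))

    def update(cat, topic, seq, delta):
        cat_topic[(cat, topic)] += delta
        topic_seq[(topic, seq)] += delta
        topic_total[topic] += delta

    for _ in range(iterations):
        for d, (labels, seqs) in enumerate(docs):
            for i, seq in enumerate(seqs):
                # Remove the previous assignment of this sequence, if any.
                if (d, i) in assignment:
                    update(*assignment[(d, i)], seq, -1)

                # Step 1: sample a category label from the document's labels.
                cat_weights = [
                    sum(weight(c, (c, k), seq)
                        for k in range(topics_per_category))
                    for c in labels
                ]
                cat = rng.choices(labels, weights=cat_weights)[0]

                # Step 2: sample a topic within the sampled category.
                topics = [(cat, k) for k in range(topics_per_category)]
                topic_weights = [weight(cat, t, seq) for t in topics]
                topic = rng.choices(topics, weights=topic_weights)[0]

                # Step 3: update the co-occurrence representations.
                assignment[(d, i)] = (cat, topic)
                update(cat, topic, seq, +1)

    return cat_topic, topic_seq


if __name__ == "__main__":
    example_docs = [
        (["sports"], ["goal", "match", "goal"]),
        (["sports", "finance"], ["match", "stock", "market"]),
    ]
    cat_topic, topic_seq = train_category_topic_model(example_docs)
    for key, count in sorted(cat_topic.items()):
        if count:
            print(key, count)
```

In this sketch a sequence can only be assigned a category label that its own document carries, which mirrors the claim language "from the category labels associated with the document that includes the sequence". The two returned count tables play the role of the category-topic model's co-occurrence representations.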