Title |
Category-sensitive ranking for text |
Abstract |
Methods, systems, and apparatus, including computer program products, are provided for generating topic models for text summarization. In one aspect, a method includes receiving a first document of text that is associated with one or more category labels and that includes one or more sequences of one or more words; determining a category label that represents a first category associated with the first document; sampling the one or more sequences to determine a topic and a co-occurrence relationship between the topic and the category label, where a topic represents a subdivision within a category; sampling the one or more sequences to determine a co-occurrence relationship between a sequence in the first document and the topic; and generating a category-topic model that represents the co-occurrence relationships. |
Publication Number |
US9092422(B2) |
Publication Date |
2015.07.28 |
Application Number |
US200913520012 |
Application Date |
2009.12.30 |
Applicant |
Google Inc. |
Inventors |
Wang Yi;Tao Bo;Liu Zhiyuan |
Classification Codes |
G06F17/30;G06F17/27 |
Main Classification Code |
G06F17/30 |
Agency |
Fish & Richardson P.C. |
Agent |
Fish & Richardson P.C. |
Principal Claim |
1. A method comprising:
receiving a plurality of documents of text, wherein each document is associated with one or more category labels and includes one or more sequences of one or more words; determining a plurality of topics from the plurality of documents, wherein each topic represents a subdivision of a respective category label; performing a plurality of sampling iterations to generate a category-topic model that represents co-occurrence relationships between sequences and topics and co-occurrence relationships between topics and categories, wherein performing each of the plurality of sampling iterations comprises, for each sequence in each of the plurality of documents:
sampling a category label for the sequence from the category labels associated with the document that includes the sequence;
sampling a topic for the sequence; and
updating current values of representations of the co-occurrence relationships based on the category label and the topic sampled for the sequence. |
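The sampling iterations of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify the sampling distribution, so this sketch assumes a collapsed-Gibbs-style sampler in the spirit of labeled topic models; the weighting formula, the hyperparameters `alpha`, `gamma` and `beta`, the fixed `topics_per_category`, and the function name `train_category_topic_model` are all hypothetical. Each sequence is treated as an atomic token (a tuple of words), and each topic is tied to exactly one category, mirroring the claim's "subdivision of a respective category label".

```python
import random
from collections import defaultdict


def train_category_topic_model(docs, topics_per_category=2, iterations=50,
                               alpha=0.1, gamma=0.1, beta=0.01, seed=0):
    """Hypothetical Gibbs-style sampler for a category-topic model.

    docs: list of (labels, sequences); labels is a list of category labels and
    sequences is a list of word tuples. Returns two count tables:
    {(category, topic): count} and {(topic, sequence): count}.
    """
    rng = random.Random(seed)
    categories = sorted({c for labels, _ in docs for c in labels})
    # Assumption: each topic subdivides exactly one category, topic = (category, k).
    topics_of = {c: [(c, k) for k in range(topics_per_category)] for c in categories}
    vocab_size = len({s for _, seqs in docs for s in seqs})

    n_dc = defaultdict(int)  # (doc index, category) -> count
    n_ct = defaultdict(int)  # (category, topic) -> count
    n_c = defaultdict(int)   # category -> count
    n_ts = defaultdict(int)  # (topic, sequence) -> count
    n_t = defaultdict(int)   # topic -> count
    assign = {}              # (doc index, position) -> (category, topic)

    def update(d, s, c, t, delta):
        n_dc[d, c] += delta
        n_ct[c, t] += delta
        n_c[c] += delta
        n_ts[t, s] += delta
        n_t[t] += delta

    # Random initial assignment, restricted to each document's own labels.
    for d, (labels, seqs) in enumerate(docs):
        for i, s in enumerate(seqs):
            c = rng.choice(labels)
            t = rng.choice(topics_of[c])
            assign[d, i] = (c, t)
            update(d, s, c, t, +1)

    for _ in range(iterations):
        for d, (labels, seqs) in enumerate(docs):
            for i, s in enumerate(seqs):
                c_old, t_old = assign[d, i]
                update(d, s, c_old, t_old, -1)  # remove the current assignment
                # Hypothetical unnormalized weight of each allowed (category, topic) pair.
                weights = {}
                for c in labels:
                    for t in topics_of[c]:
                        weights[c, t] = ((n_dc[d, c] + alpha)
                                         * (n_ct[c, t] + gamma) / (n_c[c] + topics_per_category * gamma)
                                         * (n_ts[t, s] + beta) / (n_t[t] + vocab_size * beta))
                # 1) Sample a category label from the document's labels.
                cat_w = [sum(w for (cat, _), w in weights.items() if cat == lab) for lab in labels]
                c = rng.choices(labels, weights=cat_w)[0]
                # 2) Sample a topic among that category's subdivisions.
                cands = topics_of[c]
                t = rng.choices(cands, weights=[weights[c, top] for top in cands])[0]
                # 3) Update the co-occurrence counts with the new sample.
                assign[d, i] = (c, t)
                update(d, s, c, t, +1)

    category_topic = {k: v for k, v in n_ct.items() if v}
    topic_sequence = {k: v for k, v in n_ts.items() if v}
    return category_topic, topic_sequence
```

Sampling the category from the summed pair weights and then the topic within that category is equivalent to drawing the (category, topic) pair jointly; the draw is split into two steps here only to follow the order of the claim. The returned count tables stand in for the "representations of the co-occurrence relationships"; normalizing them into probabilities is left out of this sketch.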
Address |
Mountain View CA US |