Title of invention: Systems and methods for generating summaries of documents
Abstract: Systems and methods for summarizing online articles for consumption on a user device are disclosed herein. The system extracts the main body of an article's text from the HTML code of an online article. The system may then classify the extracted article into one of several different categories and remove duplicate articles. The system breaks the article down into its component sentences, and each sentence is classified into one of three categories: (1) potential candidate sentences that may be included in the generated summary; (2) weakly rejected sentences that will not be included in the summary but may be used to generate the summary; and (3) strongly rejected sentences that are not included in the summary. Finally, the system applies a document summarizer to generate quickly readable article summaries, for viewing on the user device, using relevant sentences from the article while maintaining the coherence of the article.
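The three-way sentence classification described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the scoring rule (word overlap with the title), the thresholds, and every name used here (`Category`, `preliminary_score`, `classify_sentence`, `summarize`) are hypothetical and do not come from the record. In this sketch the weakly rejected sentences are only labeled; how the actual system uses them to generate the summary is not stated in the abstract.

```python
import re
from enum import Enum


class Category(Enum):
    CANDIDATE = "potential candidate"
    WEAK_REJECT = "weakly rejected"
    STRONG_REJECT = "strongly rejected"


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text):
    # Lowercased set of word tokens.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def preliminary_score(sentence, title):
    # Hypothetical score: fraction of title words that appear in the sentence.
    title_words = words(title)
    if not title_words:
        return 0.0
    return len(words(sentence) & title_words) / len(title_words)


def classify_sentence(sentence, score, min_words=4, candidate_threshold=0.25):
    # Assumed rules: very short fragments are strongly rejected; the rest are
    # candidates when they score at or above the threshold, else weakly rejected.
    if len(sentence.split()) < min_words:
        return Category.STRONG_REJECT
    if score >= candidate_threshold:
        return Category.CANDIDATE
    return Category.WEAK_REJECT


def summarize(text, title, max_sentences=2):
    # Keep the highest-scoring candidates, then restore article order so the
    # summary reads coherently.
    scored = []
    for idx, sentence in enumerate(split_sentences(text)):
        score = preliminary_score(sentence, title)
        if classify_sentence(sentence, score) is Category.CANDIDATE:
            scored.append((score, idx, sentence))
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]
```

Selecting by score and then re-sorting by position is one simple way to use "relevant sentences from the article while maintaining the coherence of the article"; other orderings are possible.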
Publication number: US9317498(B2)  Publication date: 2016.04.19
Application number: US201514681612  Filing date: 2015.04.08
Applicant: CODEQ LLC  Inventors: Baker Douglas Dane; Fernández Paulo Malvar; Fernandes Brian; Martinez Rodrigo Alarcón
Classification: G06F17/27; G06F17/20; G06F17/21  Main classification: G06F17/27
Agency: Hunton & Williams LLP  Agent: Hunton & Williams LLP
Principal claim: 1. A system, comprising: an interface processor that provides an interactive graphical user interface to a user device over a network; a parser processor that retrieves an RSS feed over the network, and generates an initial set of articles based on the RSS feed; a categorization processor that categorizes each article in the initial set of articles into one of a plurality of subject matter categories; a deduplication processor that generates a final set of articles by removing duplicate articles from the initial set of articles, wherein each article in the final set of articles comprises a plurality of sentences and a title; and a summarization processor that, for each article in the final set of articles: generates a preliminary score for each sentence in the plurality of sentences, assigns each sentence in the plurality of sentences to one of three categories, generates an article summary, wherein the article summary comprises one or more sentences from one of the three categories, wherein the article summary is based at least in part on the preliminary score for each sentence, and provides the article summary to the user device over the network via the interface processor; wherein the three categories consist of strongly rejected sentences, weakly rejected sentences, and potential candidate sentences.
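The deduplication step recited in the claim (producing a final set of articles from an initial set) could look roughly like the sketch below. The claim does not say how duplicates are detected, so the technique shown here, Jaccard similarity over word sets with a fixed threshold, is an assumption chosen for illustration, as are the article dictionary layout and the names `word_set`, `jaccard`, and `deduplicate`.

```python
import re


def word_set(text):
    # Lowercased set of alphanumeric tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(articles, threshold=0.8):
    # Keep the first article of each near-duplicate group (assumed policy).
    # Each article is a dict with hypothetical "title" and "body" keys.
    kept = []
    kept_words = []
    for article in articles:
        ws = word_set(article["title"] + " " + article["body"])
        if any(jaccard(ws, other) >= threshold for other in kept_words):
            continue  # too similar to an article already kept
        kept.append(article)
        kept_words.append(ws)
    return kept
```

This pairwise comparison is quadratic in the number of articles, which is acceptable for a single feed but would need an index or hashing scheme for larger collections.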
Address: Apex NC US