Title of invention: Method and apparatus for extracting portions of text from long social media documents
Abstract: A method, non-transitory computer readable medium, and apparatus for extracting text from a social media document are disclosed. For example, the method indexes a plurality of social media documents into a plurality of snippets, receives a query including one or more keywords and a purpose, identifies one or more of the plurality of snippets that include the one or more keywords in an index, ranks the one or more of the plurality of snippets in accordance with the purpose and provides the one or more plurality of snippets that are ranked in accordance with the purpose.
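Publication number: US9213730(B2)  Publication date: 2015.12.15
Application number: US201313965924  Filing date: 2013.08.13
Applicant: Xerox Corporation  Inventors: Bhatia Sumit; Kataria Saurabh; Peng Wei; Sun Tong
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent: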
Main claim: 1. A method for extracting text from a social media document, comprising: indexing, by a processor, a plurality of social media documents into a plurality of snippets; receiving, by the processor, a query including one or more keywords and a purpose; identifying, by the processor, one or more of the plurality of snippets that include the one or more keywords in an index; ranking, by the processor, the one or more of the plurality of snippets in accordance with the purpose; providing, by the processor, the one or more plurality of snippets that are ranked in accordance with the purpose; and performing, by the processor, a query enhancement on the query, wherein the query enhancement comprises: identifying one or more related keywords associated with the one or more keywords in the query, wherein the one or more related keywords comprise any term that has a semantic score above a threshold, wherein the semantic score is calculated according to an equation:
$$\mathrm{SEMANTIC\ SCORE}(w,k)=\sum_{\forall d\in D}\ \sum_{\forall s\in S_d}\mathrm{freq}(w,\mathrm{snippet})\cdot\mathrm{weight}_1(w,\mathrm{snippet})\cdot\mathrm{weight}_2(w,\mathrm{document})$$
wherein $D$ represents a corpus of documents, $S_d$ represents the plurality of snippets for a document $d$, wherein $\forall d\in D$ represents for each document $d$ that is an element of the corpus of documents, $\forall s\in S_d$ represents for each snippet that is an element of the plurality of snippets for the document $d$, $\mathrm{weight}_1$ is defined according to a first weight function:
$$\mathrm{weight}_1(w,\mathrm{snippet})=e^{-(\text{\# of words in snippet}\,-\,\text{average \# of words in all snippets})}$$
and $\mathrm{weight}_2$ is defined according to a second weight function:
$$\mathrm{weight}_2(w,\mathrm{document})=e^{-(\text{\# of sentences in document}\,-\,\text{average \# of sentences in all documents})}$$
and providing the one or more related keywords that are identified to be included in an enhanced query.
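The semantic score in the claim can be illustrated with a minimal Python sketch. This is not the patented implementation; it rests on several assumptions that the record does not state. The corpus layout (a list of dicts with `num_sentences` and pre-tokenized `snippets`) and the function names `semantic_score` and `related_keywords` are hypothetical. Because `k` does not appear on the right-hand side of the claimed equation, the sketch assumes the sums only count snippets that contain `k`; that restriction is an interpretation, not part of the claim text.

```python
import math

def semantic_score(w, k, corpus):
    """Sum freq * weight1 * weight2 over every snippet of every document.

    corpus: non-empty list of documents (hypothetical layout), each a dict with
      'num_sentences': int, and 'snippets': list of token lists.
    """
    all_snippets = [s for d in corpus for s in d["snippets"]]
    avg_words = sum(len(s) for s in all_snippets) / len(all_snippets)
    avg_sents = sum(d["num_sentences"] for d in corpus) / len(corpus)

    score = 0.0
    for d in corpus:
        # weight2 = e^-(# sentences in document - average # sentences)
        weight2 = math.exp(-(d["num_sentences"] - avg_sents))
        for s in d["snippets"]:
            # Assumption: only snippets that contain the query keyword k count.
            if k not in s:
                continue
            # weight1 = e^-(# words in snippet - average # words in snippets)
            weight1 = math.exp(-(len(s) - avg_words))
            score += s.count(w) * weight1 * weight2
    return score

def related_keywords(k, corpus, threshold):
    """Return every term whose semantic score with k exceeds the threshold."""
    vocab = {t for d in corpus for s in d["snippets"] for t in s}
    return sorted(
        w for w in vocab
        if w != k and semantic_score(w, k, corpus) > threshold
    )
```

Both weights decay exponentially with length above the average, so a term occurring in a short snippet of a short document contributes more than the same term in a long one. The related keywords returned by `related_keywords` would then be appended to the original query to form the enhanced query.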
Address: Norwalk CT US