Abstract |
A method, non-transitory computer readable medium, and apparatus for extracting text from a social media document are disclosed. For example, the method indexes a plurality of social media documents into a plurality of snippets, receives a query including one or more keywords and a purpose, identifies one or more of the plurality of snippets that include the one or more keywords in an index, ranks the one or more of the plurality of snippets in accordance with the purpose, and provides the one or more of the plurality of snippets that are ranked in accordance with the purpose. |
Independent claim |
1. A method for extracting text from a social media document, comprising:
indexing, by a processor, a plurality of social media documents into a plurality of snippets; receiving, by the processor, a query including one or more keywords and a purpose; identifying, by the processor, one or more of the plurality of snippets that include the one or more keywords in an index; ranking, by the processor, the one or more of the plurality of snippets in accordance with the purpose; providing, by the processor, the one or more plurality of snippets that are ranked in accordance with the purpose; and performing, by the processor, a query enhancement on the query, wherein the query enhancement comprises:
identifying one or more related keywords associated with the one or more keywords in the query, wherein the one or more related keywords comprise any term that has a semantic score above a threshold, wherein the semantic score is calculated according to an equation:
$$\mathrm{SEMANTICSCORE}(w,k)=\sum_{\forall d\in D}\;\sum_{\forall s\in S_d}\mathrm{freq}(w,\mathrm{snippet})\cdot\mathrm{weight}_1(w,\mathrm{snippet})\cdot\mathrm{weight}_2(w,\mathrm{document})$$
wherein $D$ represents a corpus of documents, $S_d$ represents the plurality of snippets for a document $d$, $\forall d\in D$ represents for each document $d$ that is an element of the corpus of documents, $\forall s\in S_d$ represents for each snippet that is an element of the plurality of snippets for the document $d$, $\mathrm{weight}_1$ is defined according to a first weight function:
$$\mathrm{weight}_1(w,\mathrm{snippet})=e^{-(\text{\# of words in snippet}\,-\,\text{average \# of words in all snippets})}$$
and $\mathrm{weight}_2$ is defined according to a second weight function:
$$\mathrm{weight}_2(w,\mathrm{document})=e^{-(\text{\# of sentences in document}\,-\,\text{average \# of sentences in all documents})}$$; and
providing the one or more related keywords that are identified to be included in an enhanced query. |
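
The query-enhancement step of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the `Document` class, the function names `semantic_score` and `related_keywords`, the representation of a snippet as a list of tokens, and the sentence count stored per document are all hypothetical. The claimed equation also does not show how the query keyword k enters the sum; this sketch assumes that only snippets containing k contribute.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    # Hypothetical container: each snippet is a list of lowercase tokens.
    snippets: List[List[str]]
    num_sentences: int


def semantic_score(w: str, k: str, corpus: List[Document]) -> float:
    """Sum freq * weight1 * weight2 over every snippet of every document."""
    all_snippets = [s for d in corpus for s in d.snippets]
    if not all_snippets:
        return 0.0
    avg_words = sum(len(s) for s in all_snippets) / len(all_snippets)
    avg_sentences = sum(d.num_sentences for d in corpus) / len(corpus)

    score = 0.0
    for d in corpus:
        # weight2 = e^-(# sentences in document - average # sentences)
        weight2 = math.exp(-(d.num_sentences - avg_sentences))
        for s in d.snippets:
            # Assumption: only snippets that mention keyword k contribute.
            if k not in s:
                continue
            freq = s.count(w)
            # weight1 = e^-(# words in snippet - average # words)
            weight1 = math.exp(-(len(s) - avg_words))
            score += freq * weight1 * weight2
    return score


def related_keywords(k: str, corpus: List[Document], threshold: float) -> List[str]:
    """Return every term whose semantic score for k is above the threshold."""
    vocabulary = {t for d in corpus for s in d.snippets for t in s} - {k}
    return sorted(w for w in vocabulary if semantic_score(w, k, corpus) > threshold)
```

Both weights decay exponentially with length above the corpus average, so under this reading short snippets in short documents contribute more to a term's score than long ones.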