Title of invention: EFFICIENT LEXICAL TRENDING TOPIC DETECTION OVER STREAMS OF DATA USING A MODIFIED SEQUITUR ALGORITHM
Abstract: Embodiments are directed towards a Modified Sequitur Algorithm (MSA) that uses pipelining and indexed arrays to identify trending topics within a plurality of documents having user generated content (UGC). The documents are parallelized and distributed across a plurality of network devices, which place at least some of the received documents into a buffer. The MSA may then be applied to the documents within the buffer to identify n-grams or phrases within the documents' contents. The identified phrases are further analyzed to remove extraneous co-occurrences of phrases and/or words based on a part-of-speech analysis. A weighting of the remaining phrases is used to identify trending topic phrases. Links to content in the plurality of UGC documents that is associated with the trending topic phrases may then be displayed at a client device.
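Publication number: US2011282874(A1)  Publication date: 2011.11.17
Application number: US20100780850  Filing date: 2010.05.14
Applicant: YAHOO! INC.  Inventors: XU ZHICHEN; FU YUN; SAMPLE NEAL
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim:
Address: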
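
Illustrative sketch (not part of the patent record): the abstract describes buffering documents, finding repeated phrases, removing redundant co-occurrences, and weighting what remains. The Python below is a minimal sketch of that flow under stated assumptions. It does not implement the Modified Sequitur Algorithm, pipelining, or indexed arrays; plain n-gram counting is swapped in as a stand-in, the part-of-speech step is omitted, and the count-times-length weight is an assumed placeholder. All function names and parameters are hypothetical.

```python
from collections import Counter


def repeated_phrases(docs, max_n=3, min_count=2):
    """Count word n-grams (2..max_n words) across the buffered documents.

    Assumption: simple counting stands in for the grammar-based phrase
    discovery that the abstract attributes to the MSA.
    """
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    # Keep only phrases that repeat within the buffer.
    return {p: c for p, c in counts.items() if c >= min_count}


def drop_subsumed(phrases):
    """Drop a phrase when a longer phrase containing it occurs equally often.

    Assumption: this is one simple reading of "extraneous co-occurrences".
    """
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))

    kept = {}
    for p, c in phrases.items():
        subsumed = any(len(q) > len(p) and cq == c and contains(q, p)
                       for q, cq in phrases.items())
        if not subsumed:
            kept[p] = c
    return kept


def trending(docs, top_k=3):
    """Rank remaining phrases by an assumed weight: count * phrase length."""
    phrases = drop_subsumed(repeated_phrases(docs))
    ranked = sorted(phrases.items(),
                    key=lambda kv: (-kv[1] * len(kv[0]), kv[0]))
    return [" ".join(p) for p, _ in ranked[:top_k]]


buffer_docs = [
    "world cup final tonight",
    "watching the world cup final",
    "world cup final score",
    "new phone launch",
]
print(trending(buffer_docs))
```

In this toy buffer, the shorter phrases "world cup" and "cup final" are dropped because the longer "world cup final" occurs just as often, which is the kind of redundancy the abstract's filtering step appears to target.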