Title of invention: Method and system for data mining of short message streams
Abstract: A method and system for summarizing messages from a message stream is disclosed in which association analysis is applied to a stream of short data messages comprising words in a spoken language, such as English. Clusters of words are identified that provide a summary of the several conversations (short data messages originating from different human sources) that are embedded in the message stream. Each word cluster may represent a set of messages that are its instances. The word clusters may collectively constitute a summary of the entire message stream. The word clusters that have been extracted from the message stream may also be grouped into topics. Also, an identity of one or more message originators may be listed based on their influence on the messages being analyzed. The short data messages may also be sorted based on a geographical location of one or more originators of messages.
Publication number: US9558165(B1) Publication date: 2017.01.31
Application number: US201213589147 Filing date: 2012.08.19
Applicant: EMICEN CORP. Inventors: Marsten Roy; Caldwell Russell; Subramanian Radhika
Classification: G06F17/27; G06F17/21; G06F17/24 Main classification: G06F17/27
Agency: Smith Tempel Agent: Smith Tempel; Wigmore Steven P.
Main claim: 1. A computer-implemented method for summarizing a message stream, the method comprising the steps of: defining a communications channel with one or more key words, wherein defining the communications channel comprises specifying one or more key words that are used to extract a message from the message stream, the message stream comprising at least two messages; extracting one or more messages from the message stream based on the defined channel, wherein extracting one or more messages from the message stream based on the defined channel comprises filtering one or more messages from the message stream using the defined channel as a filter for selecting a message to be extracted for additional processing; removing common words from the one or more extracted messages; building a word order graph for the one or more extracted messages, the word order graph tracking sequencing of words found within each extracted message; using an algorithm to find commonly occurring word clusters within each extracted message, wherein the algorithm reviews each extracted message for at least two-word clusters with a predetermined pair-frequency, the pair-frequency comprising a number of times that words appear together in an extracted message; pruning the word clusters to reduce a total number of word clusters; ranking one or more surviving clusters to determine an order of presentation; arranging each word cluster into a natural order based on the word order graph; and displaying the word clusters as a summary of the message stream.
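The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation; every name in it (`summarize`, `STOP_WORDS`, `min_pair_freq`, `max_clusters`) is hypothetical. It assumes that the pair-frequency is the number of extracted messages in which two words co-occur, that pruning merges overlapping two-word clusters into larger ones, that ranking uses the summed pair-frequency of a cluster, and that the word order graph is a table of "word A appears before word B" counts.

```python
from collections import Counter

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and", "at", "on", "for", "i"}


def tokenize(message):
    """Lowercase a message and strip surrounding punctuation from each word."""
    tokens = (w.strip(".,!?;:\"'()").lower() for w in message.split())
    return [t for t in tokens if t]


def summarize(stream, channel_keywords, min_pair_freq=2, max_clusters=5):
    keywords = {k.lower() for k in channel_keywords}

    # Extract messages that match the channel, then remove common words.
    extracted = []
    for message in stream:
        words = tokenize(message)
        if keywords.intersection(words):
            extracted.append([w for w in words if w not in STOP_WORDS])

    # Word order graph: precedes[(a, b)] counts messages where a comes before b.
    # pair_freq counts messages in which both words appear (assumed definition).
    precedes = Counter()
    pair_freq = Counter()
    for words in extracted:
        unique = list(dict.fromkeys(words))  # first occurrence of each word
        for i, first in enumerate(unique):
            for second in unique[i + 1:]:
                precedes[(first, second)] += 1
                pair_freq[frozenset((first, second))] += 1

    # Two-word clusters that reach the predetermined pair-frequency.
    frequent = {p: n for p, n in pair_freq.items() if n >= min_pair_freq}

    # Pruning (assumed strategy): merge overlapping pairs into larger clusters.
    clusters = []
    for pair in frequent:
        merged = set(pair)
        remaining = []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster
            else:
                remaining.append(cluster)
        clusters = remaining + [merged]

    # Ranking (assumed metric): summed pair-frequency inside each cluster.
    def weight(cluster):
        return sum(n for p, n in frequent.items() if p <= cluster)

    ranked = sorted(clusters, key=lambda c: (-weight(c), sorted(c)))

    # Natural order: words that more often precede the others come first.
    def natural_order(cluster):
        def score(word):
            return sum(precedes[(word, o)] - precedes[(o, word)]
                       for o in cluster if o != word)
        return sorted(cluster, key=lambda w: (-score(w), w))

    return [natural_order(c) for c in ranked[:max_clusters]]


if __name__ == "__main__":
    demo = [
        "Traffic jam on highway 85 near airport",
        "Huge traffic jam at the airport today",
        "Traffic is slow, jam near airport",
        "I love pizza for lunch",
        "Traffic lights broken downtown",
    ]
    for cluster in summarize(demo, ["traffic"]):
        print(" ".join(cluster))  # prints "traffic jam near airport"
```

Merging overlapping pairs is only one way to reduce the number of clusters; the claim does not specify the pruning or ranking criteria, so both are stand-ins chosen for brevity.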
Address: Atlanta, GA, US