Title of Invention: Computer-Implemented System And Method For Identifying Related Messages
Abstract: A system and method for identifying related messages are provided. A set of messages, each having a header, sender and transmission time, is obtained. A message is selected from the set. A body of the selected message is compared to a body of a further message in the set. The further message is labeled as a duplicate of the selected message when the bodies match. The duplicate labeling of the further message is verified when the header, sender, and transmission time of the further message match the header, sender, and transmission time of the selected message. The duplicate messages are removed from the set. The remaining messages are sorted in order of message length. A shorter message is compared with a longer message and is marked as a near duplicate of the longer message when the body of the shorter message is included in the body of the longer message.
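The exact-duplicate stage of the abstract (compare bodies, verify the duplicate label on header, sender and transmission time, then remove verified duplicates) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Message` class and `remove_verified_duplicates` function are hypothetical names, the header is modeled as a single string, and a body match that fails verification is simply left in the set, which the abstract does not specify. The pairwise loop mirrors the abstract's "selected message versus further message" wording and is quadratic in the number of messages.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    # Hypothetical record type. Modeling "header" as one string
    # (for example, the subject line) is an assumption.
    msg_id: str
    header: str
    sender: str
    sent_at: datetime
    body: str


def remove_verified_duplicates(messages):
    """Return (kept, verified); verified maps duplicate id -> original id."""
    verified = {}
    for i, selected in enumerate(messages):
        if selected.msg_id in verified:
            continue  # already removed as a duplicate of an earlier message
        for further in messages[i + 1:]:
            if further.msg_id in verified:
                continue
            # A body match is the (tentative) duplicate label.
            if further.body != selected.body:
                continue
            # The label is verified only if header, sender and transmission
            # time also match; otherwise it is dropped (an assumption).
            if (further.header, further.sender, further.sent_at) == (
                selected.header, selected.sender, selected.sent_at
            ):
                verified[further.msg_id] = selected.msg_id
    # Remove every message that carries a verified duplicate label.
    kept = [m for m in messages if m.msg_id not in verified]
    return kept, verified
```

For a large collection, grouping messages by a hash of the body before the pairwise check would limit comparisons to messages that can actually match; that is a swapped-in optimization, not something the abstract describes.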
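Application Publication No.: US2015100595(A1)  Application Publication Date: 2015.04.09
Application No.: US201414571282  Filing Date: 2014.12.15
Applicant: FTI Technology LLC  Inventors: Kawai Kenji; McDonald David T.
Classification: G06F17/30  Main Classification: G06F17/30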
Principal Claim: 1. A computer-implemented system for identifying related messages, comprising: a set of messages, each comprising a header, sender, and transmission time; a selection module to select one of the messages from the set; a content comparison module to compare a body of the selected message to a body of a further message in the set; a labeling module to label the further message as a duplicate of the selected message when the bodies match; a verification module to verify the duplicate labeling of the further message when the header, sender, and transmission time of the further message match the header, sender, and transmission time of the selected message; a removal module to remove the messages with verified duplicate labels; a sorting module to sort the remaining messages of the set in order of message length; a length comparison module to compare a shorter message comprising a short text body with a message comprising a longer text body and to determine that the body of the shorter message is included in the body of the longer message; and a marker module to mark the shorter message as a near duplicate of the longer message.
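The sorting, length comparison, and marker modules of claim 1 can be illustrated with a short Python sketch. It assumes exact duplicates have already been removed, treats "included in" as a plain substring test on the raw body text, and records for each shorter message the shortest longer message that contains it; the `Msg` class and `mark_near_duplicates` function are hypothetical names. Real replies often alter quoted text (for example with `>` prefixes on every line, or rewrapped lines), so a practical system would likely normalize bodies before the containment test; that normalization is an assumption outside the claim.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    # Hypothetical minimal record: only an id and a text body.
    msg_id: str
    body: str


def mark_near_duplicates(messages):
    """Map each shorter message id to the id of a longer message containing it."""
    # Sorting step: order the remaining messages by body length, shortest first.
    ordered = sorted(messages, key=lambda m: len(m.body))
    near_dups = {}
    for i, shorter in enumerate(ordered):
        # Length comparison step: only messages later in the sorted order
        # can be longer, so earlier ones are never checked.
        for longer in ordered[i + 1:]:
            if len(longer.body) > len(shorter.body) and shorter.body in longer.body:
                # Marker step: keep the first (shortest) containing message.
                near_dups[shorter.msg_id] = longer.msg_id
                break
    return near_dups
```

Requiring the containing body to be strictly longer keeps two messages with identical bodies (for example, ones that matched on body but failed header verification) from being marked as near duplicates of each other.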
Address: Annapolis, MD, US