Title: Computer-Implemented System And Method For Identifying Duplicate And Near Duplicate Messages
Abstract: A computer-implemented system and method for identifying duplicate and near duplicate messages is provided. A set of messages is obtained. A body of one such message is compared with the body of each other message. Those messages having matching bodies are identified as exact duplicates. The exact duplicates are removed from the set. The remaining messages are sorted in order of message length and a shorter message is compared with a longer message. A determination is made that the body of the shorter message is included in the body of the longer message and the shorter message is marked as a near duplicate of the longer message.
Publication number: US2014122450(A1) Publication date: 2014.05.01
Application number: US201414148713 Filing date: 2014.01.06
Applicant: FTI TECHNOLOGY LLC Inventors: KAWAI KENJI; MCDONALD DAVID T.
Classification: G06F17/30; G06Q10/00; H04L12/58 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
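
The two passes described in the abstract (exact-duplicate removal, then a length-ordered containment check) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the function names `normalize` and `find_duplicates`, the whitespace normalization, the use of plain substring containment for "included in", and the choice to link each shorter message to the first longer message that contains it are all hypothetical choices that the record above does not specify.

```python
def normalize(body):
    # Assumption: collapse runs of whitespace before comparing bodies.
    return " ".join(body.split())


def find_duplicates(messages):
    """messages maps a message id to its body text.

    Returns (exact, near):
      exact maps a duplicate's id to the id of the first message with the same body;
      near maps a shorter message's id to the id of a longer message containing it.
    """
    exact = {}
    first_seen = {}  # normalized body -> id of the first message with that body
    for msg_id, body in messages.items():
        key = normalize(body)
        if key in first_seen:
            exact[msg_id] = first_seen[key]  # matching body: exact duplicate
        else:
            first_seen[key] = msg_id

    # Exact duplicates are now excluded; sort the remaining bodies by length.
    remaining = sorted(first_seen.items(), key=lambda item: len(item[0]))

    near = {}
    for i, (short_body, short_id) in enumerate(remaining):
        if not short_body:
            continue  # an empty body is trivially contained everywhere; skip it
        for long_body, long_id in remaining[i + 1:]:
            # Assumption: "included in" is modeled as substring containment.
            if short_body in long_body:
                near[short_id] = long_id
                break
    return exact, near


if __name__ == "__main__":
    sample = {
        "m1": "Lunch at noon?",
        "m2": "Lunch  at noon?",
        "m3": "Sounds good. > Lunch at noon?",
        "m4": "Unrelated note",
    }
    exact, near = find_duplicates(sample)
    print(exact)
    print(near)
```

The nested loop is quadratic in the number of unique bodies, which is acceptable for a sketch; a production system would likely need an index or hashing scheme, which the abstract does not describe.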