Abstract |
<p>A system (10) and method (100) for efficiently processing messages (70) stored in multiple message stores (41) are described. Metadata (35) identifying a range of topically identical messages (47), extracted from a plurality of message stores (41) storing a multiplicity of messages (70) to be processed, is iteratively copied. The metadata (35) for the extracted range of topically identical messages (47) is categorized. Those messages (70) containing substantially duplicative content within the extracted range are identified as duplicate messages (47). Those non-duplicate messages (44) within the extracted range are tallied into an ordering of conversation thread length (46). Those messages (70) whose content is recursively included content (72, 73) within another of the tallied non-duplicate messages (44) are classified as near-duplicate messages (45). The remaining messages (71) are designated as unique messages (44) containing substantially non-duplicative content (71).</p> |
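
The categorization described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Message` class, the `normalize` and `classify_range` helpers, the SHA-256 digest used to detect duplicative content, the substring test used for recursively included content, and the use of normalized body length as a stand-in for conversation thread length are all assumptions introduced only for illustration. The input is taken to be one already-extracted range of topically identical messages.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal record; the abstract works on metadata, not this class.
    msg_id: str
    body: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def classify_range(messages: list[Message]) -> dict[str, str]:
    """Label each message in one topical range as duplicate, near-duplicate, or unique."""
    labels: dict[str, str] = {}
    seen_digests: set[str] = set()
    non_duplicates: list[Message] = []

    # Step 1: a message whose normalized content was already seen is a duplicate.
    for msg in messages:
        digest = hashlib.sha256(normalize(msg.body).encode("utf-8")).hexdigest()
        if digest in seen_digests:
            labels[msg.msg_id] = "duplicate"
        else:
            seen_digests.add(digest)
            non_duplicates.append(msg)

    # Step 2: order the non-duplicates longest first (assumed proxy for thread length).
    non_duplicates.sort(key=lambda m: len(normalize(m.body)), reverse=True)

    # Step 3: a message wholly contained in a longer kept message is a near-duplicate;
    # whatever remains is unique.
    kept: list[Message] = []
    for msg in non_duplicates:
        text = normalize(msg.body)
        if any(text in normalize(other.body) for other in kept):
            labels[msg.msg_id] = "near-duplicate"
        else:
            labels[msg.msg_id] = "unique"
            kept.append(msg)
    return labels
```

Processing the longest messages first means a short message only needs to be compared against the messages already kept as unique, since anything it could be contained in has already been examined.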