Title of invention: BAG-OF-REPEATS REPRESENTATION OF DOCUMENTS
Abstract: A system and method for representing a textual document based on the occurrence of repeats are disclosed. The system includes a sequence generator which defines a sequence representing words forming a collection of documents. A repeat calculator identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once. A representation generator generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats.
Publication number: US2014229160(A1); Publication date: 2014.08.14
Application number: US201313765066; Filing date: 2013.02.12
Applicant: XEROX CORPORATION; Inventor: Galle Matthias
Classification: G06F17/21; Main classification: G06F17/21
Agency: Agent:
Main claim: 1. A system for representing a textual document based on the occurrence of repeats, comprising: a sequence generator which defines a sequence representing words forming a collection of documents; a repeat calculator which identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once; a representation generator which generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats; and a processor which implements the sequence generator, repeat calculator, and representation generator.
Address: Norwalk, CT, US
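
Illustrative sketch: The three components named in claim 1 (sequence generator, repeat calculator, representation generator) can be pictured with a small Python example. This is a minimal sketch under stated assumptions, not the method of the application. The function names `build_sequence`, `find_repeats`, and `represent`, the whitespace tokenizer, and the `max_len` cap are all hypothetical. Repeats are found here by brute-force enumeration of word n-grams up to `max_len`, a simplification swapped in for brevity; the record above does not say how repeats are computed.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real word splitting.
    return text.lower().split()


def build_sequence(docs):
    """Sequence generator: one sequence of words for the whole collection.

    A unique boundary marker (a tuple, so it cannot collide with a word)
    follows each document so that no repeat spans two documents.
    """
    sequence = []
    for doc_index, doc in enumerate(docs):
        sequence.extend(tokenize(doc))
        sequence.append(("<DOC_END>", doc_index))
    return sequence


def find_repeats(sequence, max_len=3):
    """Repeat calculator: subsequences (as word tuples) occurring more than once.

    Assumption: only subsequences of up to max_len words are considered.
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for start in range(len(sequence) - n + 1):
            gram = tuple(sequence[start:start + n])
            # Skip any window that crosses a document boundary marker.
            if any(isinstance(token, tuple) for token in gram):
                continue
            counts[gram] += 1
    return {gram for gram, count in counts.items() if count > 1}


def represent(doc, repeats, max_len=3):
    """Representation generator: count occurrences of each repeat in one document."""
    tokens = tokenize(doc)
    bag = Counter()
    for n in range(1, max_len + 1):
        for start in range(len(tokens) - n + 1):
            gram = tuple(tokens[start:start + n])
            if gram in repeats:
                bag[gram] += 1
    return bag


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat ran", "a dog ran"]
    repeats = find_repeats(build_sequence(docs))
    for doc in docs:
        print(doc, "->", dict(represent(doc, repeats)))
```

In this sketch a document's representation is a sparse count vector keyed by repeat. Unlike a fixed-length n-gram model, the keys can have different lengths, because every subsequence that recurs in the collection (up to the assumed cap) qualifies.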