Independent claim |
1. A system for representing a textual document based on the occurrence of repeats, comprising:
a sequence generator which defines a sequence representing words forming a collection of documents;
a repeat calculator which identifies a set of repeats within the sequence, the set of repeats comprising subsequences of the sequence which each occur more than once;
a representation generator which generates a representation for at least one document in the collection of documents based on occurrence, in the document, of repeats from the set of repeats; and
a processor which implements the sequence generator, repeat calculator, and representation generator. |
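The three claimed components can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_sequence`, `find_repeats`, `represent`), whitespace tokenization, the per-document separator tokens, and the `max_len` cap on repeat length are all assumptions made for illustration. The claim itself places no limit on repeat length, and a practical system might instead use a suffix tree or suffix array to find repeats.

```python
from collections import Counter


def build_sequence(documents):
    """Sequence generator (hypothetical): one word sequence for the whole collection.

    A unique separator token follows each document, so no repeat can span
    a document boundary. Returns the sequence and each document's (start, end) span.
    """
    sequence, spans = [], []
    for doc_id, text in enumerate(documents):
        start = len(sequence)
        sequence.extend(text.lower().split())  # assumption: whitespace tokenization
        spans.append((start, len(sequence)))
        sequence.append(("<sep>", doc_id))  # unique per document, never repeats
    return sequence, spans


def find_repeats(sequence, max_len=3):
    """Repeat calculator (hypothetical): subsequences occurring more than once.

    Brute-force n-gram counting up to max_len words (an illustrative cap).
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(sequence) - n + 1):
            counts[tuple(sequence[i:i + n])] += 1
    return {gram for gram, count in counts.items() if count > 1}


def represent(sequence, spans, repeats, max_len=3):
    """Representation generator (hypothetical): per-document repeat counts."""
    vectors = []
    for start, end in spans:
        vector = Counter()
        for n in range(1, max_len + 1):
            for i in range(start, end - n + 1):
                gram = tuple(sequence[i:i + n])
                if gram in repeats:
                    vector[gram] += 1
        vectors.append(vector)
    return vectors
```

Called on a small collection such as `["the cat sat", "the cat ran", "a dog ran"]`, `find_repeats` would return the repeated words and the repeated phrase `("the", "cat")`, and `represent` would give each document a bag of counts over those repeats rather than over individual words alone.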