Abstract |
The process for generating descriptors for text classification breaks down complex word forms by matching them against all the word forms that occur in a training text. Neither this breakdown, which is preferably continued cyclically, nor the accompanying compilation of stop-word, prefix and suffix lists requires a basis in morphological or linguistic knowledge. Simple morphological knowledge is supplied by specifying minimum requirements for the form of descriptors and text sections. The process can be adapted to new applications very flexibly and easily. It is, moreover, very fault-tolerant and hence especially suitable for classifying digitised texts obtained from written texts by character recognition or from spoken texts by speech recognition.
|
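The breakdown described in the abstract can be illustrated with a minimal sketch. It is not the claimed process itself: the function name `decompose`, the parameter `min_len`, the longest-match-first rule and the restriction to matches at the start or end of a word are all assumptions made for illustration. The sketch repeatedly splits a word form on shorter word forms from the training vocabulary, and a minimum length stands in for the "minimum requirements for the form of descriptors".

```python
def decompose(word, vocabulary, min_len=4):
    """Split `word` into fragments by matching it against shorter word
    forms from `vocabulary` (hypothetical helper, illustration only)."""
    # Candidate word forms: shorter than `word` and at least `min_len`
    # characters long; longest first, ties broken alphabetically.
    candidates = sorted(
        (v for v in vocabulary if min_len <= len(v) < len(word)),
        key=lambda v: (-len(v), v),
    )
    for cand in candidates:
        if word.startswith(cand):
            rest = word[len(cand):]
            # Continue the breakdown cyclically on both parts.
            return decompose(cand, vocabulary, min_len) + decompose(rest, vocabulary, min_len)
        if word.endswith(cand):
            rest = word[:-len(cand)]
            return decompose(rest, vocabulary, min_len) + decompose(cand, vocabulary, min_len)
    # No shorter word form matches: the word is kept as a single fragment.
    return [word]


# Word forms observed in a (toy) training text.
vocabulary = {"word", "form", "forms", "wordforms"}

fragments = decompose("wordforms", vocabulary)
# Fragments meeting the minimum length are descriptor candidates;
# shorter remainders are candidates for a prefix or suffix list.
descriptors = [f for f in fragments if len(f) >= 4]
affixes = [f for f in fragments if len(f) < 4]
print(fragments, descriptors, affixes)
```

With this toy vocabulary, `wordforms` is split into `word`, `form` and the remainder `s`. No dictionary or grammar is consulted; the only built-in knowledge is the minimum length.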