Abstract |
PCT No. PCT/EP96/02620; Sec. 371 Date Mar. 14, 1997; Sec. 102(e) Date Mar. 14, 1997; PCT Filed Jun. 18, 1996; PCT Pub. No. WO97/04406; PCT Pub. Date Feb. 6, 1997. The proposed method for generating descriptors for the classification of texts breaks down complex word forms by matching them against the entirety of word forms occurring in a compilation of training texts. Neither the breakdown, which is preferably continued cyclically, nor the accompanying compilation of stop-word, prefix, and suffix lists requires a morphological or linguistic knowledge base. Simple morphological knowledge is supplied by prescribing minimum requirements on the form of descriptors and text sections. The method is particularly flexible and can easily be adapted to new applications. It is also very error-tolerant and thus particularly suited to the classification of digitized texts produced from written texts by character recognition or from spoken texts by speech recognition.
|
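The breakdown described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: the minimum length `MIN_LEN`, the longest-match-first order, and the names `split_once` and `build_descriptors` are assumptions made for illustration only. The sketch covers only the cyclic matching of word forms against the training vocabulary and the collection of prefix and suffix lists. It does not model stop words or error tolerance.

```python
from collections import Counter

MIN_LEN = 4  # assumed minimum descriptor length; the abstract gives no value


def split_once(word, vocab, min_len=MIN_LEN):
    """Find the longest shorter vocabulary word at the start or end of `word`.

    Returns (prefix, stem, suffix), or None if no shorter word form matches.
    """
    for size in range(len(word) - 1, min_len - 1, -1):
        head, tail = word[:size], word[size:]
        if head in vocab:          # word = stem + suffix
            return ("", head, tail)
        head, tail = word[:-size], word[-size:]
        if tail in vocab:          # word = prefix + stem
            return (head, tail, "")
    return None


def build_descriptors(words, min_len=MIN_LEN):
    """Map each word form to a descriptor and collect the split-off affixes."""
    vocab = {w.lower() for w in words if len(w) >= min_len}
    prefixes, suffixes = Counter(), Counter()
    descriptors = {}
    for word in vocab:
        stem = word
        # Cyclic breakdown: keep splitting until no shorter word form matches.
        # Each split yields a strictly shorter stem, so the loop terminates.
        while True:
            hit = split_once(stem, vocab, min_len)
            if hit is None:
                break
            prefix, stem, suffix = hit
            if prefix:
                prefixes[prefix] += 1
            if suffix:
                suffixes[suffix] += 1
        descriptors[word] = stem
    return descriptors, prefixes, suffixes


training_words = ["text", "texts", "context", "word", "words", "keywords"]
descriptors, prefixes, suffixes = build_descriptors(training_words)
```

With this toy vocabulary, "keywords" is first reduced to "words" and then to "word", which shows why the breakdown is repeated. No dictionary or grammar is consulted. The only built-in knowledge is the minimum length requirement.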