Abstract
A method and system for classifying documents based on their structure and style. The method and system categorize a document's alphabetic words as complex or non-complex, its linguistic sentences as subjective or non-subjective, and its images as descriptive or non-descriptive. These categorizations are then used to compute the document's complexity, subjectivity and descriptive-image classifications. A web search engine can use this classification system to filter, sort or tag a set of document references according to a user's selection.
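The abstract does not state how the per-item categories become document-level classifications. The following is only a minimal sketch under stated assumptions, not the claimed method. It assumes that separate (unshown) categorizers have already counted complex words, subjective sentences and descriptive images. It also assumes each document score is a simple fraction, for example complex words divided by total words. The names `DocumentRef`, `scores`, `filter_and_sort` and `tag` are hypothetical. The sketch shows how a search engine could then filter, sort or tag a list of document references by the score the user selects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DocumentRef:
    """Hypothetical per-document tallies assumed to come from the
    word, sentence and image categorizers (not shown here)."""
    url: str
    complex_words: int
    total_words: int
    subjective_sentences: int
    total_sentences: int
    descriptive_images: int
    total_images: int


def _ratio(part: int, whole: int) -> float:
    # Guard against documents with no words, sentences or images.
    return part / whole if whole else 0.0


def scores(doc: DocumentRef) -> Dict[str, float]:
    """Assumed scoring: each classification is a simple fraction in [0, 1]."""
    return {
        "complexity": _ratio(doc.complex_words, doc.total_words),
        "subjectivity": _ratio(doc.subjective_sentences, doc.total_sentences),
        "descriptive_images": _ratio(doc.descriptive_images, doc.total_images),
    }


def filter_and_sort(
    refs: List[DocumentRef],
    key: str,
    max_score: Optional[float] = None,
    descending: bool = False,
) -> List[DocumentRef]:
    """Keep refs whose chosen score is <= max_score (if given), then sort by it."""
    scored = [(scores(r)[key], r) for r in refs]
    if max_score is not None:
        scored = [(s, r) for s, r in scored if s <= max_score]
    scored.sort(key=lambda pair: pair[0], reverse=descending)
    return [r for _, r in scored]


def tag(refs: List[DocumentRef], key: str, threshold: float) -> List[Tuple[str, str]]:
    """Label each ref 'high' or 'low' on the chosen score (threshold is an assumption)."""
    return [(r.url, "high" if scores(r)[key] >= threshold else "low") for r in refs]
```

For example, a user who wants simpler reading material could call `filter_and_sort(refs, "complexity")` to list the least complex documents first. A user who wants to avoid opinion pieces could pass `max_score=0.5` with the `"subjectivity"` key. Plain fractions keep the three scores on a common 0 to 1 scale, so a single threshold interface serves all three. A real system might weight or calibrate them differently.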