Main claim
1. A method of determining main text in a mark-up document, comprising:
removing, by a system having a processor, first predetermined mark-up tags from the mark-up document, and replacing second predetermined mark-up tags in the mark-up document with separation elements, wherein the removing and the replacing cause the mark-up document to contain text paragraphs and the separation elements without the first and second predetermined mark-up tags;
determining, by the system, a length of each of the text paragraphs in the mark-up document; and
determining, by the system, one or more main paragraphs of the mark-up document based upon the lengths of the text paragraphs in the mark-up document.
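The three steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not say which tags are the "first" and "second" predetermined tags, so the sketch assumes that inline formatting tags (the hypothetical `FIRST_TAGS` set) are removed outright and that every other tag is replaced by a separation element, here the Unicode paragraph separator U+2029. The claim also leaves the length-based selection rule open; the hypothetical `main_paragraphs` function assumes a paragraph is "main" if its length is at least `ratio` times the length of the longest paragraph. A simple regular expression stands in for a real mark-up parser.

```python
import re

# Assumption: the "first predetermined mark-up tags" are inline formatting
# tags, which are deleted so the text around them stays in one paragraph.
FIRST_TAGS = {"a", "b", "i", "u", "em", "strong", "span", "font"}

# Assumption: every other tag is a "second predetermined mark-up tag" and is
# replaced by this separation element (Unicode PARAGRAPH SEPARATOR).
SEPARATOR = "\u2029"

# Matches opening, closing and self-closing tags such as <p>, </b>, <br/>.
TAG_RE = re.compile(r"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def _rewrite_tag(match: re.Match) -> str:
    """Remove first-set tags; replace all other tags with a separator."""
    if match.group(1).lower() in FIRST_TAGS:
        return ""
    return SEPARATOR


def split_paragraphs(markup: str) -> list[str]:
    """Step 1: strip/replace tags, leaving text paragraphs between separators."""
    text = TAG_RE.sub(_rewrite_tag, markup)
    paragraphs = []
    for chunk in text.split(SEPARATOR):
        para = " ".join(chunk.split())  # collapse runs of whitespace
        if para:                        # skip empty gaps between adjacent tags
            paragraphs.append(para)
    return paragraphs


def main_paragraphs(markup: str, ratio: float = 0.5) -> list[str]:
    """Steps 2 and 3: measure each paragraph and keep the long ones.

    Assumed rule: a paragraph is "main" if its length is at least
    `ratio` times the length of the longest paragraph.
    """
    paras = split_paragraphs(markup)
    if not paras:
        return []
    lengths = [len(p) for p in paras]  # step 2: length of each paragraph
    longest = max(lengths)
    return [p for p, n in zip(paras, lengths) if n >= ratio * longest]
```

For a page whose navigation bar and footer are short `<div>` blocks and whose article body sits in long `<p>` blocks, `main_paragraphs` keeps only the body paragraphs. A fuller version would more likely use a proper parser such as Python's `html.parser.HTMLParser` and would probably discard the contents of `<script>` and `<style>` elements, which this sketch leaves in place.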