Title of invention: Apparatus and method for text extraction
Abstract: A method of determining main text in a mark-up document is provided, which comprises determining a length of each paragraph in the mark-up document; and determining one or more main paragraphs of the mark-up document based upon the length of the paragraphs in the mark-up document.
Publication number: US8924846(B2) Publication date: 2014.12.30
Application number: US200913258464 Filing date: 2009.07.03
Applicant: Hewlett-Packard Development Company, L.P. Inventors: Zhou Bao-Yao; Xiong Yuhong; Liu Wei
Classification: G06F17/22 Main classification: G06F17/22
Agency: Agent:
Principal claim: 1. A method of determining main text in a mark-up document, comprising: removing, by a system having a processor, first predetermined mark-up tags from the mark-up document, and replacing second predetermined mark-up tags in the mark-up document with separation elements, wherein the removing and the replacing cause the mark-up document to contain text paragraphs and the separation elements without the first and second predetermined mark-up tags; determining, by the system, a length of each of the text paragraphs in the mark-up document; and determining, by the system, one or more main paragraphs of the mark-up document based upon the lengths of the text paragraphs in the mark-up document.
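The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: this record does not say which tags belong to the "first" and "second" predetermined groups, what the separation element is, or how length is turned into a selection. The tag lists, the newline separator, the character-count length, and the `ratio` threshold below are therefore all assumptions chosen for illustration, as are the function names.

```python
import re

# Hypothetical tag groups: the record does not list the actual tags.
REMOVE_TAGS = ("a", "b", "i", "em", "strong", "span", "font")      # "first" tags: dropped
SEPARATOR_TAGS = ("p", "div", "br", "td", "li", "h1", "h2", "h3")  # "second" tags: replaced
SEPARATOR = "\n"  # assumed separation element


def _tag_pattern(names):
    # Matches opening, closing, and self-closing forms of the named tags only
    # (the word boundary keeps "b" from matching "<br>" or "<body>").
    return re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(names), re.IGNORECASE)


def split_paragraphs(markup):
    """Remove the first tag group, replace the second with separators, split."""
    text = _tag_pattern(REMOVE_TAGS).sub("", markup)
    text = _tag_pattern(SEPARATOR_TAGS).sub(SEPARATOR, text)
    return [p.strip() for p in text.split(SEPARATOR) if p.strip()]


def main_paragraphs(markup, ratio=0.5):
    """Select paragraphs by length.

    Assumed rule: keep every paragraph at least `ratio` times as long
    (in characters) as the longest paragraph.
    """
    paragraphs = split_paragraphs(markup)
    if not paragraphs:
        return []
    lengths = [len(p) for p in paragraphs]
    threshold = ratio * max(lengths)
    return [p for p, n in zip(paragraphs, lengths) if n >= threshold]
```

For a page consisting of a short navigation link, one long article paragraph, and a short copyright line, `main_paragraphs` under these assumptions returns only the article paragraph. A regular-expression pass is used here for brevity; a production system would more likely walk a parsed document tree.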
Address: Houston TX US