Title of invention Method of identifying redundant text in an electronic document
Abstract A method of identifying redundant text fragments, which only create artificial artifacts, in an electronic page description language document includes a) providing a page having a plurality of text fragments, each text fragment comprising at least one glyph, the document including Unicode values for all glyphs, geometric information for all text fragments on the page, and page description language parameters for all glyphs, b) identifying two text fragments as redundant candidates if the text fragments have identical Unicode sequences, c) defining a bounding box of quadrangular shape for each of the two redundant candidates according to their font characteristics, d) calculating the overlapping area of the two bounding boxes, and e) determining whether the two candidates form redundant text fragments by comparing the ratio of the overlapping area to the area of the smaller of the two bounding boxes with a predetermined threshold.
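Steps b) through e) of the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes axis-aligned rectangular bounding boxes given as `(x0, y0, x1, y1)` (the abstract only requires a quadrangular shape derived from font characteristics), it assumes candidates count as redundant when the ratio is at or above the threshold, and the names `Fragment`, `overlap_area`, `is_redundant` and the threshold value `0.8` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x0, y0, x1, y1)


@dataclass(frozen=True)
class Fragment:
    """Hypothetical text fragment: Unicode values plus a bounding box."""
    codepoints: Tuple[int, ...]
    box: Box


def fragment_from_text(text: str, box: Box) -> Fragment:
    # Convenience helper for the example; a real document would supply
    # the Unicode values and geometry directly.
    return Fragment(tuple(ord(ch) for ch in text), box)


def area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def overlap_area(a: Box, b: Box) -> float:
    # Step d): intersection of two axis-aligned rectangles (assumption).
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def is_redundant(f1: Fragment, f2: Fragment, threshold: float = 0.8) -> bool:
    # Step b): only fragments with identical Unicode sequences are candidates.
    if f1.codepoints != f2.codepoints:
        return False
    # Step e): ratio of the overlap to the smaller bounding-box area.
    smaller = min(area(f1.box), area(f2.box))
    if smaller == 0.0:
        return False  # degenerate box; treated as non-redundant here
    # Assumed direction of comparison: at or above the threshold means redundant.
    return overlap_area(f1.box, f2.box) / smaller >= threshold


# Usage: the same text drawn twice with a small offset, as in a fake-bold effect.
original = fragment_from_text("Hi", (0.0, 0.0, 10.0, 5.0))
shifted = fragment_from_text("Hi", (0.5, 0.0, 10.5, 5.0))
print(is_redundant(original, shifted))
```

Dividing by the smaller box rather than the larger one means a short fragment fully covered by a larger duplicate still yields a ratio of 1.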
Publication number US2006282769(A1) Publication date 2006.12.14
Application number US20060405771 Filing date 2006.04.18
Applicant BRONSTEIN SERGE Inventor BRONSTEIN SERGE
Classification G06F17/00 Main classification G06F17/00
Agency Agent
Main claim
Address