Title of invention: Method of identifying words in an electronic document
Abstract: <p>The method of identifying semantic units in an electronic document comprises the steps of: providing (10) an electronic document described in a page description language, the document comprising at least one page having a plurality of text fragments, each text fragment including a plurality of glyphs that have not yet been identified as semantic units, the document further comprising geometric information and page description language parameters; determining (14) strips of at least one glyph by comparing (48) the geometric positions of successive glyphs; determining (16) zones of at least one strip, wherein a zone is defined by the combined area of strips whose geometric areas overlap one another; determining (102) a boundary between two semantic units in a zone based on the geometric properties of the glyphs; sorting (104) the identified semantic units in the zone into a sorted list; and combining (108) successive semantic units in the sorted list according to geometric criteria.</p>
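The abstract names the steps but not their parameters. The Python sketch below is one possible reading of them, not the patented implementation. It assumes axis-aligned glyph bounding boxes with y growing downward, glyphs supplied in drawing (content-stream) order, and hypothetical thresholds (`gap_factor`, `word_gap`, `tol`, the 0.1-height kerning tolerance) that the abstract does not specify. All function names, the union-find grouping of zones, and the simplified top-then-left sort are illustrative choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom); y grows downward (assumed)

@dataclass
class Glyph:
    char: str
    x0: float
    y0: float
    x1: float
    y1: float

def bbox(glyphs: List[Glyph]) -> Box:
    # Combined bounding box of a run of glyphs.
    return (min(g.x0 for g in glyphs), min(g.y0 for g in glyphs),
            max(g.x1 for g in glyphs), max(g.y1 for g in glyphs))

def overlaps(a: Box, b: Box, tol: float) -> bool:
    # Boxes overlap; gaps narrower than tol also count as overlap.
    return (a[0] - tol < b[2] and b[0] - tol < a[2]
            and a[1] - tol < b[3] and b[1] - tol < a[3])

def adjacent(a: Box, b: Box, max_gap: float) -> bool:
    # b shares a's line (vertical overlap) and starts at most max_gap heights after
    # a ends; an overlap of up to 0.1 heights is tolerated, e.g. for kerning.
    height = a[3] - a[1]
    on_line = a[1] < b[3] and b[1] < a[3]
    return on_line and -0.1 * height <= b[0] - a[2] <= max_gap * height

def build_strips(glyphs: List[Glyph], gap_factor: float = 1.0) -> List[List[Glyph]]:
    # Steps (14)/(48): compare each glyph with its predecessor in drawing order.
    strips: List[List[Glyph]] = []
    for g in glyphs:
        if strips and adjacent(bbox(strips[-1][-1:]), bbox([g]), gap_factor):
            strips[-1].append(g)
        else:
            strips.append([g])
    return strips

def build_zones(strips: List[List[Glyph]], tol: float = 0.5) -> List[List[List[Glyph]]]:
    # Step (16): a zone groups strips whose boxes overlap, directly or via other strips.
    parent = list(range(len(strips)))
    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i
    boxes = [bbox(s) for s in strips]
    for i in range(len(strips)):
        for j in range(i + 1, len(strips)):
            if overlaps(boxes[i], boxes[j], tol):
                parent[find(i)] = find(j)
    zones: Dict[int, List[List[Glyph]]] = {}
    for i, strip in enumerate(strips):  # zones keep the order of their first strip
        zones.setdefault(find(i), []).append(strip)
    return list(zones.values())

def split_units(zone: List[List[Glyph]], word_gap: float = 0.25) -> List[List[Glyph]]:
    # Step (102): within each strip, a boundary falls at a whitespace glyph or
    # where the gap to the previous glyph exceeds word_gap glyph heights.
    units: List[List[Glyph]] = []
    for strip in zone:
        current: List[Glyph] = []
        for g in strip:
            if current and (g.char.isspace() or not adjacent(bbox(current[-1:]), bbox([g]), word_gap)):
                units.append(current)
                current = []
            if not g.char.isspace():
                current.append(g)
        if current:
            units.append(current)
    return units

def sort_units(units: List[List[Glyph]]) -> List[List[Glyph]]:
    # Step (104): top-to-bottom, then left-to-right. Simplification: assumes units
    # on one line share a top edge (real text would need line clustering).
    return sorted(units, key=lambda u: (bbox(u)[1], bbox(u)[0]))

def combine_units(units: List[List[Glyph]], word_gap: float = 0.25) -> List[List[Glyph]]:
    # Step (108): merge a unit into its predecessor when both are on one line and
    # separated by at most word_gap glyph heights, e.g. a word drawn in pieces.
    merged: List[List[Glyph]] = []
    for u in units:
        if merged and adjacent(bbox(merged[-1]), bbox(u), word_gap):
            merged[-1] = merged[-1] + u
        else:
            merged.append(u)
    return merged

def identify_words(glyphs: List[Glyph]) -> List[str]:
    words: List[str] = []
    for zone in build_zones(build_strips(glyphs)):
        for unit in combine_units(sort_units(split_units(zone))):
            words.append("".join(g.char for g in unit))
    return words

def make_glyphs(text: str, x: float, y: float, w: float = 10.0, h: float = 10.0) -> List[Glyph]:
    # Hypothetical helper: fixed-size glyphs laid out left to right from (x, y).
    return [Glyph(c, x + i * w, y, x + (i + 1) * w, y + h) for i, c in enumerate(text)]
```

For instance, if "lo" is drawn before "Hel", `identify_words(make_glyphs("lo", 30, 0) + make_glyphs("Hel", 0, 0))` returns `['Hello']`: the two pieces fall into separate strips (the second jumps backwards) but into one zone, and sorting followed by combining restores the word. Splitting inside strips first and merging across strips afterwards is what lets the sketch handle both spaced words and words drawn out of order.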
Publication number: EP1739574 (A1); Publication date: 2007.01.03
Application number: EP20050014369; Filing date: 2005.07.01
Applicant: PDFLIB GMBH; Inventor: BRONSTEIN, SERGE
Classification: G06F17/21, G06K9/20; Main classification: G06F17/21