Title of invention: Method of identifying semantic units in an electronic document
Abstract: A method of identifying semantic units in an electronic document includes the steps of: providing an electronic document described in a page description language, the document having at least one page with a plurality of text fragments, each text fragment including a plurality of glyphs that have not been identified as semantic units, the document further including geometric information and page description language parameters; determining strips of at least one glyph by comparing the geometric position of subsequent glyphs; determining zones of at least one strip, wherein a zone is defined by the combined area of strips whose geometrical areas overlap with each other; determining a boundary between two semantic units in a zone based on the geometric properties of the glyphs; sorting the identified semantic units in the zone in a sorted list; and combining subsequent semantic units in the sorted list according to geometric considerations.
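The strip, zone, boundary, and sorting steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Glyph` record, the function names, the tolerance values (`y_tol`, `max_gap`, `gap_factor`), and the top-to-bottom, left-to-right sort order are all assumptions chosen for illustration, and the final step of combining subsequent semantic units is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Glyph:               # hypothetical record, not taken from the patent
    char: str
    x: float               # left edge
    y: float               # baseline (y grows upward, as in PDF user space)
    w: float               # advance width
    h: float               # height above the baseline


Box = Tuple[float, float, float, float]     # (x0, y0, x1, y1)
Zone = Tuple[Box, List[List[Glyph]]]        # combined area and its strips


def build_strips(glyphs, y_tol=0.5, max_gap=10.0):
    """Chain subsequent glyphs into a strip while they share a baseline
    and stay horizontally close (both thresholds are assumed values)."""
    strips: List[List[Glyph]] = []
    for g in glyphs:
        if strips:
            prev = strips[-1][-1]
            gap = g.x - (prev.x + prev.w)
            if abs(g.y - prev.y) <= y_tol and -prev.w <= gap <= max_gap:
                strips[-1].append(g)
                continue
        strips.append([g])
    return strips


def strip_box(strip) -> Box:
    return (min(g.x for g in strip), min(g.y for g in strip),
            max(g.x + g.w for g in strip), max(g.y + g.h for g in strip))


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build_zones(strips) -> List[Zone]:
    """Merge strips whose areas overlap until no two zones overlap."""
    zones: List[Zone] = [(strip_box(s), [s]) for s in strips]
    merged = True
    while merged:
        merged = False
        for i in range(len(zones)):
            for j in range(i + 1, len(zones)):
                if overlaps(zones[i][0], zones[j][0]):
                    (a, sa), (b, sb) = zones[i], zones[j]
                    zones[i] = (union(a, b), sa + sb)
                    del zones[j]
                    merged = True
                    break
            if merged:
                break
    return zones


def split_units(strip, gap_factor=0.3):
    """Place a boundary wherever the gap to the next glyph exceeds a
    fraction of the previous glyph's width (an assumed heuristic)."""
    groups = [[strip[0]]]
    for prev, g in zip(strip, strip[1:]):
        if g.x - (prev.x + prev.w) > gap_factor * prev.w:
            groups.append([g])
        else:
            groups[-1].append(g)
    return [(u[0].y, u[0].x, "".join(g.char for g in u)) for u in groups]


def sorted_units(zone: Zone) -> List[str]:
    """Sort a zone's units top-to-bottom, then left-to-right (assumed order)."""
    units = [u for strip in zone[1] for u in split_units(strip)]
    units.sort(key=lambda u: (-u[0], u[1]))
    return [text for _, _, text in units]
```

A caller would pass the glyphs of one page, in the order the page description emits them, to `build_strips`, feed the result to `build_zones`, and call `sorted_units` on each zone. A real system would derive the thresholds from font size and other page description language parameters rather than use fixed constants.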
Publication number: US2007002054(A1); Publication date: 2007.01.04
Application number: US20060405782; Filing date: 2006.04.18
Applicant: BRONSTEIN SERGE; Inventor: BRONSTEIN SERGE
Classification: G06T11/00; Main classification: G06T11/00
Agency: ; Agent:
Main claim:
Address: