Title of invention: METHOD AND APPARATUS FOR IDENTIFYING WORDS DESCRIBED IN A PORTABLE ELECTRONIC DOCUMENT
Abstract: A method and apparatus for identifying words stored in a portable electronic document. A digital computation apparatus stores a page of a document including characters in text segments that have not been identified as words. A word identifying mechanism analyzes the text segments of the page and stores the text segments as text objects in a linked list. The word identifying mechanism identifies words from the text objects in the linked list by analyzing the text objects for word breaks and by analyzing gaps between text objects using position data associated with the text segments. The identified words are stored in a word list and are sorted if necessary. A method of the present invention receives a text segment from a page of a document having multiple text segments and associated position data, including x and y coordinates for each text segment. A text object is created for each text segment, and the text objects are entered into a linked list. Words are then identified from the linked list by analyzing the text objects for word breaks and by analyzing gaps between text objects using the associated position data. Words that are identified in the text objects are added to a word list. The above steps are repeated until the end of the page is reached. The method and apparatus can be used for searching for words in a portable electronic document.
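The steps in the abstract (one text object per segment, a linked list of those objects, then word identification from word breaks and from gaps between objects) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `TextObject`, `build_linked_list`, and `identify_words`, the `width` field, and the `gap_threshold` and `line_tolerance` values are assumptions introduced for illustration; the abstract only states that each segment carries x and y coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class TextObject:
    """One text segment with its position; `next` makes it a linked-list node."""
    text: str
    x: float      # left edge of the segment
    y: float      # vertical position (baseline) of the segment
    width: float  # horizontal extent (assumed available; not named in the abstract)
    next: Optional["TextObject"] = None


def build_linked_list(
    segments: Iterable[Tuple[str, float, float, float]]
) -> Optional[TextObject]:
    """Create a text object for each segment and chain them in page order."""
    head = tail = None
    for text, x, y, width in segments:
        node = TextObject(text, x, y, width)
        if head is None:
            head = tail = node
        else:
            tail.next = node
            tail = node
    return head


def identify_words(
    head: Optional[TextObject],
    gap_threshold: float = 2.0,   # hypothetical value, in page units
    line_tolerance: float = 1.0,  # hypothetical value, in page units
) -> List[str]:
    """Walk the list, splitting on whitespace and on large gaps between objects."""
    words: List[str] = []
    pending = ""  # partial word that may continue in the next text object
    prev = None
    node = head
    while node is not None:
        if prev is not None and pending:
            same_line = abs(node.y - prev.y) <= line_tolerance
            gap = node.x - (prev.x + prev.width)
            # A line change or a wide gap ends the word in progress.
            if not same_line or gap > gap_threshold:
                words.append(pending)
                pending = ""
        # Leading whitespace inside the segment is a word break.
        if node.text[:1].isspace() and pending:
            words.append(pending)
            pending = ""
        for i, piece in enumerate(node.text.split()):
            if i > 0 and pending:
                words.append(pending)
                pending = ""
            pending += piece
        # Trailing whitespace inside the segment is also a word break.
        if node.text[-1:].isspace() and pending:
            words.append(pending)
            pending = ""
        prev = node
        node = node.next
    if pending:
        words.append(pending)
    return words


# Example page: one word is split across two segments with a tiny gap.
segments = [
    ("Port", 10.0, 100.0, 20.0),
    ("able doc", 30.5, 100.0, 40.0),
    ("ument", 71.0, 100.0, 25.0),
    ("Next", 10.0, 88.0, 20.0),
]
word_list = identify_words(build_linked_list(segments))
print(word_list)
```

The sketch treats whitespace inside a segment as a word break and joins adjacent segments only when they sit on the same line with a gap no larger than the threshold. It does not sort the word list or handle hyphenation at line ends, both of which a fuller implementation would need to address.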
Publication number: CA2153377(A1) Publication date: 1996.03.13
Application number: CA19952153377 Filing date: 1995.07.06
Applicant: ADOBE SYSTEMS, INC. Inventors: PAKNAD, MOHAMMAD DARYOUSH; AYERS, ROBERT M.
Classification: G06F17/21; G06F17/24; G06K9/20; (IPC1-7): G06K9/00 Main classification: G06F17/21
Agency: Agent:
Main claim:
Address: