Title of invention: Method and apparatus for determining the frequency of phrases in a document without document image decoding
Abstract: Methods and apparatus for determining phrase frequency in an undecoded document text image, without first converting the document to character codes. The method segments the document image into word units without decoding it, and applies morphological image processing to determine word unit characteristics, which are used to place the word units into equivalence classes using non-content-based information. All possible sequences of selected word units in reading order that constitute phrases are mapped to a list of corresponding sequences of equivalence class labels, one label per selected image unit in the phrase, and these equivalence class sequences are analyzed to determine the frequency of the phrases.
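The counting step described in the abstract can be illustrated with a minimal sketch. It assumes the image-processing stages have already assigned each word unit an equivalence class label, so the input is simply a list of labels in reading order. The function name `phrase_frequencies`, the `min_len` and `max_len` parameters, and the integer labels are hypothetical and chosen for illustration; the record does not specify how the patent enumerates or bounds phrases.

```python
from collections import Counter


def phrase_frequencies(labels, min_len=2, max_len=3):
    """Count contiguous runs of equivalence-class labels.

    `labels` holds one class label per word unit, in reading order.
    Every run of length min_len..max_len is treated as a candidate
    phrase (an illustrative assumption, not the patent's definition).
    """
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of n labels across the sequence.
        for i in range(len(labels) - n + 1):
            counts[tuple(labels[i:i + n])] += 1
    return counts


# Hypothetical class labels for nine word units in reading order.
labels = [3, 7, 2, 3, 7, 5, 3, 7, 2]
freq = phrase_frequencies(labels)

# List the most frequent label sequences first.
for phrase, count in freq.most_common(3):
    print(phrase, count)
```

Because the phrases are compared as label sequences, two occurrences count as the same phrase whenever their word units fall into the same equivalence classes, and no character recognition is involved.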
Publication number: US5369714(A) Publication date: 1994.11.29
Application number: US19910794555 Filing date: 1991.11.19
Applicant: XEROX CORPORATION Inventors: WITHGOTT, M. MARGARET; RAO, RAMANA R.
Classification: G06F17/21; G06K9/72; G06T11/60; (IPC1-7): G06K9/36 Main classification: G06F17/21
Agency: Agent:
Principal claim:
Address: