Title of invention: Method and apparatus for automatic language determination of Asian language documents
Abstract: An automatic language determining apparatus automatically determines the particular Asian language of the text image of a document when the gross script-type is known to be, or is determined to be, an Asian script-type. A connected component generating means generates connected components from the pixels comprising the text image. A character cell generating means generates a character cell surrounding at least one connected component. An optical density determining means determines the optical density of each character cell, expressed either as an absolute number of pixels or as a percentage of the pixels within the cell. A script feature determining means first generates a histogram of these densities, then converts the histogram, by linear discriminant analysis, to a point in a new coordinate space. A language determining means compares the determined point of the text portion in the new coordinate space to predetermined regions of that space, each corresponding to at least one Asian language, to determine the particular Asian language of the text image.
Publication number: US5425110(A) Publication date: 1995.06.13
Application number: US19930047673 Filing date: 1993.04.19
Applicant: XEROX CORPORATION; FUJI XEROX CORPORATION Inventor: SPITZ, A. LAWRENCE
Classification: G06K9/20; G06F17/27; G06K9/62; G06K9/68; (IPC1-7): G06K9/46 Main classification: G06K9/20
Agency: Agent:
Principal claim:
Address:
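
The processing chain named in the abstract (connected components, character cells, optical density, histogram, linear projection, comparison against language regions) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the patented implementation: each character cell is simplified to the bounding box of a single connected component, the language regions are approximated by nearest-centroid matching, and the projection weights and centroids are hypothetical inputs that would have to come from a linear discriminant analysis trained on labeled documents. All function names are invented for this sketch.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    groups of black pixels (value 1) in a binary image (list of rows)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def optical_density(img, box):
    """Fraction of black pixels inside a character cell.
    Assumption: the cell is the bounding box of one component."""
    top, left, bottom, right = box
    area = (bottom - top + 1) * (right - left + 1)
    black = sum(img[y][x]
                for y in range(top, bottom + 1)
                for x in range(left, right + 1))
    return black / area


def density_histogram(densities, bins=4):
    """Normalized histogram of cell densities over [0, 1]."""
    counts = [0] * bins
    for d in densities:
        counts[min(int(d * bins), bins - 1)] += 1
    return [n / len(densities) for n in counts]


def project(hist, weights):
    """Map the histogram to a point in the new coordinate space.
    `weights` is a hypothetical matrix obtained offline by LDA."""
    return [sum(w * v for w, v in zip(row, hist)) for row in weights]


def classify(point, centroids):
    """Pick the language whose (hypothetical) region centroid is nearest."""
    return min(centroids,
               key=lambda lang: sum((p - q) ** 2
                                    for p, q in zip(point, centroids[lang])))


def identify_language(img, weights, centroids, bins=4):
    """Run the whole chain on one binary text image."""
    cells = connected_components(img)
    densities = [optical_density(img, box) for box in cells]
    return classify(project(density_histogram(densities, bins), weights),
                    centroids)
```

A caller would pass a binarized page image together with trained `weights` and one centroid per candidate language to `identify_language`. Nearest-centroid matching is only one simple way to model the predetermined regions; the record does not say how the regions are actually bounded.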