Title: System and methods for Arabic text recognition based on effective Arabic text feature extraction
Abstract: A method for automatically recognizing Arabic text includes building an Arabic corpus comprising Arabic text files written in different writing styles and ground truths corresponding to each of the Arabic text files, storing writing-style indices in association with the Arabic text files, digitizing an Arabic word to form an array of pixels, dividing the Arabic word into line images, forming a text feature vector from the line images, training a Hidden Markov Model using the Arabic text files and ground truths in the Arabic corpus in accordance with the writing-style indices, and feeding the text feature vector into the Hidden Markov Model to recognize the Arabic word.
Publication number: US8908961(B2) Publication date: 2014.12.09
Application number: US201414259309 Filing date: 2014.04.23
Applicant: King Abdulaziz City for Science & Technology Inventors: Khorsheed Mohammad S.; Al-Omari Hussein K.
Classification: G06K9/62; G06K9/34; G06K9/46; G06K9/80; G06K9/18; G06K9/00 Main classification: G06K9/62
Agency: SV Patent Service Agent: SV Patent Service
Principal claim: 1. A method for automatically recognizing Arabic text, comprising: acquiring a text image comprising one or more Arabic words each including one or more Arabic characters; identifying a plurality of lines of Arabic text in the text image; segmenting one of the plurality of lines of Arabic text into Arabic words; digitizing at least one of the Arabic words to form a two-dimensional array of pixels each associated with a pixel value, wherein the pixel value is expressed in a binary number; dividing the one of the Arabic words into a plurality of line images; defining a plurality of cells in one of the plurality of line images, wherein each of the plurality of cells comprises a group of adjacent pixels; serializing pixel values of pixels in each of the plurality of cells in one of the plurality of line images to form a binary cell number; forming a text feature vector according to binary cell numbers obtained from the plurality of cells in one of the plurality of line images; and feeding the text feature vector into a Hidden Markov Model to recognize the one or more Arabic words including the Arabic characters.
Address: Riyadh SA
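
The feature-extraction steps recited in claim 1 (dividing a binarized word into line images, grouping adjacent pixels into cells, serializing each cell into a binary cell number, and collecting those numbers into a feature vector) can be illustrated with a minimal sketch. This is not the patented implementation. The function names, the choice of horizontal strips as line images, the fixed cell width, and the column-by-column bit order are all assumptions made for illustration, since the record does not specify them. The Hidden Markov Model stage is omitted.

```python
from typing import List

# A binarized word image: a list of rows, each row a list of 0/1 pixel values.
Image = List[List[int]]


def split_into_line_images(word: Image, line_height: int) -> List[Image]:
    """Divide the word image into horizontal strips of `line_height` rows.

    Assumption: a "line image" is a horizontal strip of the word image.
    """
    return [word[r:r + line_height] for r in range(0, len(word), line_height)]


def cell_numbers(line_image: Image, cell_width: int) -> List[int]:
    """Serialize each cell of a line image into one integer (binary cell number).

    Assumption: a cell spans the full strip height and `cell_width` columns,
    and its pixels are read column by column, top to bottom.
    """
    height = len(line_image)
    width = len(line_image[0]) if height else 0
    numbers = []
    for start in range(0, width, cell_width):
        value = 0
        for col in range(start, min(start + cell_width, width)):
            for row in range(height):
                # Append the next pixel as the least significant bit.
                value = (value << 1) | line_image[row][col]
        numbers.append(value)
    return numbers


def feature_vectors(word: Image, line_height: int, cell_width: int) -> List[List[int]]:
    """Build one feature vector (a list of cell numbers) per line image."""
    return [cell_numbers(strip, cell_width)
            for strip in split_into_line_images(word, line_height)]


# Hypothetical 2x4 binarized word image used only to exercise the functions.
word_image = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
]
vectors = feature_vectors(word_image, line_height=1, cell_width=2)
```

With one-pixel-high strips and two-pixel-wide cells, each cell number here is a 2-bit value. Larger cells yield larger integers, so in practice the cell size would bound the size of the observation alphabet presented to the recognizer.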