Title of invention: METHOD FOR RETRIEVAL OF ARABIC HISTORICAL MANUSCRIPTS
Abstract: The method for retrieval of Arabic historical manuscripts using Latent Semantic Indexing addresses manuscript indexing and retrieval by automatically indexing Arabic historical manuscripts through word spotting, using “Text Image” similarity of keywords. The similarity is computed using Latent Semantic Indexing (LSI). The method involves a manuscript page preprocessing step, a segmentation step, and a feature extraction step. Feature extraction uses a circular polar grid feature set. Once the salient features have been extracted, the historical Arabic manuscripts are indexed using LSI in support of content-based image retrieval (CBIR).
Publication number: US2014164370(A1) Publication date: 2014.06.12
Application number: US201213712773 Filing date: 2012.12.12
Applicant: KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS Inventors: YAHIA MOHAMMAD HUSNI NAJIB; AL-KHATIB WASFI G.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A computer-implemented method for retrieval of Arabic historical manuscripts, comprising the steps of: entering Arabic historical manuscript images into a computer for processing; extracting circular polar grid features from the Arabic historical manuscript images stored in the computer; constructing a Latent Semantic Index based on the extracted circular polar grid features, the Latent Semantic Index having a reduced dimension m×n Term-by-Document matrix obtained from a Singular Value Decomposition of a higher dimensional Term-by-Document matrix constructed by the computer from the extracted circular polar grid features, wherein m rows represent the features and n columns represent the images; accepting a user query against the stored Arabic historical manuscript images, the computer forming the user query as a query vector derived from features extraction of a query image supplied by the user; performing query matching based on comparison between the query vector and the Term-by-Document matrix; and displaying Arabic historical document images returned by the query matching process performed by the computer, the returned document images being ranked by similarity to the user query according to a predetermined distance measurement between the query vector and the Term-by-Document matrix.
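The indexing and query-matching steps recited in the claim can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The function names `reduce_term_document_matrix` and `rank_images`, the toy 4×3 matrix, the choice of k = 2, and the use of cosine similarity as the "predetermined distance measurement" are all assumptions made for illustration; the claim does not name a specific measure, and the circular polar grid feature extraction is not reproduced here.

```python
import numpy as np


def reduce_term_document_matrix(A, k):
    """Return the rank-k approximation of the m x n matrix A via truncated SVD.

    Rows are features, columns are images, as in the claim.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values; the result is still m x n.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def rank_images(query, A_k):
    """Rank the image columns of A_k by cosine similarity to the query vector.

    Cosine similarity is an assumed stand-in for the unspecified
    distance measurement in the claim.
    """
    denom = np.linalg.norm(query) * np.linalg.norm(A_k, axis=0)
    denom[denom == 0] = 1.0  # avoid division by zero for empty vectors
    sims = (query @ A_k) / denom
    order = np.argsort(-sims)  # most similar image first
    return order, sims


# Hypothetical toy data: 4 features (rows) x 3 images (columns).
A = np.array([
    [2.0, 1.0, 0.0],
    [2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0],
])
A_k = reduce_term_document_matrix(A, k=2)

# Query vector with the same feature layout as the first image.
query = np.array([2.0, 2.0, 0.0, 0.0])
order, sims = rank_images(query, A_k)
print("ranking:", order)
print("similarities:", np.round(sims, 3))
```

In this toy case the matrix already has rank 2, so truncating to k = 2 loses nothing and the query, which equals the first column, ranks image 0 first. With real feature matrices, k would be chosen much smaller than min(m, n), which is where the noise reduction of LSI comes from.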
Address: Dhahran SA