Title of invention System and method for disambiguating non diacritized Arabic words in a text
Abstract The present invention proposes a solution to the problem of lexical word disambiguation in Arabic texts. The solution relies on domain-specific knowledge of the text, which facilitates automatic vowel restoration in Modern Standard Arabic script. Texts that are similar in content, restricted to a specific field, or share common knowledge can be grouped into a specific category or domain (for example sport, art, economics, or science). The invention discloses a method, system, and computer program for lexically disambiguating non diacritized Arabic words in a text, based on a learning approach that exploits Arabic lexical look-up and Arabic morphological analysis to train the system on a corpus of diacritized Arabic text pertaining to a specific domain. The contextual relationships of the words related to that domain are thereby identified, based on the valid assumption that the use of words and their morphological variants shows less lexical variability within a domain than in unrestricted text.
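The domain-restricted idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: a plain per-domain frequency table stands in for the Arabic lexical look-up and morphological analysis that the abstract names, and the function names (`strip_diacritics`, `train`, `restore`) and the three-word toy corpus are hypothetical. The example word is the bare form of two real words, one meaning "science" and one meaning "flag", which differ only in their diacritics.

```python
from collections import Counter, defaultdict

# Arabic diacritic marks from fathatan to sukun (U+064B..U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(word):
    """Return the word with its diacritic marks removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def train(corpus_by_domain):
    """Count, per domain, which diacritized form each bare word takes."""
    model = defaultdict(lambda: defaultdict(Counter))
    for domain, sentences in corpus_by_domain.items():
        for sentence in sentences:
            for word in sentence.split():
                model[domain][strip_diacritics(word)][word] += 1
    return model


def restore(model, domain, text):
    """Replace each bare word with its most frequent form in the domain.

    Words never seen in that domain are left unchanged.
    """
    forms = model.get(domain, {})
    restored = []
    for word in text.split():
        candidates = forms.get(word)
        restored.append(candidates.most_common(1)[0][0] if candidates else word)
    return " ".join(restored)


# Hypothetical toy data: one bare word with two diacritized readings.
BARE = "\u0639\u0644\u0645"                     # ain, lam, meem
ILM = "\u0639\u0650\u0644\u0652\u0645"          # "science" (kasra, sukun)
ALAM = "\u0639\u064E\u0644\u064E\u0645"         # "flag" (fatha, fatha)

corpus = {
    "science": [ILM, ILM, ALAM],
    "sport": [ALAM, ALAM, ILM],
}
model = train(corpus)
science_reading = restore(model, "science", BARE)
sport_reading = restore(model, "sport", BARE)
```

With this toy corpus the same bare word is restored to the "science" reading in the science domain and to the "flag" reading in the sport domain, which is the reduced lexical variability the abstract relies on. A real system would need a large diacritized corpus per domain and a way to handle unseen words and inflected forms, which is where morphological analysis would come in.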
Application publication number US2006129380(A1) Publication date 2006.06.15
Application number US20050299220 Filing date 2005.12.09
Applicant EL-SHISHINY HISHAM Inventor EL-SHISHINY HISHAM
Classification G06F17/27 Main classification G06F17/27
Agency Agent
Main claim
Address