Title of invention |
Lemmatizing, stemming, and query expansion method and system |
Abstract |
A method of stemming text and a system therefor are described. The method comprises removing stop words from a document based on at least one stop word entry in an array of stop words and flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming on the flagged nouns; applying a root-based stemming on the flagged verbs; and storing the stemmed document.
|
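The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract names no language and gives no word lists, so the article prefix `al`, every entry in the arrays, the affix lists, and the helpers `light_stem`, `root_stem`, and `stem_document` are hypothetical. The sketch also reads the two noun conditions (attached to a definite article, preceded by a noun array entry) as alternatives, which is only one possible reading, and it uses deliberately crude placeholder stemmers.

```python
# Hypothetical toy data: none of these lists come from the patent.
ARTICLE = "al"                          # assumed definite-article prefix
STOP_WORDS = {"fi", "min", "lam", "thumma"}
NOUN_PRECEDERS = {"fi", "min"}          # stop words assumed to precede a noun
VERB_PRECEDERS = {"lam"}                # stop words assumed to precede a verb
NOUN_SUFFIXES = ("at", "in")            # assumed suffixes for light stemming
VERB_AFFIXES = ("ya", "ta", "un", "u")  # assumed affixes for root stemming


def light_stem(word):
    """Placeholder light stemmer: strip the article prefix and one suffix."""
    if word.startswith(ARTICLE) and len(word) > len(ARTICLE) + 2:
        word = word[len(ARTICLE):]
    for suffix in NOUN_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def root_stem(word):
    """Placeholder root stemmer: strip one prefix and one suffix, drop vowels."""
    for affix in VERB_AFFIXES:
        if word.startswith(affix) and len(word) > len(affix) + 2:
            word = word[len(affix):]
            break
    for affix in VERB_AFFIXES:
        if word.endswith(affix) and len(word) > len(affix) + 2:
            word = word[: -len(affix)]
            break
    return "".join(ch for ch in word if ch not in "aeiou")


def stem_document(tokens):
    """Return (stemmed tokens, noun dictionary, verb dictionary)."""
    preceders = NOUN_PRECEDERS | VERB_PRECEDERS
    # First removal pass: drop stop words that give no noun or verb cue.
    tokens = [t for t in tokens if t not in STOP_WORDS or t in preceders]

    # Flag nouns and verbs from their context and fill the dictionaries.
    noun_dict, verb_dict = set(), set()
    for prev, word in zip([None] + tokens, tokens):
        if word in STOP_WORDS:
            continue
        if word.startswith(ARTICLE) or prev in NOUN_PRECEDERS:
            noun_dict.add(word)
        elif prev in VERB_PRECEDERS:
            verb_dict.add(word)

    # Search the whole document with the dictionaries, drop the remaining
    # stop words, and stem nouns lightly and verbs down to a root.
    stemmed = []
    for word in tokens:
        if word in STOP_WORDS:
            continue
        if word in noun_dict:
            stemmed.append(light_stem(word))
        elif word in verb_dict:
            stemmed.append(root_stem(word))
        else:
            stemmed.append(word)  # unflagged words are left unchanged
    return stemmed, noun_dict, verb_dict


if __name__ == "__main__":
    doc = ["thumma", "lam", "yaktub", "alkitab", "fi", "madrasat", "madrasat"]
    print(stem_document(doc)[0])  # ['ktb', 'kitab', 'madras', 'madras']
```

In the toy document, the second `madrasat` has no cue word in front of it; it is still treated as a noun because the dictionary search step matches it against the entry flagged earlier.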
Publication number |
US8473279(B2) |
Publication date |
2013.06.25 |
Application number |
US20090476238 |
Filing date |
2009.06.01 |
Applicant |
AL-SHAMMARI EIMAN TAMAH |
Inventor |
AL-SHAMMARI EIMAN TAMAH |
Classification |
G06F17/20;G06F17/21;G06F17/27;G06F17/28 |
Main classification |
G06F17/20 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|