Title Corpus expansion system and method thereof
Abstract A system and method are provided for deriving new sample seeds in order to automatically expand corpora, in which sample seeds are used to collect corpus material. New sample seeds are generated from the existing sample seeds and the corpora already collected; a corpus expansion strategy is determined from all sample seeds used so far and the new sample seeds; the new sample seeds are refined according to the corpus expansion strategy; and the refined new sample seeds are used to collect further corpus material. These steps are repeated until a predefined condition is satisfied. According to the invention, corpora can be expanded automatically from the web or other resources, at low cost and in a convenient way, to improve corpus coverage.
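The loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`collect`, `generate_new_seeds`, `determine_strategy`, `refine`, `expand_corpus`), the in-memory `source` dictionary standing in for the web, the word-based seed generation, the per-round limit used as the "strategy", and the stopping condition (a round cap or no remaining seeds) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def collect(seed: str, source: Dict[str, List[str]]) -> List[str]:
    """Hypothetical collector: look the seed up in an in-memory source."""
    return list(source.get(seed, []))


def generate_new_seeds(used_seeds: Set[str], corpus: List[str]) -> List[str]:
    """Assumed heuristic: every unseen word in the corpus is a candidate seed."""
    candidates: List[str] = []
    for doc in corpus:
        for word in doc.split():
            if word not in used_seeds and word not in candidates:
                candidates.append(word)
    return candidates


def determine_strategy(used_seeds: Set[str], new_seeds: List[str]) -> dict:
    """Assumed strategy: exclude used seeds and cap new seeds per round at 2."""
    return {"exclude": set(used_seeds), "limit": min(2, len(new_seeds))}


def refine(new_seeds: List[str], strategy: dict) -> List[str]:
    """Apply the strategy to the candidate seeds."""
    kept = [s for s in new_seeds if s not in strategy["exclude"]]
    return kept[: strategy["limit"]]


def expand_corpus(
    initial_seeds: List[str],
    source: Dict[str, List[str]],
    max_rounds: int = 3,
) -> Tuple[List[str], Set[str]]:
    """Repeat collect -> generate -> strategy -> refine until a stop condition."""
    corpus: List[str] = []
    used_seeds: Set[str] = set()
    seeds = list(initial_seeds)
    for _ in range(max_rounds):  # assumed predefined condition: round cap
        if not seeds:  # assumed predefined condition: nothing left to try
            break
        for seed in seeds:
            for doc in collect(seed, source):
                if doc not in corpus:
                    corpus.append(doc)
        used_seeds.update(seeds)
        candidates = generate_new_seeds(used_seeds, corpus)
        strategy = determine_strategy(used_seeds, candidates)
        seeds = refine(candidates, strategy)
    return corpus, used_seeds


# Toy source standing in for the web or another resource.
toy_source = {
    "corpus": ["corpus expansion"],
    "expansion": ["expansion seeds"],
    "seeds": ["seeds"],
}
result_corpus, result_seeds = expand_corpus(["corpus"], toy_source)
```

In this toy run each round's collected text yields the next round's seed, so the corpus grows from one seed until no unused words remain. A real system would replace `collect` with a web or database query and use a more selective generation and refinement step.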
Publication No. US2007073534(A1) Publication Date 2007.03.29
Application No. US20060511750 Filing Date 2006.08.29
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors GUO HONG L.;ZHANG LI;QIU ZHAO M.;SHEN LI QIN;GUO ZHI L.
Classification G06F17/27 Main Classification G06F17/27
Agency Agent
Main Claim
Address