Title of invention Dynamic corpus generation
Abstract A computer-implemented method of generating a dynamic corpus includes generating web threads, each based on a corresponding set of words dequeued from a word queue, to obtain resulting URLs. The resulting URLs are enqueued in a URL queue. Multiple text extraction threads are then generated, each based on a document downloaded using a URL dequeued from the URL queue, to obtain text files. New words are selected at random from the text files and enqueued in the word queue. The process is performed iteratively, resulting in a dynamic corpus.
Publication number US2007106977(A1) Publication date 2007.05.10
Application number US20050270014 Filing date 2005.11.09
Applicant MICROSOFT CORPORATION Inventor ARGUELLES CARLOS A.
Classification G06F9/44 Main classification G06F9/44
Agency Agent
Principal claim
Address
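
The loop described in the abstract (word queue, web threads, URL queue, text extraction threads, random words fed back into the word queue) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `build_corpus`, `fake_search`, `fake_fetch`, and `FAKE_WEB` are hypothetical, the search engine and downloader are replaced by in-memory stand-ins, and skipping already-seen URLs is an added assumption that the abstract does not mention.

```python
import queue
import random
import threading

# Hypothetical in-memory stand-in for the web: URL -> document text.
FAKE_WEB = {
    "u1": "alpha beta",
    "u2": "beta gamma",
    "u3": "gamma alpha",
}

def fake_search(words):
    """Stand-in for a search query: URLs whose text contains any of the words."""
    return [u for u, text in FAKE_WEB.items() if any(w in text.split() for w in words)]

def fake_fetch(url):
    """Stand-in for downloading a document and extracting its text."""
    return FAKE_WEB[url]

def drain(q):
    """Remove and return every item currently in a queue."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items

def run_all(target, args_list):
    """Start one thread per argument and wait for all of them to finish."""
    threads = [threading.Thread(target=target, args=(a,)) for a in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def build_corpus(seed_words, search, fetch_text, iterations=2,
                 words_per_query=2, new_words_per_doc=3, seed=0):
    word_queue, url_queue = queue.Queue(), queue.Queue()
    corpus, seen_urls = [], set()
    lock = threading.Lock()
    rng = random.Random(seed)
    for w in seed_words:
        word_queue.put(w)

    def web_thread(words):
        # One query per word set; resulting URLs go into the URL queue.
        for url in search(words):
            with lock:
                if url in seen_urls:  # assumption: skip URLs already visited
                    continue
                seen_urls.add(url)
            url_queue.put(url)

    def extraction_thread(url):
        # Extract text, store it, then feed randomly chosen words back.
        text = fetch_text(url)
        tokens = text.split()
        with lock:
            corpus.append(text)
            picks = rng.sample(tokens, min(new_words_per_doc, len(tokens)))
        for w in picks:
            word_queue.put(w)

    for _ in range(iterations):
        words = drain(word_queue)
        word_sets = [words[i:i + words_per_query]
                     for i in range(0, len(words), words_per_query)]
        run_all(web_thread, word_sets)
        run_all(extraction_thread, drain(url_queue))
    return corpus
```

Calling `build_corpus(["alpha"], fake_search, fake_fetch)` runs two iterations over the stand-in web. Because the threads run concurrently, the order of documents in the returned list is not guaranteed. A real system would replace the two stand-in functions with an actual search query and document download.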