Title of invention Generating training documents
Abstract A method of generating training documents for training a classifying device comprises, with a processor, sampling from a distribution of words in a number of original documents, and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents. A device for classifying textual documents comprises a processor; and a memory communicatively coupled to the processor, the memory comprising a sampling module to, when executed by the processor, determine the distribution of words in a number of original documents, a pseudo-document creation module to, when executed by the processor, create a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a similar distribution of words as the original documents, and a training module to, when executed by the processor, train the device to classify textual documents based on the pseudo-documents.
Publication number US9165258(B2) Publication date 2015.10.20
Application number US201213709773 Filing date 2012.12.10
Applicant Hewlett-Packard Development Company, L.P. Inventors Deolalikar Vinay; Laffitte Hernan
Classification G06N99/00; G06K9/46; G06K9/00 Main classification G06N99/00
Agency Trop, Pruner & Hu, P.C. Agent Trop, Pruner & Hu, P.C.
Main claim 1. A method of generating training documents for training a classifying device comprising, with at least one processor: sampling from a distribution of words in a number of original documents; and creating a number of pseudo-documents from the distribution of words, the pseudo-documents comprising a same distribution of words as the original documents, wherein creating the number of pseudo-documents comprises creating a pseudo-document of the number of pseudo-documents based on a combined distribution of words from at least two original documents.
Address Houston TX US
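
Illustrative sketch The steps recited in claim 1 (sampling from a word distribution, then building each pseudo-document from the combined distribution of at least two original documents) could be sketched roughly as follows. This is not the patented implementation. The function names `word_distribution` and `make_pseudo_documents`, the whitespace tokenization, the choice of exactly two randomly picked originals per pseudo-document, the pooling of raw counts as the "combined distribution", and the fixed output length are all assumptions made for illustration.

```python
import random
from collections import Counter


def word_distribution(documents):
    """Pool word counts over the given documents and normalize to probabilities.

    Assumption: words are lowercased, whitespace-separated tokens.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def make_pseudo_documents(originals, n_docs, length, seed=0):
    """Create n_docs pseudo-documents, each `length` words long.

    Assumption: each pseudo-document is sampled from the combined
    distribution of two randomly chosen original documents.
    """
    if len(originals) < 2:
        raise ValueError("need at least two original documents")
    rng = random.Random(seed)
    pseudo = []
    for _ in range(n_docs):
        # Combine the word distributions of two distinct originals.
        pair = rng.sample(originals, 2)
        dist = word_distribution(pair)
        # Sample words with probability proportional to their pooled frequency.
        words = rng.choices(list(dist), weights=list(dist.values()), k=length)
        pseudo.append(" ".join(words))
    return pseudo


originals = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang on the fence",
]
pseudo_docs = make_pseudo_documents(originals, n_docs=4, length=8)
```

Each resulting pseudo-document contains only words drawn from its two source documents, at roughly their pooled frequencies, so a classifier trained on `pseudo_docs` sees word statistics close to those of the originals. How closely the sampled frequencies track the source distribution depends on the pseudo-document length chosen here.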