Title of invention: ONLINE INTERNET TOPIC MINING METHOD BASED ON IMPROVED LDA MODEL
Abstract: Disclosed is an online Internet topic mining method based on an improved LDA model. The method is a continuous, streaming topic mining process carried out in segments: n web pages are processed at a time, these pages are usually acquired from the Internet by web crawlers online and in real time, and mining their contents yields k topics. Once the current n web pages have been processed, the next n newly acquired web pages are processed in the same way. The process mainly comprises initialization of the On-LDA model hyper-parameters, dynamic updating of the On-LDA model hyper-parameters, Internet topic mining based on the On-LDA model, and related steps. The invention fundamentally changes how the hyper-parameters α and β of a traditional LDA model are assigned and used during topic mining. The classification information of the web page contents is fully used to assign initial values to the model hyper-parameters, so that the initial values depend entirely on the web page contents to be mined, which simplifies the computation and makes the assignment more reasonable.
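The segmented processing described in the abstract can be illustrated with a minimal Python sketch. It only shows the control flow: the stream is cut into segments of n pages, and an initial prior is derived from the category labels of the pages in a segment. The abstract does not state the actual On-LDA initialization or update formulas, so `init_alpha`, its `smoothing` parameter, and the smoothed-frequency rule are hypothetical stand-ins, and the dynamic update and topic mining steps are left out.

```python
from collections import Counter
from itertools import islice


def segments(pages, n):
    """Yield successive lists of up to n pages from a (possibly endless) stream."""
    it = iter(pages)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch


def init_alpha(categories, smoothing=1.0):
    """Hypothetical initialization: a prior proportional to the smoothed
    frequency of each category label in the segment. This is an assumption
    for illustration, not the formula used by the patent."""
    counts = Counter(categories)
    total = sum(counts.values()) + smoothing * len(counts)
    return {c: (counts[c] + smoothing) / total for c in sorted(counts)}


if __name__ == "__main__":
    # Toy stream: each page carries a category label (assumed field name).
    stream = [
        {"category": "news"},
        {"category": "news"},
        {"category": "sports"},
        {"category": "tech"},
    ]
    for batch in segments(stream, n=3):
        alpha = init_alpha([page["category"] for page in batch])
        print(len(batch), alpha)
```

In a fuller version, each segment's prior would be passed to the topic model and then adjusted before the next segment arrives, following whatever update rule the full specification defines.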
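Publication number: WO2017035922(A1) Publication date: 2017.03.09
Application number: WO2015CN92047 Filing date: 2015.10.16
Applicant: YANG, Peng Inventors: YANG, Peng; LU, Yuncheng; DONG, Yongqiang
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: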