Abstract |
Disclosed is an online Internet topic mining method based on an improved LDA model (On-LDA). The method performs continuous, streaming topic mining in segments: n web pages are processed at a time, typically acquired from the Internet online and in real time by web crawlers, and mining the contents of these web pages yields k topics. Once the current n web pages have been processed, the next n newly acquired web pages are processed in the same way. The process mainly comprises initialization of the On-LDA model hyper-parameters, dynamic updating of the On-LDA model hyper-parameters, Internet topic mining based on the On-LDA model, and the like. The invention fundamentally changes how the hyper-parameters α and β of a traditional LDA model are assigned and used during topic mining. The category information to which the web page contents belong is fully used to assign initial values to the model hyper-parameters, so that these initial values depend entirely on the web page contents to be mined, which simplifies the computation and makes the assignment more rational. |
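
The segmented, streaming control flow described in the abstract can be illustrated with a minimal Python sketch. It shows only the outer loop: pages are consumed n at a time, each segment yields k topics, and some state is carried into the next segment. All names here (`batches`, `process_stream`, `toy_fit_segment`) are hypothetical, and `toy_fit_segment` is a word-frequency stand-in, not the On-LDA model or its hyper-parameter update, which the abstract does not specify.

```python
from collections import Counter
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def batches(pages: Iterable, n: int) -> Iterator[List]:
    """Yield successive segments of at most n items from a (possibly endless) stream."""
    it = iter(pages)
    while True:
        segment = list(islice(it, n))
        if not segment:
            return
        yield segment


def toy_fit_segment(
    segment: List[str], k: int, state: Optional[Counter]
) -> Tuple[List[str], Counter]:
    """Stand-in for a topic model (assumption): returns the k most frequent
    words seen so far and the cumulative counts as the carried state."""
    counts = Counter(state or {})  # copy, so the previous state is not mutated
    for page in segment:
        counts.update(page.split())
    topics = [word for word, _ in counts.most_common(k)]
    return topics, counts


def process_stream(
    pages: Iterable[str], n: int, k: int, fit_segment: Callable
) -> Iterator[List[str]]:
    """Process the stream segment by segment, passing state between segments."""
    state = None
    for segment in batches(pages, n):
        topics, state = fit_segment(segment, k, state)
        yield topics


if __name__ == "__main__":
    crawl = ["a b a", "c a", "b b b", "d"]  # hypothetical page contents
    for topics in process_stream(crawl, n=2, k=1, fit_segment=toy_fit_segment):
        print(topics)
```

In a real system the `fit_segment` callable would be replaced by the actual model, with the carried state holding whatever the dynamic hyper-parameter update requires.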