Title of Invention: METHOD AND DEVICE FOR ACQUIRING MOVIE AND TELEVISION SUBJECT FROM WEBPAGE
Abstract: The present invention relates to the field of network data communications and discloses a method and device for acquiring a movie and television subject from a webpage. The method comprises: extracting texts to be mined from a webpage on which movie and television subject mining is to be performed, and segmenting the texts to be mined according to a preset segmentation rule to obtain a sentence set; extracting, from the sentence set, the longest common clause shared by all the sentences as a candidate movie and television subject; and determining the movie and television subject of the webpage according to the candidate movie and television subject. The present invention avoids the complicated prior-art process of writing wrappers when a large number of websites need to be parsed, and also overcomes the prior-art defect of parsing failures that occur when a frequently changing webpage structure cannot be detected in real time.
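The three steps in the abstract (segment the mined texts into a sentence set, take the longest common clause of all sentences as the candidate, then decide on the subject) can be illustrated with a minimal sketch. It rests on several assumptions that are not in the source: the "preset segmentation rule" is taken to be a split on common ASCII and full-width delimiters (`DELIMITERS`), the "longest common clause" is interpreted as the longest common contiguous substring, and the final decision is a hypothetical minimum-length check (`min_len`). All function names are illustrative only, not the applicant's implementation.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical "preset segmentation rule": split on common title/list
# delimiters (ASCII and full-width). The source does not specify the rule.
DELIMITERS = r"[_|,;!?\n，。；！？、]+"


def segment(texts: Iterable[str], pattern: str = DELIMITERS) -> List[str]:
    """Split every text to be mined into sentences, dropping empty pieces."""
    sentences: List[str] = []
    for text in texts:
        for piece in re.split(pattern, text):
            piece = piece.strip()
            if piece:
                sentences.append(piece)
    return sentences


def longest_common_substring(sentences: List[str]) -> Optional[str]:
    """Return the longest contiguous string contained in every sentence.

    Brute force (assumed interpretation of "longest common clause"): try
    substrings of the shortest sentence from longest to shortest and return
    the first one that occurs in all sentences.
    """
    if not sentences:
        return None
    shortest = min(sentences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in sentences):
                return candidate
    return None


def determine_subject(texts: Iterable[str], min_len: int = 2) -> Optional[str]:
    """Hypothetical final step: accept the stripped candidate if long enough."""
    candidate = longest_common_substring(segment(texts))
    if candidate is None:
        return None
    candidate = candidate.strip()
    return candidate if len(candidate) >= min_len else None


if __name__ == "__main__":
    page_texts = [
        "The Wandering Earth online HD | The Wandering Earth full movie",
        "Watch The Wandering Earth; The Wandering Earth (2019) cast",
    ]
    print(determine_subject(page_texts))  # The Wandering Earth
```

The brute-force search is chosen for clarity; for long texts a suffix-automaton or suffix-array approach would scale better. Note that the idea only works when every sentence in the set actually mentions the subject, which is why the choice of texts to mine and of the segmentation rule matters.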
Publication No.: WO2015024429 (A1)
Publication Date: 2015.02.26
Application No.: WO2014CN83077
Filing Date: 2014.07.25
Applicants: BEIJING QIHOO TECHNOLOGY COMPANY LIMITED; QIZHI SOFTWARE (BEIJING) COMPANY LIMITED
Inventors: SUN, LIN; CHEN, PEIJUN; QIN, JISHENG
Classification: G06F17/30
Main Classification: G06F17/30