Title of invention: METHOD AND APPARATUS FOR IDENTIFYING GARBAGE TEMPLATE ARTICLE
Abstract: A method and an apparatus for identifying garbage template articles, in the field of network communication, are disclosed. The method includes: extracting a feature from an eligible microblog article to generate an article feature that includes a punctuation feature, a topic feature, a bracket feature, a link feature and an account name feature; acquiring a garbage template list that includes garbage template features, each being an article feature whose frequency reaches a preset threshold and extracted in the same way as the article feature; and identifying the microblog article as a garbage template article when its article feature is the same as a garbage template feature in the list. The apparatus includes a feature extracting module, an acquiring module, and an identifying module. Because the decision rests on features extracted from the microblog article itself, garbage template articles on the microblog platform can be identified effectively and search engine resources are saved.
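The frequency-threshold step described in the abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes each article feature has already been reduced to a hashable string, and the names `build_garbage_template_list` and `is_garbage_template`, the sample feature strings, and the threshold value of 3 are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def build_garbage_template_list(features: Iterable[str], threshold: int) -> Set[str]:
    """Return the features whose frequency reaches the preset threshold."""
    counts = Counter(features)
    return {feature for feature, count in counts.items() if count >= threshold}


def is_garbage_template(feature: str, template_list: Set[str]) -> bool:
    """Flag an article when its feature equals a feature in the template list."""
    return feature in template_list


# Illustrative data only: each string stands in for one extracted article feature.
observed = ["!!#[]http@", "!!#[]http@", "!!#[]http@", ",.", "?#"]
templates = build_garbage_template_list(observed, threshold=3)
```

Exact set membership mirrors the abstract's "the same as" condition; a looser similarity measure would be a different technique from the one described.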
Publication number: US2015227497(A1)
Publication date: 2015.08.13
Application number: US201314428314
Filing date: 2013.09.17
Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Hao Zhixin; He Jianguo; Zhang Guoqiang; He Xiaochen
Classification: G06F17/22; H04L29/08; G06F17/30
Main classification: G06F17/22
Agency:
Agent:
Principal claim: 1. A method for identifying garbage template article, comprising: extracting a feature from an eligible microblog article to generate an article feature, wherein the article feature comprises at least a punctuation feature, a topic feature, a bracket feature, a link feature and an account name feature; acquiring a garbage template list which comprises garbage template feature, the garbage template feature being an article feature whose frequency reaches a preset threshold, and the way to extract the garbage template feature being the same as the way to extract the article feature; and identifying the microblog article as a garbage template article when the article feature is the same as the garbage template feature in the garbage template list, wherein the eligible microblog article is a microblog article which is in an original form and contains link and picture, and before extracting a feature from an eligible microblog article, the method further comprises: removing numbers and letters from the eligible microblog article, and removing the contents in various brackets from the microblog article while retaining the brackets.
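The preprocessing recited at the end of the claim (removing numbers and letters, then emptying brackets while keeping the brackets themselves) can be sketched as below. This is an illustration under stated assumptions: the claim does not enumerate the "various brackets", so the `BRACKET_PAIRS` list is a guess; "numbers and letters" is read as ASCII digits and Latin letters; nested brackets are not handled; and all function names are hypothetical.

```python
import re

# Assumed set of bracket pairs; the claim only says "various brackets".
BRACKET_PAIRS = [("(", ")"), ("[", "]"), ("{", "}"),
                 ("【", "】"), ("（", "）"), ("《", "》")]


def remove_numbers_and_letters(text: str) -> str:
    """Remove ASCII digits and Latin letters (an assumed reading of the claim)."""
    return re.sub(r"[0-9A-Za-z]", "", text)


def strip_bracket_contents(text: str) -> str:
    """Delete what is inside each bracket pair but keep the brackets."""
    for open_b, close_b in BRACKET_PAIRS:
        # Match an opening bracket, any run of non-closing characters, then the close.
        pattern = (re.escape(open_b)
                   + "[^" + re.escape(close_b) + "]*"
                   + re.escape(close_b))
        text = re.sub(pattern, open_b + close_b, text)
    return text


def preprocess(text: str) -> str:
    """Apply both removal steps before feature extraction."""
    return strip_bracket_contents(remove_numbers_and_letters(text))
```

For example, `preprocess("抢购【abc限时】123")` leaves `"抢购【】"`: the variable wording is gone, but the bracket skeleton that a template would share across posts remains.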
Address: Shenzhen, Guangdong CN