A Web Page Information Extraction Method Based on Ontology Concepts (Application No. CN201610500057.7)

Title of invention	A web page information extraction method based on ontology concepts
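Abstract	The invention discloses a web page information extraction method based on ontology concepts. The method uses a vector space model: it first analyzes the word segmentation results of a web page to obtain feature words, then computes the feature weights, then analyzes the page's topic relevance using ontology concepts, and finally compares the topic relevance with a system-set threshold in order to extract the page's topic information. The method lowers the computation required for page analysis, reduces the omission of page information, and improves the quality of information extraction.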
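Publication No.	CN106096055A	Publication date	2016.11.09
Application No.	CN201610500057.7	Filing date	2016.06.28
Applicant	Hefei Kurui Network Technology Co., Ltd. (合肥酷睿网络科技有限公司)	Inventor	Dong Xiongfei (董雄飞)
Classification	G06F17/30(2006.01)I;G06F17/27(2006.01)I	Main classification	G06F17/30(2006.01)I
Patent agency		Patent agent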
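Main claim	A web page information extraction method based on ontology concepts, characterized by comprising the following steps: (1) Web document preprocessing: the web page from which information is to be extracted is taken as the information source, and a topic crawler structurally analyzes the page's anchor text, page title, body headings, and body text as a tag tree and processes them into web page text; (2) Ontology-based classification: word segmentation is performed through the interface of the FreeICTCLAS segmentation system, the words are classified by ontology, and the frequency with which each feature word occurs in the text is obtained; (3) Weight calculation: following the vector space model, each web page text is abstracted as a vector, and the weight of each feature keyword of the text is then computed by the formula Wi = ∑(WtPtWi); (4) Topic relevance calculation: topic relevance is analyzed according to the topic relevance formula [formula image FDA0001033403010000011.GIF, not reproduced in this record]; (5) Topic relevance analysis: the computed topic relevance is compared with the threshold set by the system.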
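Address	Room 1501, Building A1, Xuelin Yayuan, Feicui Avenue, Taohua Industrial Park, Hefei Economic and Technological Development Zone, Hefei, Anhui Province 230000, China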
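
Steps (3) to (5) of the main claim can be illustrated with a minimal Python sketch. It rests on several assumptions that are not confirmed by this record: the tokens are taken as already segmented (no FreeICTCLAS call is made, and the ontology classification of step (2) is skipped); the weight formula is read as a sum, over page regions, of a region weight times the term frequency in that region; and, because the patent's topic relevance formula survives only as an image, cosine similarity is swapped in as a common vector space model measure. The names `REGION_WEIGHTS`, `term_weights`, `topic_relevance`, and `is_on_topic`, the region weight values, and the default threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-region weights; the patent record does not publish values.
REGION_WEIGHTS = {"anchor": 2.0, "title": 3.0, "heading": 2.0, "body": 1.0}


def term_weights(regions):
    """Sum region_weight * term_frequency over every page region.

    `regions` maps a region name to the list of tokens found there.
    This is one possible reading of the claim's weight formula, not
    a confirmed one.
    """
    weights = Counter()
    for name, tokens in regions.items():
        region_weight = REGION_WEIGHTS.get(name, 1.0)
        for term, freq in Counter(tokens).items():
            weights[term] += region_weight * freq
    return dict(weights)


def topic_relevance(page_weights, topic_weights):
    """Cosine similarity between the page vector and the topic vector.

    Assumption: stands in for the patent's relevance formula, which is
    only available as an image.
    """
    dot = sum(w * topic_weights.get(t, 0.0) for t, w in page_weights.items())
    page_norm = sqrt(sum(w * w for w in page_weights.values()))
    topic_norm = sqrt(sum(w * w for w in topic_weights.values()))
    if page_norm == 0 or topic_norm == 0:
        return 0.0  # an empty page or empty topic is treated as unrelated
    return dot / (page_norm * topic_norm)


def is_on_topic(regions, topic_weights, threshold=0.5):
    """Step (5): compare the relevance score with a configured threshold."""
    return topic_relevance(term_weights(regions), topic_weights) >= threshold
```

A call such as `is_on_topic({"title": ["ontology"], "body": ["web", "page"]}, {"ontology": 1.0})` would then decide whether a page's topic information is extracted. Keeping the weighting and the relevance measure in separate functions makes it straightforward to substitute the patent's actual formulas if they become available.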
