ADAPTIVE DOCUMENT SAMPLING FOR INFORMATION EXTRACTION, Application No. US20090398162

Title	ADAPTIVE DOCUMENT SAMPLING FOR INFORMATION EXTRACTION
Abstract	A method and apparatus for improved sampling of documents for training sets input to information extraction systems are provided, which improve the recall and robustness of wrapper extraction. A passive sampling technique provides a list of documents to present for human annotation, ordered by the representativeness of each document based on structural and content statistics. Thus, the document that has the most interesting attributes and is most representative of the cluster of structurally similar documents to which it belongs is presented for annotation first. The problem is mapped to the classical 'Set-Cover' problem and solved using a greedy approach. An active sampling technique refines and reorders the sample list produced by the passive sampling technique after initial annotations, based on the human annotation, spatial boundaries of the documents, and structural and content statistics. The proposed techniques work at a site level and perform page-level structural analysis using XPath-term frequency, XPath-document frequency, and XPath-importance.
Publication No.	US2010228738(A1)	Publication date	2010.09.09
Application No.	US20090398162	Filing date	2009.03.04
Applicant	MEHTA RUPESH R;SENGAMEDU SRINIVASAN H	Inventor	MEHTA RUPESH R.;SENGAMEDU SRINIVASAN H.
Classification	G06F17/30;G06F17/21	Main classification	G06F17/30
Agency		Agent
Main claim
Address
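
The passive sampling step in the abstract (map to Set-Cover, solve greedily) can be illustrated with a minimal sketch. This is not the patented method: the abstract does not give the exact weighting, so the sketch assumes each document is reduced to the list of XPaths found on the page, uses XPath-document frequency only to decide which XPaths are worth covering, and ignores XPath-term frequency, XPath-importance, and content statistics. The function names and the `min_df` threshold are hypothetical.

```python
from collections import Counter


def xpath_document_frequency(docs):
    """Count, for each XPath, how many documents contain it at least once."""
    df = Counter()
    for xpaths in docs.values():
        df.update(set(xpaths))
    return df


def greedy_sample_order(docs, min_df=2):
    """Order documents for annotation by greedy set cover over XPaths.

    Assumption (not from the source): only XPaths that occur in at least
    `min_df` documents form the universe that must be covered.
    """
    df = xpath_document_frequency(docs)
    universe = {xp for xp, n in df.items() if n >= min_df}
    features = {d: set(xps) & universe for d, xps in docs.items()}
    uncovered = set(universe)
    order = []
    while uncovered:
        # Pick the document covering the most still-uncovered XPaths;
        # iterating in sorted order makes ties resolve to the smallest id.
        best = max(sorted(features), key=lambda d: len(features[d] & uncovered))
        gain = features[best] & uncovered
        if not gain:
            break  # nothing left that any remaining document can cover
        order.append(best)
        uncovered -= gain
        del features[best]
    return order


if __name__ == "__main__":
    # Hypothetical site with four pages and their XPaths.
    docs = {
        "a.html": ["/html/body/div/h1", "/html/body/div/span", "/html/body/table/tr/td"],
        "b.html": ["/html/body/div/h1", "/html/body/div/span"],
        "c.html": ["/html/body/table/tr/td", "/html/body/ul/li"],
        "d.html": ["/html/body/ul/li", "/html/body/div/h1"],
    }
    print(greedy_sample_order(docs))
```

Each round picks the page that adds the most uncovered structure, so pages that merely repeat already-covered XPaths fall out of the annotation list. An active sampling pass, as the abstract describes it, would then reorder this list once the first annotations are available.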

Patents you may be interested in

Methods, systems and computer program products for triggered data collection and correlation of status and/or state in distributed data processing systems

SEMICONDUCTOR MEMORY ELEMENT

OPTICAL COMMUNICATIONS TRANSCEIVER AND METHOD FOR TRANSCEIVING DATA

METHOD FOR PROVIDING A VIDEO DATA STREAMING SERVICE

LIGHT GUIDING PANEL FORMED WITH MINUTE RECESSES BY A SAND BLASTING PROCESS AND A BACKLIGHT UNIT USING THE SAME

METHOD FOR PATH MTU DISCOVERY ON IP NETWORK AND APPARATUS THEREOF

METALLURGICAL IMPACT PAD

Device for correcting signal modulations

Holographic storage lenses

ANODE ELECTROCATALYSTS FOR COATED SUBSTRATES USED IN FUEL CELLS

CERIA-BASED MIXED-METAL OXIDE STRUCTURE, INCLUDING METHOD OF MAKING AND USE

Braking control system for a washing machine

EXPOSURE CONTROL FOR PHASE SHIFTING PHOTOLITHOGRAPHIC MASKS

Semiconductor device for memory test with changing address information

Access control method utilizing a key battery

Individual memory page activity timing method and system

Semiconductor memory device having select circuit

Adjustable chair for vehicles

Universal document processor for merging continuous and cut sheet documents into sets

Load indicating fastener insert