EXTRACTING DATA CONTENT ITEMS USING TEMPLATE MATCHING, Application No. US20070848987 - 传众 Patent Search

Title of invention	EXTRACTING DATA CONTENT ITEMS USING TEMPLATE MATCHING
Abstract	Systems and methods for extracting data content items from a web page are provided. A template is created by labeling data content items of interest associated with a web page and generating a template Document Object Model (DOM) tree based on the labeled web page. DOM trees are also generated for additional web pages that contain data content items for which extraction may be desired. These DOM trees are compared to the template DOM tree to determine the alignment between them. The aligned data content items may then be extracted from the additional web pages and indexed, as desired. Labeling the data content items of interest before generating the template DOM tree allows the desired data content items to be specified and more accurately extracted from related and/or similarly structured web pages.
Publication No.	US2009063500(A1)	Publication date	2009.03.05
Application No.	US20070848987	Filing date	2007.08.31
Applicant	MICROSOFT CORPORATION	Inventors	ZHAI YANHONG; LI YI; QIAN RICHARD; GAO HONG; TAN LEI
Classification	G06F17/30	Main classification	G06F17/30
Agency		Agent
Main claim
Address
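
The workflow summarized in the abstract (label a page, build a template DOM tree, align other pages' DOM trees to it, extract the aligned items) can be illustrated with a minimal sketch. This is not the patent's claimed method: the `data-label` attribute used to mark items of interest, the `Node`, `TreeBuilder`, `parse`, and `extract` names, and the alignment rule (match nodes top-down when tags agree and pair children by position) are all assumptions made for illustration. A practical system would likely need a more tolerant tree alignment, and this sketch assumes well-formed HTML with properly paired tags.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, parent=None, label=None):
        self.tag = tag
        self.parent = parent
        self.label = label      # set only on labeled template nodes
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # 'data-label' is an assumed convention for marking items of interest.
        label = dict(attrs).get("data-label")
        node = Node(tag, self.current, label)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    """Parse an HTML string into a simplified DOM tree."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract(template, page, out=None):
    """Align `page` to `template` top-down and collect labeled items.

    Simplified alignment: two nodes align only if their tags are equal,
    and their children are paired by position.
    """
    if out is None:
        out = {}
    if template.tag != page.tag:
        return out              # subtree does not align; skip it
    if template.label:
        out[template.label] = page.text
    for t_child, p_child in zip(template.children, page.children):
        extract(t_child, p_child, out)
    return out


# Labeled template page and a similarly structured page (made-up sample data).
template_html = (
    '<html><body><div><h1 data-label="title">Widget A</h1>'
    '<span data-label="price">$10</span></div></body></html>'
)
page_html = "<html><body><div><h1>Widget B</h1><span>$25</span></div></body></html>"

items = extract(parse(template_html), parse(page_html))
print(items)
```

Because labels live on the template tree rather than on each target page, a single labeled page can drive extraction from any number of similarly structured pages; pages whose structure diverges simply yield fewer aligned items.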