Title: Web browser device for structured data extraction and sharing via a social network
Abstract: A method and system for browser-based information extraction and transmission, in which predefined structured information is identified, extracted, and transmitted from web pages via a browser interface. The extracted information is then added to a user profile on a social network and to a database. The information is shared with other users, who can comment on it, copy it, vote on it, or go to the original information source. The information can also be combined with other extracted information to form collections for the purposes of voting on one or more items in the collection, combining multiple items to form a useful kit, saving information for later use, adding additional information such as dates and purchase locations for personal inventory purposes, and saving bookmarks to structured data.
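The sharing and collection features in the abstract amount to a small data model: shared records that keep a link to their source, and collections of records that other users can vote on. The following is a minimal Python sketch under stated assumptions, not the patented design: the class and method names (`SharedRecord`, `Collection`, `vote`, `tally`) are hypothetical, and the "one vote per user, a later vote replaces an earlier one" rule is an illustrative choice the source does not specify.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SharedRecord:
    """An extracted structured record as shared with other users (hypothetical)."""
    record_id: str
    source_url: str                 # lets other users go back to the original source
    data: Dict[str, object]
    comments: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """A user-built group of records, e.g. candidates to vote on or items in a kit."""
    owner: str
    items: Dict[str, SharedRecord] = field(default_factory=dict)  # record_id -> record
    votes: Dict[str, str] = field(default_factory=dict)           # voter -> record_id

    def add(self, record: SharedRecord) -> None:
        self.items[record.record_id] = record

    def vote(self, voter: str, record_id: str) -> None:
        if record_id not in self.items:
            raise KeyError(record_id)
        # Assumed rule: one vote per user; a later vote replaces the earlier one.
        self.votes[voter] = record_id

    def tally(self) -> List[Tuple[str, int]]:
        """Return (record_id, vote count) pairs, most votes first."""
        return Counter(self.votes.values()).most_common()


c = Collection(owner="alice")
c.add(SharedRecord("rec-a", "https://shop.example.com/item/1", {"title": "Tent"}))
c.add(SharedRecord("rec-b", "https://shop.example.com/item/2", {"title": "Tarp"}))
c.vote("bob", "rec-a")
c.vote("carol", "rec-a")
c.vote("dave", "rec-b")
print(c.tally())
```

Keeping `source_url` on every record is what allows any user viewing a collection to jump back to the page the data was extracted from.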
Publication number: US9606970(B2) Publication date: 2017.03.28
Application number: US201313734916 Filing date: 2013.01.04
Applicant: Data Record Science Inventors: Pappas Derek Edwin; Vujovic Dragan
Classification: G06F17/00; G06F17/22; G06F17/30 Main classification: G06F17/00
Main claim: 1. A method for extracting structured data from a web page using a web browser device, said method comprising:
a first user accessing said web page with a web browser, wherein the web page displays a product and a plurality of data field values, wherein the web page belongs to a web site;
the first user activating said web browser device, wherein the browser device comprises at least one of a widget, a button, or a browser extension;
the first user manually choosing and selecting a first data field value from the plurality of data field values;
the first user manually associating a first data field name with said first data field value using a menu;
automatically associating a widget data field name with said first data field name;
calculating an XPATH value of said first data field value on said web page, wherein a widget extraction engine calculates the XPATH value from a root of the web page markup to a data item corresponding to the first data field value;
creating, using said web browser device, a template comprising said widget data field name, a data element type corresponding to the first data field value, a boolean indicating whether the first data field value is a constant and should not be extracted, and, if constant, a substitute data value to substitute for a corresponding page value in future extractions of the corresponding page layout type from the web site, and said XPATH value;
selecting elements of an attribute list with name-value pairs and XPATH values associated with a data record to identify a repeating structured pattern associated with said attribute list with name-value pairs and XPATH values, wherein the widget extraction engine calculates the XPATH values from the root of the markup page to data items corresponding to the elements;
storing said template in a first data store with a key, wherein the key is a root URL for the web site corresponding to the web page;
storing said first data field value and said first data field name in a second data store, wherein there is an association between the first data field name in said first data store and said second data store;
converting said template from said first data store into an automatic data extraction template to extract current data field values from all web pages at the web site which match said template;
cleaning said first data field value, classifying said first data field value, normalizing said first data field value, storing said first data field value, and indexing said first data field value;
automatically extracting a structured data record from a web page using the automatic data extraction template;
adding said structured data record to a user profile on a social network; and
sharing said structured data record with a plurality of users, wherein each of said plurality of users can comment on, copy, vote on, or access an original structured data record source.
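The core of the claim is a record-then-replay loop: compute an XPATH from the markup root to the value the user selected, save it in a template keyed by the site's root URL, and later apply that template to other pages with the same layout. The following is a minimal standard-library Python sketch of that loop, not the patented implementation. It assumes well-formed XHTML (real pages would need a tolerant HTML parser such as lxml), covers only the single-field path (not the repeating-pattern detection step), and every name (`FieldRule`, `compute_xpath`, `evaluate_xpath`, `site_key`, `clean`, the cleaning rules, and the example URLs) is hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlsplit


@dataclass
class FieldRule:
    """One template entry; all field names here are hypothetical."""
    widget_field_name: str            # name mapped from the user's menu choice
    element_type: str                 # data element type: "text" or "price" here
    is_constant: bool                 # True: do not extract, use substitute_value
    substitute_value: Optional[str]
    xpath: str                        # absolute, position-indexed path from the root


def compute_xpath(root: ET.Element, target: ET.Element) -> str:
    """Walk from target up to root, recording each step as tag[position]."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


def evaluate_xpath(root: ET.Element, xpath: str) -> Optional[ET.Element]:
    """Resolve a compute_xpath result with ElementTree's limited XPath support."""
    head, _, rest = xpath.lstrip("/").partition("/")
    if head != root.tag:
        return None
    return root.find(rest) if rest else root


def site_key(url: str) -> str:
    """Key templates by the root URL of the web site."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def clean(value: str, element_type: str):
    """Assumed cleaning/normalization rules, for illustration only."""
    value = " ".join(value.split())
    if element_type == "price":
        return float(value.replace("$", "").replace(",", ""))
    return value


def extract(template: List[FieldRule], root: ET.Element) -> Dict[str, object]:
    """Apply a stored template to a page with the same layout."""
    record: Dict[str, object] = {}
    for rule in template:
        if rule.is_constant:
            record[rule.widget_field_name] = rule.substitute_value
            continue
        node = evaluate_xpath(root, rule.xpath)
        if node is not None and node.text:
            record[rule.widget_field_name] = clean(node.text, rule.element_type)
    return record


PAGE = """<html><body><div><h1>Header</h1></div>
<div><h1> Blue Widget </h1><span>$1,299.00</span></div></body></html>"""
root = ET.fromstring(PAGE)

# The user selects the title and price nodes and names them via the menu.
xpath_title = compute_xpath(root, root.find("body/div[2]/h1"))
xpath_price = compute_xpath(root, root.find("body/div[2]/span"))

template_store = {
    site_key("https://shop.example.com/item/42"): [
        FieldRule("title", "text", False, None, xpath_title),
        FieldRule("price", "price", False, None, xpath_price),
        FieldRule("currency", "text", True, "USD", ""),
    ]
}

# Later, another page on the same site with the same layout.
PAGE2 = """<html><body><div><h1>Header</h1></div>
<div><h1>Red Widget</h1><span>$45.50</span></div></body></html>"""
record = extract(template_store[site_key("https://shop.example.com/item/7")],
                 ET.fromstring(PAGE2))
print(xpath_price, record)
```

Indexing each step by position among same-tag siblings makes the path unambiguous for one layout, which is why the template only applies to pages that match it; a layout change on the site would require recording a new template.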
Address: Cheyenne, WY, US