Title: Snapshot-based screen scraping
Abstract: A method is provided for scraping information from a web page or other page of electronic content. As opposed to existing methods in which an entire page's HTML (HyperText Markup Language) code or DOM (Document Object Model) tree is parsed and pattern-matched, in the provided method only specific regions of interest are examined closely. An image snapshot of the page is created and investigated using routines for identifying regions of interest (e.g., paragraphs of text, faces). Regions comprising text are then converted into text using OCR (Optical Character Recognition) technology or a similar tool, and the resulting text can then be scanned for symbols, words or phrases of interest.
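The flow described in the abstract (snapshot, region detection, OCR, scan for phrases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the snapshot is assumed to be already binarized into rows of 0/1 pixels, region detection uses a simple row-projection heuristic chosen here for brevity, and the OCR step is a caller-supplied function. The names `find_row_bands`, `scrape_snapshot`, and `ocr` are hypothetical.

```python
import re
from typing import Callable, List, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # binarized snapshot: 1 = ink, 0 = background
Region = Tuple[int, int]          # (first_row, last_row_exclusive)


def find_row_bands(bitmap: Bitmap, min_height: int = 1) -> List[Region]:
    """Split the snapshot into horizontal bands of consecutive rows with ink.

    Assumption: a row-projection heuristic stands in for the abstract's
    "routines for identifying regions of interest".
    """
    bands: List[Region] = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # band ended at a blank row
            start = None
    if start is not None and len(bitmap) - start >= min_height:
        bands.append((start, len(bitmap)))  # band runs to the bottom edge
    return bands


def scrape_snapshot(
    bitmap: Bitmap,
    ocr: Callable[[Bitmap], str],
    patterns: Sequence[str],
) -> List[Tuple[Region, str]]:
    """Run OCR only on detected regions and scan the text for patterns.

    `ocr` is a hypothetical hook; a real system would wrap an OCR engine.
    Returns (region, matched_text) pairs.
    """
    hits: List[Tuple[Region, str]] = []
    for top, bottom in find_row_bands(bitmap):
        text = ocr(bitmap[top:bottom])
        for pattern in patterns:
            for match in re.finditer(pattern, text):
                hits.append(((top, bottom), match.group(0)))
    return hits
```

A caller would pass a function wrapping whatever OCR tool is available, plus regular expressions for the symbols or phrases of interest (for example, a currency pattern such as `r"\$[\d,]+\.\d{2}"`). Only the detected bands reach the OCR step, which mirrors the abstract's point that the whole page need not be parsed.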
Publication number: US8306255(B1) Publication date: 2012.11.06
Application number: US20080200416 Filing date: 2008.08.28
Applicant: DEGNAN OLIVER;INTUIT INC. Inventor: DEGNAN OLIVER
Classification: G06K9/00;G06F17/00 Main classification: G06K9/00
Agency: Agent:
Principal claim:
Address: