Title of invention: Systems and methods for identifying and extracting data from HTML pages
Abstract: Systems and methods for analyzing HTML-formatted web pages to automatically identify and extract desired information. A computer algorithm identifies and extracts different pieces of information from different web pages automatically after minimal manual setup. The algorithm automatically analyzes pages with different content if they have the same, or similar, formats. The algorithm is fast and efficient and performs the extraction in real time. The systems and methods are useful for building databases from unstructured web information. The algorithm can be used as an agent that captures information about products and compares prices or other characteristics. It can also be used to populate structured databases that, given the different pieces of information, can analyze products and their characteristics. It can further be used for data mining applications that look for patterns useful for marketing analyses or other purposes.
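The abstract does not disclose how the algorithm works, so the following is only an illustrative sketch of one way "minimal manual setup" followed by automatic extraction from same-format pages could look. It assumes, purely for illustration, that a field is located by the path of tags leading to its text node. The names `learn_template` and `extract`, the tag-path key, and the sample pages are all hypothetical and are not taken from the patent.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathTextParser(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (path, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def text_nodes(html):
    """Map (tag_path, occurrence_index) to the text found at that position."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    seen, nodes = {}, {}
    for path, text in parser.items:
        n = seen.get(path, 0)
        seen[path] = n + 1
        nodes[(path, n)] = text
    return nodes


def learn_template(sample_html, examples):
    """Manual setup step (assumed): the user labels values on one sample page."""
    nodes = text_nodes(sample_html)
    template = {}
    for field, value in examples.items():
        for key, text in nodes.items():
            if text == value:
                template[field] = key
                break
    return template


def extract(template, html):
    """Apply the learned positions to another page with the same format."""
    nodes = text_nodes(html)
    return {field: nodes.get(key) for field, key in template.items()}


sample = "<html><body><h1>Red Kettle</h1><p>Price: <b>$19.99</b></p><br></body></html>"
other = "<html><body><h1>Blue Toaster</h1><p>Price: <b>$34.50</b></p><br></body></html>"

template = learn_template(sample, {"name": "Red Kettle", "price": "$19.99"})
result = extract(template, other)
print(result)
```

An exact tag-path match only handles pages with the same format. Handling "similar" formats, as the abstract mentions, would need some tolerant matching that this sketch does not attempt.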
Publication number: US2005273706(A1) | Publication date: 2005.12.08
Application number: US20050122992 | Filing date: 2005.05.04
Applicant: YAHOO! INC. | Inventors: MANBER UDI;LU QI
Classification: G06F17/00;G06F17/30;(IPC1-7):G06F17/00 | Main classification: G06F17/00