Title: Techniques for unsupervised web content discovery and automated query generation for crawling the hidden web
Abstract: Unsupervised crawling of the hidden Web utilizes a query engine, coupled to a crawler system, that automatically and intelligently inserts keywords into text input controls in Web page forms so that the filled form can be submitted to a server to retrieve dynamically generated Web content for indexing. The keywords used to fill form controls are based on the content of corresponding Web pages, which is automatically discovered to generate a set of keywords for filling the controls. The set of keywords can be expanded to include related keywords from other Web pages and Web sites and, therefore, to provide more effective coverage for crawling the Web content. The expanded set of keywords can be continuously expanded by recursively performing similarity analyses based on results from crawling the same and other Web sites.
Publication number: US2007022085(A1) Publication date: 2007.01.25
Application number: US20050224887 Filing date: 2005.09.12
Applicant: KULKARNI PARASHURAM Inventor: KULKARNI PARASHURAM
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
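
The flow described in the abstract (discover keywords from page content, fill a text input control with each keyword, then grow the keyword set from the results) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method claimed in the application: the function names, the stopword list, and the example URL are all hypothetical, and plain term frequency stands in for the similarity analysis mentioned in the abstract.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical stopword list; a real crawler would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def extract_keywords(page_text, limit=5):
    """Return up to `limit` of the most frequent non-stopword terms in page_text."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


def build_form_queries(action_url, field_name, keywords):
    """Build one GET URL per keyword, each filling a single text input control."""
    return [f"{action_url}?{urlencode({field_name: kw})}" for kw in keywords]


def expand_keywords(seed_keywords, result_pages, limit=5):
    """Append keywords found in result pages to the seed list, skipping duplicates.

    Assumption: term frequency is used here as a stand-in for the
    similarity analysis described in the abstract.
    """
    expanded = list(seed_keywords)
    for text in result_pages:
        for kw in extract_keywords(text, limit):
            if kw not in expanded:
                expanded.append(kw)
    return expanded
```

In use, a crawler might call `extract_keywords` on a page that hosts a search form, pass the result to `build_form_queries` with the form's action URL and input name, fetch each URL, and feed the returned page text into `expand_keywords` before the next round. Fetching and form parsing are left out of the sketch.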