Title of invention: Method and apparatus for tracking a change in a collection of web documents
Abstract: A method and an apparatus for tracking changes in a collection of web documents, for example the documents provided by a web site. The web documents are retrieved at a first assigned point in time and at a second assigned point in time. A similarity measure is then calculated for each combination of a web document retrieved at the first point in time and a web document retrieved at the second point in time, in order to determine pairs of corresponding web documents. A change in the content of a corresponding web document between the two points in time is detected by comparing the calculated similarity measure of the pair with predetermined thresholds. Instead of relying on identifiers such as URLs, the method considers the content similarity of web pages. The proposed strategy facilitates the work of marketing analysts.
Publication number: US2009204595(A1) Publication date: 2009.08.13
Application number: US20080027316 Filing date: 2008.02.07
Applicant: SIEMENS ENTERPRISE COMMUNICATIONS GMBH & CO. KG Inventors: DOMBROWSKI BERNHARD; KLUG KARL; SKUBACZ MICHAL; SUDA PETER; TOTZKE JURGEN; ZIEGLER CAI-NICOLAS
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
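
The procedure outlined in the abstract (retrieve two snapshots, score every combination of documents, pair the corresponding documents, then compare each pair's score with thresholds) can be illustrated with a minimal sketch. The record does not specify the similarity measure, the pairing strategy, or the threshold values, so everything here is an assumption: `difflib.SequenceMatcher` stands in for the similarity measure, pairing is greedy by best score, and the names `track_changes`, `match_threshold`, and `unchanged_threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure in [0, 1]; the patent record does not name one."""
    return SequenceMatcher(None, a, b).ratio()


def track_changes(old_docs, new_docs, match_threshold=0.5, unchanged_threshold=0.99):
    """Pair documents from two snapshots by content and classify each pair.

    old_docs / new_docs are lists of document texts retrieved at the first
    and second point in time. Both thresholds are illustrative assumptions.
    """
    # Score every combination of an old and a new document, best first.
    scored = sorted(
        (
            (similarity(old, new), i, j)
            for i, old in enumerate(old_docs)
            for j, new in enumerate(new_docs)
        ),
        reverse=True,
    )

    used_old, used_new = set(), set()
    report = {"unchanged": [], "modified": [], "removed": [], "added": []}

    # Greedy pairing: accept the best remaining combination until scores
    # fall below the threshold for treating two documents as corresponding.
    for score, i, j in scored:
        if score < match_threshold:
            break
        if i in used_old or j in used_new:
            continue
        used_old.add(i)
        used_new.add(j)
        kind = "unchanged" if score >= unchanged_threshold else "modified"
        report[kind].append((i, j))

    # Documents without a counterpart were removed from or added to the site.
    report["removed"] = [i for i in range(len(old_docs)) if i not in used_old]
    report["added"] = [j for j in range(len(new_docs)) if j not in used_new]
    return report


if __name__ == "__main__":
    first = [
        "Our phone system supports voice and video calls.",
        "Contact us at the Munich office.",
    ]
    second = [
        "Contact us at the Munich office.",
        "Our phone system supports voice, video and chat calls.",
        "New product: a cloud collaboration suite.",
    ]
    print(track_changes(first, second))
```

Because documents are paired by content rather than by URL, a page that moves to a new address between snapshots is still matched with its earlier version in this sketch. A production system would likely need a cheaper measure than pairwise sequence matching for large sites.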