Title: Enriching website content with extracted feature multi-dimensional vector comparison
Abstract: A method for enriching the contents of a current website includes obtaining a corpus from the current website and other websites and extracting object features from the corpus, wherein the corpus comprises specifications of the object and user reviews about the object; constructing multi-dimensional vectors for the extracted features according to the corpus; for a specified feature, comparing the similarity of its multi-dimensional vector with the multi-dimensional vectors of the other extracted features; determining features whose similarities are higher than a predetermined threshold to be the same feature; and reinforcing the current website with features that differ from those of the object on the current website, together with their corresponding attributes.
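The abstract does not say what the vector dimensions are, how similarity is scored, or how attributes are stored. The following is a minimal Python sketch under stated assumptions, not the patented implementation. Each feature's vector counts the words that share a sentence with it. Similarity is the Euclidean distance d between length-normalized vectors, mapped to 1 / (1 + d). Features and attributes are assumed to be already extracted into a hypothetical dict of feature to attribute value. The helper names `build_vectors`, `similarity`, and `enrich` are also hypothetical. A feature from another site is added only if no feature on the current site matches it above the threshold.

```python
import math
from collections import Counter

def build_vectors(sentences, features):
    # Hypothetical dimensions: each feature's vector counts the words that
    # appear in the same sentence as the feature (single-token features only).
    vectors = {f: Counter() for f in features}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for f in features:
            if f in tokens:
                vectors[f].update(t for t in tokens if t != f)
    return vectors

def similarity(u, v):
    # Euclidean distance between unit-length vectors, mapped to (0, 1]
    # via 1 / (1 + d); the mapping is an assumption, not from the patent.
    def unit(c):
        n = math.sqrt(sum(x * x for x in c.values()))
        return {k: x / n for k, x in c.items()} if n else {}
    a, b = unit(u), unit(v)
    d = math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys()))
    return 1.0 / (1.0 + d)

def enrich(current, others, vectors, threshold):
    # `current` and `others` map feature -> attribute value (hypothetical format).
    # A feature from another site is added only if it matches no current feature.
    added = {}
    for g, attr in others.items():
        if all(similarity(vectors[g], vectors[f]) <= threshold for f in current):
            added[g] = attr
    return added

# Usage on a toy corpus (hypothetical data).
sentences = ["screen is bright and sharp", "display is bright and sharp", "battery lasts long hours"]
vectors = build_vectors(sentences, ["screen", "display", "battery"])
print(enrich({"screen": "6.1 inch"}, {"display": "6.1 inch", "battery": "3000 mAh"}, vectors, 0.8))
# {'battery': '3000 mAh'}
```

In this toy corpus, "screen" and "display" share all their context words, so their similarity is 1.0 and "display" is treated as the existing screen feature. "battery" scores about 0.41 and is added with its attribute.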
Publication No.: US9342491(B2); Publication Date: 2016.05.17
Application No.: US201313967871; Filing Date: 2013.08.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION; Inventors: Bao Sheng Hua; Cai Ke Ke; Guo Hong Lei; Su Zhong; Wu Xian; Zhang Li; Zhang Shuo
Classification: G06F17/00; G06F17/22; G06Q30/02; G06Q30/06; G06Q30/00; G06F17/27; Main Classification: G06F17/00
Agency: Cantor Colburn LLP; Agent: Cantor Colburn LLP
Main Claim: 1. An apparatus for enriching contents of a current website, the apparatus comprising a memory and a processor, the processor configured to: obtain a corpus from the current website and other websites, and extract object features from the corpus, wherein the corpus comprises specifications of the object and user reviews about the object; construct multi-dimensional vectors for the extracted features according to the corpus; for a specific feature, make a similarity comparison of its multi-dimensional vector with the multi-dimensional vectors of other extracted features, wherein the similarity comparison comprises: calculation of mutual information between the specific feature in the corpus and each dimension of its multi-dimensional vector as a weight of each dimension; and calculation of similarities between the multi-dimensional vector of the specific feature in the corpus and the multi-dimensional vectors of the extracted features according to the weights of each dimension, and wherein the similarities between the multi-dimensional vectors are calculated based on Euclidean distance; and determine features with similarities higher than a predetermined threshold to be a same feature, and reinforce the current website with at least one feature which is different from those on the website and corresponding attributes of the at least one feature.
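The claim requires two things. Mutual information serves as a per-dimension weight, and similarity is based on Euclidean distance. It does not specify the estimator, the normalization, or how distance becomes similarity. The sketch below is one hedged Python reading of those steps, not the patentee's code.

- **Weights.** Pointwise mutual information (PMI), clipped at zero, stands in for the mutual-information measure. The weight of context word k for feature f is w_k = max(0, log(P(f, k) / (P(f) P(k)))), with probabilities estimated as fractions of sentences. This is an assumption.
- **Distance.** The weighted distance is d = sqrt(sum_k w_k (x_k - y_k)^2). It is taken over the specific feature's own dimensions after unit-length normalization.
- **Similarity.** Similarity is 1 / (1 + d).

The helper names `context_vector`, `pmi_weights`, `weighted_similarity`, and `same_features` are hypothetical.

```python
import math
from collections import Counter

def context_vector(sentences, feature):
    # Hypothetical dimensions: counts of words sharing a sentence with `feature`.
    vec = Counter()
    for s in sentences:
        tokens = s.lower().split()
        if feature in tokens:
            vec.update(t for t in tokens if t != feature)
    return vec

def pmi_weights(sentences, feature):
    # Positive PMI (assumed stand-in for "mutual information") between `feature`
    # and each context word, with probabilities estimated as the fraction of
    # sentences containing the word(s).
    docs = [set(s.lower().split()) for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    co = Counter(w for d in docs if feature in d for w in d if w != feature)
    weights = {}
    for w, c in co.items():
        ratio = (c / n) / ((df[feature] / n) * (df[w] / n))
        weights[w] = max(math.log(ratio), 0.0)
    return weights

def _unit(vec):
    # Scale a sparse vector to unit Euclidean length (empty if all zeros).
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {k: x / norm for k, x in vec.items()} if norm else {}

def weighted_similarity(u, v, weights):
    # Weighted Euclidean distance over the specific feature's dimensions
    # (the keys of `weights`), mapped to a similarity in (0, 1].
    a, b = _unit(u), _unit(v)
    d2 = sum(w * (a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k, w in weights.items())
    return 1.0 / (1.0 + math.sqrt(d2))

def same_features(sentences, feature, candidates, threshold):
    # Candidates whose similarity to `feature` exceeds the threshold are
    # treated as the same feature.
    weights = pmi_weights(sentences, feature)
    base = context_vector(sentences, feature)
    return [c for c in candidates
            if c != feature
            and weighted_similarity(base, context_vector(sentences, c), weights) > threshold]

# Usage on a toy corpus (hypothetical data).
sentences = ["screen is bright and sharp", "display is bright and sharp", "battery is long lasting"]
print(same_features(sentences, "screen", ["display", "battery"], 0.8))  # ['display']
```

Here "is" occurs in every sentence, so its PMI weight is 0 and it contributes nothing to the distance. "display" (similarity 1.0) is grouped with "screen", while "battery" (about 0.64) falls below the threshold.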
Address: Armonk NY US