Title of invention |
Enriching website content with extracted feature multi-dimensional vector comparison |
Abstract |
A method for enriching the contents of a website includes: obtaining a corpus from the current website and other websites, and extracting object features from the corpus, wherein the corpus comprises specifications of the object and user reviews about the object; constructing multi-dimensional vectors for the extracted features according to the corpus; for a specified feature, comparing the similarity of its multi-dimensional vector with the multi-dimensional vectors of the other extracted features; determining features with similarities higher than a predetermined threshold to be the same feature; and reinforcing the current website with features different from those of the object on the current website, together with their corresponding attributes. |
Publication number |
US9342491(B2) |
Publication date |
2016.05.17 |
Application number |
US201313967871 |
Filing date |
2013.08.15 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
Bao Sheng Hua;Cai Ke Ke;Guo Hong Lei;Su Zhong;Wu Xian;Zhang Li;Zhang Shuo |
Classification codes |
G06F17/00;G06F17/22;G06Q30/02;G06Q30/06;G06Q30/00;G06F17/27 |
Main classification code |
G06F17/00 |
Agency |
Cantor Colburn LLP |
Agent |
Cantor Colburn LLP |
Principal claim |
1. An apparatus for enriching contents of a current website, the apparatus comprising a memory and a processor, the processor configured to:
obtain a corpus from the current website and other websites, and extract object features from the corpus, wherein the corpus comprises specifications of the object and user reviews about the object;
construct multi-dimensional vectors for the extracted features according to the corpus;
for a specific feature, make a similarity comparison of its multi-dimensional vector and multi-dimensional vectors of other extracted features, wherein the similarity comparison comprises:
calculation of mutual information between the specific feature in the corpus and each dimension of its multi-dimensional vector as a weight of each dimension; and
calculation of similarities between the multi-dimensional vector of the specific feature in the corpus and the multi-dimensional vectors of the extracted features according to the weights of each dimension, and
wherein the similarities between the multi-dimensional vectors are calculated based on Euclidean distance; and
determine features with similarities being higher than a predetermined threshold as a same feature, and reinforce the current website with at least one feature which is different from those on the website and corresponding attributes of the at least one feature. |
Address |
Armonk NY US |
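
The comparison recited in the principal claim (mutual-information weights per dimension, a weighted Euclidean distance, and a threshold test) can be illustrated with a minimal Python sketch. The record does not give the formulas, so several choices here are assumptions rather than the patented implementation: each feature's vector is built from sentence-level co-occurrence frequencies, "mutual information" is approximated by positive pointwise mutual information, distance is mapped to a similarity with `1 / (1 + d)`, and the threshold `0.6` is arbitrary. All function names (`context_vector`, `pmi_weights`, `similarity`, `features_to_add`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lower-case a sentence and return its set of whitespace-separated tokens."""
    return set(sentence.lower().split())


def context_vector(sentences, feature):
    """Relative frequency of each word that co-occurs with `feature` in a sentence."""
    counts, hits = Counter(), 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        if feature in tokens:
            hits += 1
            counts.update(tokens - {feature})
    return {word: n / hits for word, n in counts.items()} if hits else {}


def pmi_weights(sentences, feature, dims):
    """Assumed weighting: positive pointwise mutual information per dimension."""
    token_sets = [tokenize(s) for s in sentences]
    total = len(token_sets)
    p_feature = sum(feature in t for t in token_sets) / total
    weights = {}
    for dim in dims:
        p_dim = sum(dim in t for t in token_sets) / total
        p_joint = sum(feature in t and dim in t for t in token_sets) / total
        pmi = math.log(p_joint / (p_feature * p_dim)) if p_joint > 0 else 0.0
        weights[dim] = max(pmi, 0.0)  # negative associations get zero weight
    return weights


def similarity(sentences, feature, other):
    """Weighted Euclidean distance over `feature`'s dimensions, mapped into (0, 1]."""
    vec = context_vector(sentences, feature)
    other_vec = context_vector(sentences, other)
    if not vec or not other_vec:
        return 0.0  # a feature absent from the corpus cannot be matched
    weights = pmi_weights(sentences, feature, vec)
    squared = sum(w * (vec[d] - other_vec.get(d, 0.0)) ** 2
                  for d, w in weights.items())
    return 1.0 / (1.0 + math.sqrt(squared))


def features_to_add(sentences, current_features, candidates, threshold):
    """Candidates that match no current-site feature above the threshold."""
    return [c for c in candidates
            if all(similarity(sentences, f, c) <= threshold
                   for f in current_features)]


# Toy corpus standing in for specifications and user reviews (hypothetical data).
corpus = [
    "the screen is bright and sharp",
    "the display is bright and sharp",
    "the battery lasts long",
    "screen resolution is sharp",
    "display resolution is crisp",
    "battery charges fast",
]
new_features = features_to_add(corpus, ["screen"], ["display", "battery"], threshold=0.6)
print(new_features)
```

In this toy corpus "display" shares nearly all of the contexts of "screen", so it is treated as the same feature, while "battery" falls below the threshold and is proposed as a feature to add to the current website. A real system would need proper tokenization, multi-word feature extraction, and a tuned threshold.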