Title: Eliminating noise in periodicals
Abstract: A method and system for eliminating global and local noise in periodical items is described. An exemplary method may include preprocessing each item in a set of items using one or more rules, removing global noise from the set of items using semantic similarities across items in the set of items, and removing local noise in each item in the set of items based on text content in each item.
Publication number: US9251228(B1) Publication date: 2016.02.02
Application number: US201113092098 Filing date: 2011.04.21
Applicant: Amazon Technologies, Inc. Inventors: Iyer Deepa Subramanian; Dass Ramya
Classification: G06F17/00; G06F17/30 Main classification: G06F17/00
Agency: Lowenstein Sandler LLP Agent: Lowenstein Sandler LLP
Independent claim: 1. A method, comprising: preprocessing, by a server computer system, each item in a set of items using one or more rules, wherein preprocessing an item in the set of items comprises: determining that the item includes a print option; and using a version of the item associated with the print option instead of an alternate version; removing, by the server computer system, global noise from the set of items using semantic similarities across items in the set of items; and removing, by the server computer system, local noise in the item of the set of items, wherein removing the local noise in the item of the set of items comprises: determining an amount of content for a node associated with the item; calculating a content score for the node based on the amount of content; calculating a link density for the node based on a number of links in the node as a percentage of the content; calculating a local noise score for the node based on the content score and the link density; and removing the node responsive to a determination that the local noise score is above a threshold.
Address: Reno NV US
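
The local-noise steps recited in claim 1 (amount of content, content score, link density, local noise score, threshold test) can be illustrated with a minimal Python sketch. The claim does not give formulas, so everything concrete here is an assumption made for illustration only: the `Node` class, counting content in words, the `FULL_SCORE_WORDS` saturation point, the equal weighting of the two signals, and the `0.5` threshold are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List

# Assumed constants, not from the patent.
FULL_SCORE_WORDS = 50   # word count at which the content score saturates at 1.0
NOISE_THRESHOLD = 0.5   # nodes scoring above this are treated as local noise


@dataclass
class Node:
    """Hypothetical stand-in for a document node: its visible text and link count."""
    text: str
    num_links: int = 0


def content_amount(node: Node) -> int:
    # Assumption: "amount of content" is measured as a word count.
    return len(node.text.split())


def content_score(node: Node) -> float:
    # Assumption: score grows linearly with word count and is capped at 1.0.
    return min(content_amount(node) / FULL_SCORE_WORDS, 1.0)


def link_density(node: Node) -> float:
    # Number of links relative to the amount of content, capped at 1.0.
    amount = content_amount(node)
    if amount == 0:
        return 1.0 if node.num_links > 0 else 0.0
    return min(node.num_links / amount, 1.0)


def local_noise_score(node: Node) -> float:
    # Assumption: equal-weight blend; little content and many links both raise the score.
    return 0.5 * (1.0 - content_score(node)) + 0.5 * link_density(node)


def remove_local_noise(nodes: List[Node], threshold: float = NOISE_THRESHOLD) -> List[Node]:
    # Keep only nodes whose local noise score does not exceed the threshold.
    return [n for n in nodes if local_noise_score(n) <= threshold]


# Usage: a long paragraph with one link versus a short, link-only navigation bar.
article = Node(text="word " * 60, num_links=1)
nav_bar = Node(text="Home News Sports Contact", num_links=4)
kept = remove_local_noise([article, nav_bar])  # only the article node is kept
```

Under these assumed formulas, a navigation bar made up almost entirely of links scores close to 1.0 and is dropped, while a long paragraph with a single link scores close to 0.0 and is kept. A different weighting or threshold would shift that boundary.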