Title of invention Identifying salient items in documents
Abstract A set of representations of item-page pairs of items and respective web pages that include the respective items is obtained, each representation including feature function values indicating weights associated with features of associated web pages, the features including page classification features. An annotated set of labeled training data that is annotated with salience annotation values of items for respective web pages that include the items is obtained. The salience annotation values are determined based on a soft function, by determining a first count of a total number of user queries associated with corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with user queries that include the item, the subset included in the corresponding visits. Models are trained using the annotated set.
Publication number US9251473(B2) Publication date 2016.02.02
Application number US201313798198 Filing date 2013.03.13
Applicant Microsoft Technology Licensing, LLC Inventors Gamon Michael;Pantel Patrick;Song Xinying;Yano Tae;Apacible Johnson Tan
Classification G06N99/00;G06F17/30 Main classification G06N99/00
Agency Agent Wight Steve;Swain Sandy;Minhas Micky
Principal claim 1. A system comprising: a device that includes at least one processor, and a computer readable storage medium storing instructions for execution by the at least one processor, for implementing a salient item identification engine that: obtains query data and corresponding click data that indicates web pages visited, in association with respectively corresponding user queries, based on information mined from a web search log; and determines a salience annotation value of an item for respective ones of the web pages, based on determining a first count of a total number of the user queries that are associated with one or more corresponding visits to the respective ones of the web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with a group of the user queries that include the item, the subset included in the one or more corresponding visits.
Address Redmond WA US
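
The ratio described in the abstract and in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `salience_annotations`, the `(query, url)` log format, and the case-insensitive substring test for "a query that includes the item" are all assumptions made for illustration. Each log row is treated as one user query associated with one visit to a page.

```python
from collections import defaultdict


def salience_annotations(click_log, item):
    """Return {url: ratio} for one item over a query-click log.

    click_log: iterable of (query, url) pairs (hypothetical format),
               one pair per query-associated page visit.
    item:      the item string whose salience is being annotated.
    """
    first_count = defaultdict(int)   # all queries leading to visits of the page
    second_count = defaultdict(int)  # visits whose query includes the item
    needle = item.lower()

    for query, url in click_log:
        first_count[url] += 1
        # Assumption: "includes the item" is a case-insensitive substring match.
        if needle in query.lower():
            second_count[url] += 1

    # Salience annotation value: ratio of the second count to the first count.
    return {url: second_count[url] / first_count[url] for url in first_count}


if __name__ == "__main__":
    log = [
        ("michael jordan stats", "a.example"),
        ("nba stats", "a.example"),
        ("Michael Jordan bio", "a.example"),
        ("jordan shoes", "b.example"),
    ]
    print(salience_annotations(log, "michael jordan"))
```

Because the value is a ratio in [0, 1] rather than a binary label, it can serve as the soft annotation the abstract mentions; a real system would likely need a more careful definition of item matching than substring search.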