Title: Entity category extraction for an entity that is the subject of pre-labeled data
Abstract: Summaries of entities (e.g., people, places, things, concepts, etc.) may provide additional useful information to a user. For example, a search engine may provide a summary of an entity within search results. It may be advantageous to include a short, concise category of the entity (e.g., “writer”, “politician”, etc.) within the summary. The category may allow a user to quickly determine whether the information relates to the intended entity (e.g., search results for an entity described as “a writer” vs. search results for an entity described as “a politician”). Potential categories and summary text may be extracted from pre-labeled data. The potential categories and the summary text may be intersected to determine a set of candidate categories, which may then be ranked. An entity category having a desired rank may be determined as the entity category that describes the entity in a desired way.
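The intersect-then-rank idea in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the patented method: the function names `tokenize`, `candidate_categories`, and `rank_candidates` are hypothetical; a potential category counts as a candidate only when every one of its words appears verbatim in the summary text; and the single ranking feature (an earlier first mention in the summary ranks higher) is a placeholder for whatever ranking features a real system would use.

```python
import re


def tokenize(text):
    # Lowercase alphabetic tokens; a simplifying assumption, not the patent's tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def candidate_categories(potential, summary):
    """Intersect potential categories with the summary text.

    Hypothetical rule: keep a category only if all of its words occur in the summary.
    """
    summary_words = set(tokenize(summary))
    candidates = []
    for category in potential:
        category_words = set(tokenize(category))
        if category_words and category_words <= summary_words:
            candidates.append(category)
    return candidates


def rank_candidates(candidates, summary):
    """Rank candidates by one placeholder feature: earliest mention in the summary.

    Assumes every candidate word occurs in the summary, which
    candidate_categories() guarantees.
    """
    summary_tokens = tokenize(summary)

    def first_position(category):
        return min(summary_tokens.index(w) for w in tokenize(category))

    return sorted(candidates, key=first_position)


potential = ["Politician", "Writer", "Living people"]
summary = "Jane Doe is a writer and former politician."
candidates = candidate_categories(potential, summary)
print(candidates)                            # ['Politician', 'Writer']
print(rank_candidates(candidates, summary))  # ['Writer', 'Politician']
```

In this toy example, “Living people” is dropped because none of its words occur in the summary, and “Writer” outranks “Politician” only because it is mentioned first.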
Publication No.: US9268878(B2); Publication Date: 2016.02.23
Application No.: US201012820349; Filing Date: 2010.06.22
Applicant: Microsoft Technology Licensing, LLC; Inventors: Bieniosek Michael; Salvetti Franco; Thione Giovanni Lorenzo
Classification: G06F17/30; Main Classification: G06F17/30
Agents: Ream Dave; Ross Jim; Minhas Micky
Main Claim: 1. A computer-implemented method executed by a processing unit coupled to memory for determining an entity category for an entity, the entity comprising a subject of a web page, comprising:
extracting a set of potential categories relating to an entity, wherein the set of potential categories is extracted from a first portion of a web page and the entity comprises a subject of the web page;
extracting summary text relating to the entity from a second portion of the web page, wherein the first portion is disposed at a first region of the web page and the second portion is disposed at a second region of the web page separate from the first region;
comparing at least some of the set of potential categories to the summary text to determine a set of candidate categories for the entity, wherein the set of candidate categories is a subset of the set of potential categories, and wherein the comparing includes
A) performing a morphological analysis upon one or more category words of a potential category of the set of potential categories to generate a set of variation category words of the potential category, and
B) identifying a match between one or more variation category words of the set of variation category words and one or more summary words of the summary text;
ranking a first candidate category of the set of candidate categories relative to a second candidate category of the set of candidate categories based upon one or more ranking features to generate a ranked set of candidate categories;
determining an entity category for the entity from the ranked set of candidate categories, wherein the entity category has a first rank within the ranked set of candidate categories, wherein the first rank is above a threshold; and
presenting the entity category having the first rank in a search result page.
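The comparing, ranking, and threshold steps of the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the helpers `words`, `variations`, and `pick_entity_category` are hypothetical; a handful of English suffix rules stand in for a real morphological analyzer; the only ranking feature is the fraction of a category's words that match the summary text; and “the first rank is above a threshold” is read here as the top candidate's score exceeding a hypothetical `threshold` parameter.

```python
import re


def words(text):
    # Lowercase alphabetic tokens (simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def variations(word):
    """Naive English inflection rules standing in for morphological analysis."""
    forms = {word, word + "s"}        # "writer" -> "writers"
    if word.endswith("y"):
        forms.add(word[:-1] + "ies")  # "story" -> "stories"
    if word.endswith("ies"):
        forms.add(word[:-3] + "y")    # "countries" -> "country"
    if word.endswith("es"):
        forms.add(word[:-2])          # "churches" -> "church"
    if word.endswith("s"):
        forms.add(word[:-1])          # "novelists" -> "novelist"
    return forms


def pick_entity_category(potential, summary, threshold=0.5):
    """Return the top-ranked candidate if its score exceeds threshold, else None."""
    summary_words = set(words(summary))
    scored = []
    for category in potential:
        category_words = words(category)
        if not category_words:
            continue
        # Count category words having at least one variation in the summary.
        matched = sum(1 for w in category_words if variations(w) & summary_words)
        if matched:  # any match makes the category a candidate
            scored.append((matched / len(category_words), category))
    # Hypothetical ranking feature: fraction of matched words. sorted() is
    # stable, so candidates with equal scores keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    if ranked and ranked[0][0] > threshold:
        return ranked[0][1]
    return None


potential = ["American novelists", "Short story writers", "Living people"]
summary = "Jane Doe is an American novelist and short story writer."
print(pick_entity_category(potential, summary))                 # American novelists
print(pick_entity_category(potential, summary, threshold=1.0))  # None
```

The suffix rules over-generate forms such as “writerss”; that is harmless for matching, but a real system would more plausibly use a proper lemmatizer and several ranking features.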
Address: Redmond WA US