Title of invention: System and methods for determining term importance, search relevance, and content summarization
Abstract: Computer-implemented methods for determining term importance and for search relevance ranking, document tagging and summarization are disclosed. The methods include steps for discovering prominent and information-rich terms by performing analysis on term attributes including structural and distributional attributes and grammatical attributes which include the subject and non-subject of a sentence, and head and modifier of a phrase. Methods for tagging documents with important terms and ranking search results based on prominence measures of queried terms are also disclosed.
Publication number: US9460195(B1) Publication date: 2016.10.04
Application number: US201414292649 Filing date: 2014.05.30
Applicant: Zhang Guangsheng Inventor: Zhang Guangsheng
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Independent claim: 1. A computer-implemented method for determining term importance in a text content for information discovery and presentation, comprising:
receiving a text content;
tokenizing the text content into a plurality of terms, each term comprising one or more words or phrases;
identifying a first term in the text content, wherein the first term is or is contained in a grammatical subject of a sentence;
identifying a second term in the text content, wherein the second term is or is contained in a non-subject portion of a sentence;
using, by a computer, a pre-defined criterion for determining whether the first term is of greater importance than the second term, or whether the second term is of greater importance than the first term;
associating a first importance measure to the first term based on the pre-defined criterion;
associating a second importance measure to the second term based on the pre-defined criterion;
further determining the first importance measure based on a comparison between a frequency of the first term inside the text content and a frequency of the first term obtained from a document external to the text content, or further determining the second importance measure based on a comparison between a frequency of the second term inside the text content and a frequency of the second term obtained from a document external to the text content;
selecting the first term in preference over the second term if the first importance measure is greater than the second importance measure, or selecting the second term in preference over the first term if the second importance measure is greater than the first importance measure;
performing an action associated with the selected term, wherein the action comprises at least outputting an element associated with the selected term, displaying an element associated with the selected term, or using an element associated with the selected term for a computer-assisted operation associated with the text content including ranking a search result, wherein the element is selected from the group consisting of at least the selected term, the first importance measure or the second importance measure, and a sentence or paragraph containing the selected term.
Address: Palo Alto CA US
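
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not disclose the pre-defined criterion, so the role weights (`SUBJECT_WEIGHT`, `NON_SUBJECT_WEIGHT`), the frequency-ratio formula, and the function names `term_importance` and `select_term` are all assumptions made for illustration. The sketch also assumes each sentence has already been split into its grammatical subject and its non-subject portion (for example by a parser), and it treats single words as terms.

```python
import re
from collections import Counter

# Assumed weights standing in for the undisclosed "pre-defined criterion":
# a term in the grammatical subject counts more than one elsewhere.
SUBJECT_WEIGHT = 2.0
NON_SUBJECT_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into single-word terms."""
    return re.findall(r"[a-z]+", text.lower())


def term_importance(sentences, external_text):
    """Score terms by grammatical role and by internal vs. external frequency.

    sentences: list of (subject_text, non_subject_text) pairs; the
    subject/non-subject split is assumed to be done upstream.
    external_text: a document outside the text content, used as a baseline.
    """
    role_score = Counter()
    internal = Counter()
    for subject, rest in sentences:
        for term in tokenize(subject):
            role_score[term] += SUBJECT_WEIGHT
            internal[term] += 1
        for term in tokenize(rest):
            role_score[term] += NON_SUBJECT_WEIGHT
            internal[term] += 1

    external = Counter(tokenize(external_text))
    n_internal = sum(internal.values())
    n_external = sum(external.values())

    scores = {}
    for term, score in role_score.items():
        rel_internal = internal[term] / n_internal
        # +1 keeps the ratio finite for terms absent from the external text.
        rel_external = (external[term] + 1) / (n_external + 1)
        # Terms common inside the text but rare outside it score higher.
        scores[term] = score * rel_internal / rel_external
    return scores


def select_term(scores, first, second):
    """Return the term with the greater importance measure, or None on a tie."""
    a, b = scores.get(first, 0.0), scores.get(second, 0.0)
    if a > b:
        return first
    if b > a:
        return second
    return None


sentences = [
    ("The camera", "takes sharp pictures"),
    ("The camera", "has a long battery life"),
]
external_text = "the a has takes of and the a pictures of the city"
scores = term_importance(sentences, external_text)
selected = select_term(scores, "camera", "pictures")
print(selected, round(scores[selected], 3))
```

In this sketch, "camera" appears in the subject of both sentences and is absent from the external text, so it outscores both "pictures" and the common word "the". The selected term and its score could then be output, displayed, or used as a ranking signal, corresponding to the final action step of the claim.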