Title of invention |
Document analysis and multi-word term detector |
Abstract |
A term analyzer receives an ordered collection of text-based terms. The ordered collection can contain terms from a document that have been filtered to remove “noise” such as stopwords. The term analyzer analyzes groupings of consecutive text-based terms in the ordered collection to identify occurrences of different combinations of text-based terms in the ordered collection. In addition, the term analyzer maintains frequency information representing the occurrences of the different combinations of text-based terms in the collection. The frequency information can then be used to determine relatively significant keywords and/or keyword phrases in the document. In an example configuration, the term analyzer creates a tree in which a first term in a given grouping of the groupings is defined as a parent node in the tree and a second term in the given grouping is defined as a child node of the parent node in the tree. The method of the analyzer generalizes to create a tree of multi-word terms in which the terms can be efficiently ranked by occurrence.
|
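The tree described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation, only one plausible reading of it: each window of consecutive terms is inserted as a path in a tree, with the first term as the parent and the following term as its child, and every node keeps an occurrence count. The names `TermNode`, `build_term_tree`, `ranked_phrases`, and the `max_len` and `min_len` parameters are hypothetical and do not come from the patent.

```python
class TermNode:
    """One term in the tree, with how often the path ending here occurred."""

    def __init__(self, term):
        self.term = term
        self.count = 0
        self.children = {}  # next term -> TermNode


def build_term_tree(terms, max_len=3):
    """Insert every run of up to max_len consecutive terms as a path.

    `terms` is assumed to be already filtered (for example, stopwords removed).
    """
    root = TermNode(None)
    for start in range(len(terms)):
        node = root
        for term in terms[start:start + max_len]:
            # The earlier term is the parent; the term that follows is its child.
            node = node.children.setdefault(term, TermNode(term))
            node.count += 1
    return root


def ranked_phrases(root, min_len=2):
    """Return (phrase, count) pairs, most frequent first, ties sorted by phrase."""
    results = []

    def walk(node, path):
        for child in node.children.values():
            phrase = path + [child.term]
            if len(phrase) >= min_len:
                results.append((" ".join(phrase), child.count))
            walk(child, phrase)

    walk(root, [])
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


if __name__ == "__main__":
    filtered = ["term", "analyzer", "receives", "ordered", "collection",
                "term", "analyzer", "maintains", "frequency"]
    tree = build_term_tree(filtered)
    for phrase, count in ranked_phrases(tree)[:3]:
        print(count, phrase)
```

Storing counts on the nodes means a repeated phrase shares one path, so ranking multi-word terms reduces to a single traversal followed by a sort.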
Publication number
US8090724(B1) |
Publication date
2012.01.03 |
Application number
US20070946637 |
Filing date
2007.11.28 |
Applicant
WELCH MICHAEL J.;CHANG WALTER;ADOBE SYSTEMS INCORPORATED |
Inventor
WELCH MICHAEL J.;CHANG WALTER |
Classification
G06F17/30 |
Main classification
G06F17/30 |
Patent agency
|
Agent
|
Main claim
|
Address
|