Title of invention |
Systems and methods for improving feature ranking using phrasal compensation and acronym detection |
Abstract |
Systems and methods are disclosed for analyzing a set of documents by building a positive set histogram; selecting phrases from the positive set histogram; modifying the frequency statistics in the histogram using the selected phrases; identifying one or more potential phrase-acronym pairs; selecting a subset of phrase-acronym pairs from the potential pairs; adding a new feature for each selected phrase-acronym (phrase ∥ acronym) pair to a positive set histogram; determining a value for each new feature; identifying one or more child concepts based on an updated histogram; grouping the one or more child concepts; and determining a child concept group coverage for one or more documents.
|
Publication number |
US2005114130(A1) |
Publication date |
2005.05.26 |
Application number |
US20040888419 |
Filing date |
2004.07.09 |
Applicant |
NEC LABORATORIES AMERICA, INC. |
Inventors |
JAVA AKSHAY;KLOCK BRIAN;GLOVER ERIC J.;SHANBHAG VISHAL;KROVETZ ROBERT |
Classification numbers |
G06F17/30;G10L15/12;(IPC1-7):G10L15/12 |
Main classification number |
G06F17/30 |
Patent agency |
|
Patent agent |
|
Principal claim |
|
Address |
|