Title of Invention |
COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR TAXONOMY DEVELOPMENT |
Abstract |
Systems and methods are provided for generating a set of classifiers. A term is identified within a document and a pre-defined threshold distance is determined. A plurality of additional terms in the document are identified, the additional terms being located within the pre-defined threshold distance of the term. A distance between the term and an additional term of the plurality of additional terms is calculated. A corresponding weight for the calculated distance is determined using a proximity weighting scheme. A score for the additional term is calculated using the calculated distance and the corresponding weight. A colocation matrix is generated, and a classifier is determined using the colocation matrix. |
Publication Number |
US2015317390(A1) |
Publication Date |
2015.11.05 |
Application Number |
US201514798320 |
Filing Date |
2015.07.13 |
Applicant |
SAS Institute Inc. |
Inventors |
Mills Bruce Monroe;Haws John Courtney;Brocklebank John Clare;Lehman Thomas Robert |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
1. A system, comprising:
one or more processors;
one or more non-transitory computer readable storage mediums containing instructions to cause the one or more processors to perform operations including:
identifying a term within a document;
determining a pre-defined threshold distance;
identifying a plurality of additional terms in the document, wherein the plurality of additional terms are located within the pre-defined threshold distance of the term;
calculating a distance between the term and an additional term of the plurality of additional terms;
determining a corresponding weight for the calculated distance, wherein determining the corresponding weight uses a proximity weighting scheme;
calculating a score for the additional term using the calculated distance and the corresponding weight;
generating a colocation matrix including a plurality of rows, wherein the colocation matrix is generated using the term, the plurality of additional terms, and the score; and
determining a classifier for the document using the colocation matrix. |
Address |
Cary NC US |