Title COMPUTER-IMPLEMENTED SYSTEMS AND METHODS FOR TAXONOMY DEVELOPMENT
Abstract Systems and methods are provided for generating a set of classifiers. A term is identified within a document and a pre-defined threshold distance is determined. A plurality of additional terms in the document are identified, the additional terms being located within the pre-defined threshold distance of the term. A distance between the term and an additional term of the plurality of additional terms is calculated. A corresponding weight for the calculated distance is determined using a proximity weighting scheme. A score for the additional term is calculated using the calculated distance and the corresponding weight. A colocation matrix is generated, and a classifier is determined using the colocation matrix.
Publication number US2015317390(A1) Publication date 2015.11.05
Application number US201514798320 Filing date 2015.07.13
Applicant SAS Institute Inc. Inventors Mills Bruce Monroe; Haws John Courtney; Brocklebank John Clare; Lehman Thomas Robert
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim 1. A system, comprising: one or more processors; one or more non-transitory computer readable storage mediums containing instructions to cause the one or more processors to perform operations including: identifying a term within a document; determining a pre-defined threshold distance; identifying a plurality of additional terms in the document, wherein the plurality of additional terms are located within the pre-defined threshold distance of the term; calculating a distance between the term and an additional term of the plurality of additional terms; determining a corresponding weight for the calculated distance, wherein determining the corresponding weight uses a proximity weighting scheme; calculating a score for the additional term using the calculated distance and the corresponding weight; generating a colocation matrix including a plurality of rows, wherein the colocation matrix is generated using the term, the plurality of additional terms, and the score; and determining a classifier for the document using the colocation matrix.
Address Cary NC US
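
The operations recited in claim 1 can be illustrated with a minimal Python sketch. This record does not specify the proximity weighting scheme, the scoring formula, or how the classifier is derived from the colocation matrix, so everything below is an assumption made for illustration: distance is measured in token positions, the weight decays linearly with distance, the score is the sum of the weights (the distance enters only through the weight), and the candidate classifier is simply the highest-scoring colocated term. The function names `proximity_weight`, `colocation_scores`, `colocation_matrix`, and `pick_classifier` are hypothetical and do not come from the patent.

```python
from collections import defaultdict


def proximity_weight(distance, threshold):
    """Assumed linear-decay scheme: adjacent terms weigh 1.0,
    terms at the threshold distance weigh 1/threshold."""
    return (threshold - distance + 1) / threshold


def colocation_scores(tokens, term, threshold):
    """Score every additional term found within `threshold` tokens of `term`."""
    scores = defaultdict(float)
    for i, token in enumerate(tokens):
        if token != term:
            continue
        lo = max(0, i - threshold)
        hi = min(len(tokens), i + threshold + 1)
        for j in range(lo, hi):
            if tokens[j] == term:
                continue  # skip the term itself and its repeats
            distance = abs(j - i)  # distance in token positions (assumption)
            scores[tokens[j]] += proximity_weight(distance, threshold)
    return dict(scores)


def colocation_matrix(tokens, terms, threshold):
    """One row per term; each row maps colocated terms to their scores."""
    return {term: colocation_scores(tokens, term, threshold) for term in terms}


def pick_classifier(matrix, term):
    """Assumed selection rule: the highest-scoring colocated term in the row."""
    row = matrix[term]
    return max(row, key=row.get) if row else None


tokens = ["battery", "life", "poor", "battery", "life", "short", "screen", "bright"]
matrix = colocation_matrix(tokens, ["battery"], threshold=2)
print(matrix["battery"])  # {'life': 2.5, 'poor': 1.5, 'short': 0.5}
print(pick_classifier(matrix, "battery"))  # life
```

In this toy document, "life" appears next to "battery" twice and two tokens away once, so it accumulates the largest score in the "battery" row. A different weighting scheme or selection rule would change the scores but not the overall flow of the sketch.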