Title: CONCEPTUAL DOCUMENT ANALYSIS AND CHARACTERIZATION
Abstract: Data files are received from data sources that include textual content. The data files are categorized using a taxonomy of categories, where each category has sample textual content that defines a concept for the category. The categorizing includes comparing the textual content of each data file with the sample textual content for the category. A file score is calculated for each data file, corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file. Each data file is associated with the category if its file score is equal to or greater than a pre-determined minimum score for the category. A portion of the data file and/or the file score is provided.
Publication No.: US2016328454(A1) Publication date: 2016.11.10
Application No.: US201615215470 Filing date: 2016.07.20
Applicant: Altep, Inc. Inventors: Miller Roger W.; van den Berge Willem R.
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method comprising: receiving a plurality of data files from a plurality of data sources that comprise textual content; categorizing the plurality of data files using a taxonomy of categories in which each category has associated sample textual content defining a concept for the category, the categorizing comprising, for each category: comparing, for each of the plurality of data files, the textual content of the data file with the sample textual content for the category; calculating, based on the comparing and for each of the plurality of data files, a file score corresponding to the degree of similarity between the defined concept of the category and a determined concept for the data file; and associating, for each of the plurality of data files, the data file with the category if the file score is equal to or greater than a pre-determined minimum score for the category; and providing at least a portion of the data file and/or the associated file score.
Address: El Paso, TX, US
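
The claimed steps (compare each file's text with a category's sample text, compute a file score, associate the file with the category when the score meets the category's minimum) can be illustrated with a minimal sketch. The record does not say how the "degree of similarity" between concepts is computed, so the bag-of-words cosine similarity used here is purely an assumption chosen for illustration, and the names `term_vector`, `cosine_similarity`, and `categorize` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts. ASSUMPTION: a simple stand-in for the
    unspecified 'concept' representation in the record."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(files, taxonomy):
    """Associate files with categories whose minimum score they meet.

    files:    {file_name: textual_content}
    taxonomy: {category: (sample_text, min_score)}
    Returns   {category: [(file_name, file_score), ...]}, highest score first.
    """
    results = {}
    for category, (sample_text, min_score) in taxonomy.items():
        concept = term_vector(sample_text)
        matches = []
        for name, text in files.items():
            score = cosine_similarity(concept, term_vector(text))
            # Associate only if the score is equal to or greater than the minimum.
            if score >= min_score:
                matches.append((name, score))
        results[category] = sorted(matches, key=lambda m: m[1], reverse=True)
    return results
```

Calling `categorize` with a mapping of file names to text and a taxonomy such as `{"legal": ("lease agreement contract", 0.5)}` returns, per category, the files that met the threshold together with their scores, which corresponds to the final "providing" step. A production system would likely replace the word-count vectors with a richer concept model; nothing in the record confirms or rules out a particular choice.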