Title of invention: Linguistic based determination of text creation date
Abstract: A method includes receiving a text. The method also includes identifying a set of linguistic characteristics contained in the text. The method also includes determining a plurality of time periods in which the text was potentially written based on the set of linguistic characteristics. The method also includes retrieving a set of reference documents for each time period. The method also includes producing a set of proximity scores by performing a set of proximity checks using the set of linguistic characteristics, the set of reference documents, and the text, where the proximity checks analyze how often and how close linguistic characteristics are to one another. The method also includes ranking the plurality of time periods based on the set of proximity scores and returning a set of one or more ranked time periods of the plurality of time periods.
Publication number: US9639524(B2) Publication date: 2017.05.02
Application number: US201514835904 Filing date: 2015.08.26
Applicant: International Business Machines Corporation Inventors: Allen Corville O.;DeLima Roberto;Freed Andrew R.;Nielsen Robert L.
Classification: G06F17/20;G06F17/28;G06F17/27;G06F17/30 Main classification: G06F17/20
Agency: Agent: Rau Nathan M.
Main claim: 1. A computer implemented natural language processing method, comprising: receiving a text; identifying a set of linguistic characteristics contained in the text, wherein linguistic characteristics include grammatical, syntactic, and idiomatic features of the text; determining a plurality of time periods in which the text was potentially written based on the set of linguistic characteristics; retrieving a set of reference documents for each time period in the plurality of time periods in response to the determining the plurality of time periods in which the text was potentially written; producing a set of proximity scores for the text by performing a set of proximity checks using the set of linguistic characteristics, the set of reference documents for each time period, and the text, wherein the proximity checks analyze a usage frequency of the set of linguistic characteristics and a temporal closeness of the plurality of time periods in the set of linguistic characteristics between the text and the set of reference documents for each time period are to one another; ranking the plurality of time periods based on the set of proximity scores; and returning a set of one or more ranked time periods of the plurality of time periods.
Address: Armonk NY US
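
The pipeline described in the abstract and main claim (identify characteristics, determine candidate periods, retrieve reference documents, score, rank) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the patent: the `MARKERS` lexicon, the `REFERENCE_DOCS` corpus, the period labels, and all function names are hypothetical, and the proximity check is simplified to a cosine similarity between usage-frequency vectors, which covers only the "how often" part of the claimed check.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon: linguistic characteristic -> periods where it was common.
MARKERS = {
    "thou": ["1600s"],
    "hath": ["1600s"],
    "whilst": ["1600s", "1800s"],
    "telegraph": ["1800s"],
    "email": ["2000s"],
    "smartphone": ["2000s"],
}

# Hypothetical reference corpus: period -> reference documents.
REFERENCE_DOCS = {
    "1600s": ["thou hath spoken whilst the king hath slept", "what hath thou done"],
    "1800s": ["the telegraph arrived whilst we dined", "send word by telegraph"],
    "2000s": ["check your email on the smartphone", "email me the file"],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def identify_characteristics(text):
    """Return the marker terms from the lexicon that occur in the text."""
    return {tok for tok in tokenize(text) if tok in MARKERS}


def candidate_periods(characteristics):
    """Collect every period associated with at least one characteristic."""
    return sorted({p for c in characteristics for p in MARKERS[c]})


def frequency_vector(tokens, characteristics):
    """Relative usage frequency of each characteristic in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[c] / total for c in characteristics]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_periods(text, top_n=3):
    """Score each candidate period against its reference documents and rank."""
    chars = sorted(identify_characteristics(text))
    text_vec = frequency_vector(tokenize(text), chars)
    scores = {}
    for period in candidate_periods(chars):
        ref_tokens = [t for doc in REFERENCE_DOCS.get(period, []) for t in tokenize(doc)]
        scores[period] = cosine(text_vec, frequency_vector(ref_tokens, chars))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


ranked = rank_periods("Thou hath sent word whilst I slept")
```

With the toy data above, `ranked` lists the hypothetical "1600s" period ahead of "1800s", because the text's marker frequencies align more closely with the "1600s" reference documents. A fuller treatment would also need features for grammar and syntax and some measure of temporal closeness, which this sketch omits.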