Title of invention LINGUISTIC BASED DETERMINATION OF TEXT LOCATION ORIGIN
Abstract A method and system for determining a location of origin and a time period in which a document was written are disclosed. A text is received and a set of linguistic characteristics for the text is identified. A set of possible locations and time periods for the text is determined based on the set of linguistic characteristics. A set of reference documents is used to determine a proximity rating for the text, based on how close the text is to the reference documents. The potential locations and time periods are ranked and returned for presentation.
Publication number US2017060832(A1) Publication date 2017.03.02
Application number US201615257070 Filing date 2016.09.06
Applicant International Business Machines Corporation Inventors Allen Corville O.; DeLima Roberto; Freed Andrew R.; Nielsen Robert L.
Classification G06F17/27; G06F17/22 Main classification G06F17/27
Agency Agent
Main claim 1. A computer implemented natural language processing method, comprising: receiving a text; identifying a set of linguistic characteristics contained in the text, wherein linguistic characteristics include grammatical, syntactic, and idiomatic features of the text; determining a plurality of locations of origin in which the text was potentially written based on the set of linguistic characteristics; retrieving a set of reference documents for each location of origin in the plurality of locations of origin, in response to the determining the plurality of locations in which the text was potentially written; determining a plurality of time periods in which the text was potentially written based on the set of linguistic characteristics; retrieving a set of reference documents for each time period in the plurality of time periods in response to the determining the plurality of time periods in which the text was potentially written; producing a set of proximity scores by performing a set of proximity checks using the set of linguistic characteristics, the set of reference documents, and the text, wherein the proximity checks analyze how often and how close linguistic characteristics are to one another; ranking the plurality of locations of origin based on the set of proximity scores; ranking the plurality of time periods based on the set of proximity scores; and returning a set of one or more ranked locations of origin of the plurality of locations of origin and a set of one or more ranked time periods of the plurality of time periods.
Address Armonk, NY, US
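Illustrative sketch (not part of the patent record). The steps recited in the main claim can be pictured with a small Python example. Everything in it is an assumption made for illustration: the marker lexicons, the reference documents, the function names, and in particular the scoring formula, which simply adds how often markers occur, how close together they sit, and how many of them also appear in the reference documents. The record does not disclose the actual feature extraction or proximity computation, so this should not be read as the claimed implementation.

```python
import re
from collections import defaultdict

# Hypothetical marker lexicons; a real system would learn these from corpora.
LOCATION_MARKERS = {
    "UK": {"colour", "lorry", "whilst"},
    "US": {"color", "truck", "sidewalk"},
}
PERIOD_MARKERS = {
    "19th century": {"carriage", "telegraph"},
    "21st century": {"email", "smartphone", "online"},
}

# Hypothetical reference documents, keyed by candidate location or period.
LOCATION_REFERENCES = {
    "UK": ["The lorry was a lovely colour."],
    "US": ["The truck was parked on the sidewalk."],
}
PERIOD_REFERENCES = {
    "19th century": ["The carriage arrived before the telegraph."],
    "21st century": ["Send an email from your smartphone."],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_characteristics(tokens, lexicon):
    """Map each candidate label to the token positions of its markers."""
    hits = defaultdict(list)
    for pos, tok in enumerate(tokens):
        for label, markers in lexicon.items():
            if tok in markers:
                hits[label].append(pos)
    return dict(hits)


def proximity_score(positions, text_tokens, reference_tokens):
    """Assumed scoring: marker frequency + marker closeness + reference overlap."""
    frequency = len(positions) / max(len(text_tokens), 1)
    # Gaps between consecutive marker positions; smaller gaps score higher.
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    closeness = sum(1.0 / g for g in gaps) / len(gaps) if gaps else 0.0
    # Share of the markers found in the text that the references also contain.
    found = {text_tokens[p] for p in positions}
    shared = len(found & set(reference_tokens)) / len(found) if found else 0.0
    return frequency + closeness + shared


def rank(text, lexicon, references):
    """Return (label, score) pairs for matching candidates, best first."""
    tokens = tokenize(text)
    candidates = find_characteristics(tokens, lexicon)
    scores = {}
    for label, positions in candidates.items():
        # References are retrieved only for candidates that matched the text.
        ref_tokens = tokenize(" ".join(references.get(label, [])))
        scores[label] = proximity_score(positions, tokens, ref_tokens)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = ("Whilst the lorry driver admired the colour of the carriage, "
              "a truck sent word by telegraph.")
    print(rank(sample, LOCATION_MARKERS, LOCATION_REFERENCES))
    print(rank(sample, PERIOD_MARKERS, PERIOD_REFERENCES))
```

The same `rank` function is called once with the location lexicon and once with the period lexicon, mirroring the claim's two parallel rankings. Candidates with no matching markers are never scored, which corresponds loosely to retrieving reference documents only after the candidate locations and periods have been determined.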