Title: Computer-program products and methods for annotating ambiguous terms of electronic text documents
Abstract: Computer-program products and methods for automatically annotating terms, such as ambiguous terms, in an electronic text document are disclosed. In one embodiment, a method of annotating a text document includes determining, by a computing device, a term of interest within the text document. The method further includes searching a data structure including incongruous term pairs (tx, tt) determined from a controlled vocabulary for the term of interest appearing as a term tt, wherein the term tt is a linguistic head of a term tx of the incongruous term pairs (tx, tt). The method further includes annotating the term of interest with a meaning provided by the controlled vocabulary only if a term tx of the incongruous term pairs (tx, tt) associated with the term of interest in the data structure is not present within a predetermined textual distance of the term of interest in the text document.
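The annotation rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `contains_phrase` and `annotate`, the word-level tokenization, the token-window reading of "textual distance", the restriction to single-word terms of interest, and the example pair ("sea horse", "horse") are all assumptions made for illustration.

```python
import re


def contains_phrase(window, phrase):
    """Return True if the token list `phrase` occurs contiguously in `window`."""
    n = len(phrase)
    return any(window[i:i + n] == phrase for i in range(len(window) - n + 1))


def annotate(text, term, pairs, meaning, max_distance=5):
    """Annotate occurrences of `term` (a single word, by assumption).

    `pairs` holds incongruous term pairs (tx, tt). An occurrence is annotated
    with `meaning` only if no tx whose head tt equals `term` appears within
    `max_distance` tokens of it (assumed measure of textual distance).
    """
    term = term.lower()
    tokens = re.findall(r"\w+", text.lower())
    # Every tx that is paired with the term of interest as tt.
    blockers = [tx.lower().split() for tx, tt in pairs if tt.lower() == term]
    annotations = []
    for i, token in enumerate(tokens):
        if token != term:
            continue
        lo = max(0, i - max_distance)
        hi = min(len(tokens), i + max_distance + 1)
        window = tokens[lo:hi]
        if not any(contains_phrase(window, b) for b in blockers):
            annotations.append((i, meaning))
    return annotations


# Hypothetical data: "horse" is the head of "sea horse", but a sea horse is a fish.
pairs = [("sea horse", "horse")]
meaning = "Equus caballus (mammal)"
plain = annotate("The horse grazed in the field.", "horse", pairs, meaning)
blocked = annotate("A sea horse is a small fish.", "horse", pairs, meaning)
```

Under these assumptions, the first sentence yields one annotation for "horse", while the second yields none because the incongruous term "sea horse" falls inside the window.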
Publication number: US9460091(B2) Publication date: 2016.10.04
Application number: US201314080156 Filing date: 2013.11.14
Applicant: Elsevier B.V. Inventors: Doornenbal Marius; Kohlhof Inga
Classification: G06F17/00; G06F17/30; G06F12/04; G06F17/24; G06F17/27; G06F12/08; G06F17/28 Main classification: G06F17/00
Agency: Dinsmore & Shohl LLP Agent: Dinsmore & Shohl LLP
Main claim: 1. A method of annotating a text document, the method comprising: determining, by a computing device, a term of interest within the text document; searching a data structure storing incongruous term pairs (tx, tt) determined from a controlled vocabulary for the term of interest appearing as a term tt, wherein the term tt is a linguistic head of a term tx of the incongruous term pairs (tx, tt) and term tx is a linguistic derivative of term tt, wherein terms tx, tt have a hierarchical relationship corresponding to the controlled vocabulary; determining a plurality of compound noun phrases within the controlled vocabulary, wherein each compound noun phrase includes terms tx and tt; determining a semantic distance between the second term and the first term, and for each compound noun phrase wherein the semantic distance between tx and tt is greater than a predetermined threshold distance, saving the compound noun phrase in a data structure as an incongruous term pair (tx, tt), wherein the incongruous term pair has a linguistic discrepancy and a semantic discrepancy, wherein the second term is term tt and the first term is tx; and annotating, by the computing device, the term of interest with a meaning provided by the controlled vocabulary only when each term tx of the incongruous term pairs (tx, tt) including the term of interest as term tt in the data structure is not present within a predetermined textual distance of the term of interest in the text document.
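The pair-building step of the claim (collect compound noun phrases, measure the semantic distance between tx and tt, keep those above a threshold) might look like the following sketch. The claim does not say how semantic distance is computed; here it is assumed to be the path length between two terms in a controlled vocabulary modeled as a child-to-parent dictionary, and the linguistic head is assumed to be the last word of an English compound. The names `ancestors`, `tree_distance`, `incongruous_pairs`, and the toy vocabulary are hypothetical.

```python
def ancestors(term, parent):
    """Return the chain [term, parent, grandparent, ...] up to the root."""
    chain = [term]
    while term in parent:  # the root has no entry in `parent`
        term = parent[term]
        chain.append(term)
    return chain


def tree_distance(a, b, parent):
    """Path length between a and b via their lowest common ancestor (assumed metric)."""
    depth_from_b = {t: i for i, t in enumerate(ancestors(b, parent))}
    for i, t in enumerate(ancestors(a, parent)):
        if t in depth_from_b:
            return i + depth_from_b[t]
    return float("inf")  # no common ancestor


def incongruous_pairs(parent, threshold):
    """Collect (tx, tt) where tt is the head of compound tx and the two are far apart."""
    terms = set(parent) | set(parent.values())
    pairs = []
    for tx in sorted(terms):
        words = tx.split()
        if len(words) < 2:
            continue  # not a compound noun phrase
        tt = words[-1]  # assumption: the head is the final word
        if tt in terms and tree_distance(tx, tt, parent) > threshold:
            pairs.append((tx, tt))
    return pairs


# Hypothetical controlled vocabulary, stored as child -> parent.
vocabulary = {
    "animal": "organism",
    "mammal": "animal",
    "fish": "animal",
    "horse": "mammal",
    "race horse": "horse",
    "sea horse": "fish",
}
result = incongruous_pairs(vocabulary, threshold=2)
```

In this toy vocabulary "race horse" sits directly under "horse" (distance 1) and is not saved, whereas "sea horse" and "horse" meet only at "animal" (distance 4), so that pair is saved as incongruous.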
Address: Amsterdam NL