Title of invention: Identifying and characterizing an analogy in a document
Abstract: Disclosed are a method and a system for identifying and characterizing an analogy in a document. In one implementation, the method comprises identifying a candidate document. The candidate document comprises an analogy for a target concept, a region of interest, and a linguistic marker included in the region of interest. Further, the method comprises classifying the candidate document as an analogy document or a non-analogy document based upon a size of the region of interest and a count of the linguistic marker. Furthermore, the method comprises identifying a source concept from the analogy document. Subsequently, the method comprises characterizing the source concept with corresponding metadata. The metadata comprises a familiarity of the source concept, a length of the source concept, and a readability of the source concept.
Publication number: US9588965(B2)  Publication date: 2017.03.07
Application number: US201514672897  Filing date: 2015.03.30
Applicant: TATA CONSULTANCY SERVICES LIMITED  Inventors: Pedanekar Niranjan; Kumar Varun; Bhat Savita Suhas
Classification: G06F17/27; G06F17/28  Main classification: G06F17/27
Agency: Thompson Hine LLP  Agent: Thompson Hine LLP
Main claim: 1. A method for identifying and characterizing an analogy in a document, the method comprising: identifying a candidate document, wherein the candidate document comprises an analogy for a target concept, a region of interest, and a linguistic marker included in the region of interest; classifying the candidate document as an analogy document or a non-analogy document based upon a size of the region of interest and a count of the linguistic marker; identifying a source concept from the analogy document, wherein the source concept comprises the analogy; and characterizing the source concept with corresponding metadata, wherein the metadata comprises a familiarity of the source concept, a length of the source concept, and a readability of the source concept, and wherein the familiarity of the source concept is calculated using an extracting Distributional related words using Co-occurrences (DISCO) tool, the length of the source concept is calculated using the size of the region of interest, and the readability of the source concept is calculated using a Flesch-Kincaid readability score method.
Address: Mumbai, Maharashtra IN
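
Illustrative sketch (not part of the patent record): the classification and readability steps named in the main claim could look roughly like the Python below. This is a minimal sketch under stated assumptions, not the patented implementation. The marker list, the thresholds `min_words` and `min_markers`, the word-count measure of region size, and the vowel-group syllable heuristic are all hypothetical; the record does not specify any of them. The record also does not say which Flesch-Kincaid variant is used, so the sketch assumes the grade-level formula. The DISCO-based familiarity score is omitted because it depends on an external tool.

```python
import re

# Hypothetical marker phrases; the record does not enumerate the linguistic markers.
MARKERS = ("is like", "similar to", "analogous to", "just as", "think of")


def count_markers(region: str) -> int:
    """Count occurrences of the assumed marker phrases in a region of interest."""
    text = region.lower()
    return sum(text.count(marker) for marker in MARKERS)


def classify(region: str, min_words: int = 20, min_markers: int = 1) -> str:
    """Label a region 'analogy' or 'non-analogy' from its size and marker count.

    Size is measured here as a whitespace-delimited word count (an assumption),
    and both thresholds are illustrative defaults.
    """
    size = len(region.split())
    if size >= min_words and count_markers(region) >= min_markers:
        return "analogy"
    return "non-analogy"


def count_syllables(word: str) -> int:
    """Crude syllable estimate: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    region = "An electric circuit is like a loop of water pipes driven by a pump."
    print(classify(region, min_words=5))
    print(round(flesch_kincaid_grade(region), 2))
```

Lowering `min_words` in the usage lines only keeps the example short; a practical threshold would have to be tuned on labeled documents.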