Title of Invention |
Identifying and characterizing an analogy in a document |
Abstract |
Disclosed is a method and system for identifying and characterizing an analogy in a document. In one implementation, the method comprises identifying a candidate document. The candidate document comprises an analogy for a target concept, a region of interest, and a linguistic marker included in the region of interest. Further, the method comprises classifying the candidate document as an analogy document or a non-analogy document based upon the size of the region of interest and the count of linguistic markers. Furthermore, the method comprises identifying a source concept from the analogy document. Subsequently, the method comprises characterizing the source concept with corresponding metadata. The metadata comprises the familiarity of the source concept, the length of the source concept, and the readability of the source concept. |
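The abstract classifies a candidate document from two quantities, the size of the region of interest and the number of linguistic markers inside it, but does not say how the two are combined. The Python sketch below is a minimal illustration under stated assumptions: the region of interest is taken to be the sentences that mention the target concept, the phrase list in `MARKERS` is hypothetical, and the decision rule in `classify` (a minimum marker count plus a minimum marker density per word, with the thresholds `min_markers` and `min_density`) is invented for illustration rather than taken from the patent.

```python
import re

# Hypothetical marker phrases; the patent does not publish its actual marker set.
MARKERS = ("is like", "is similar to", "is analogous to", "just as", "think of", "akin to")


def region_of_interest(text: str, target: str) -> list[str]:
    """Return the sentences that mention the target concept (an assumed ROI definition)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if target.lower() in s.lower()]


def count_markers(sentences: list[str]) -> int:
    """Count non-overlapping occurrences of each marker phrase inside the ROI."""
    joined = " ".join(sentences).lower()
    return sum(joined.count(marker) for marker in MARKERS)


def classify(text: str, target: str, min_markers: int = 1, min_density: float = 0.02) -> str:
    """Label a candidate document from ROI size (in words) and marker count.

    The count-plus-density rule and both thresholds are illustrative assumptions.
    """
    roi = region_of_interest(text, target)
    size = sum(len(s.split()) for s in roi)
    if size == 0:
        # No sentence mentions the target concept, so there is no ROI to inspect.
        return "non-analogy"
    markers = count_markers(roi)
    density = markers / size
    return "analogy" if markers >= min_markers and density >= min_density else "non-analogy"
```

The density test is one way to keep a single stray marker in a very long region of interest from triggering the analogy label; the criteria used in the patented method may differ.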
Publication Number |
US9588965(B2) |
Publication Date |
2017.03.07 |
Application Number |
US201514672897 |
Filing Date |
2015.03.30 |
Applicant |
TATA CONSULTANCY SERVICES LIMITED |
Inventors |
Pedanekar Niranjan;Kumar Varun;Bhat Savita Suhas |
Classification Numbers |
G06F17/27;G06F17/28 |
Main Classification Number |
G06F17/27 |
Agency |
Thompson Hine LLP |
Agent |
Thompson Hine LLP |
Main Claim |
1. A method for identifying and characterizing an analogy in a document, the method comprising:
identifying a candidate document, wherein the candidate document comprises an analogy for a target concept, a region of interest, and a linguistic marker included in the region of interest;
classifying the candidate document as an analogy document or a non-analogy document based upon a size of the region of interest and a count of the linguistic marker;
identifying a source concept from the analogy document, wherein the source concept comprises the analogy; and
characterizing the source concept with corresponding metadata, wherein the metadata comprises a familiarity of the source concept, a length of the source concept, and a readability of the source concept, and wherein the familiarity of the source concept is calculated using the extracting DIStributionally related words using CO-occurrences (DISCO) tool, the length of the source concept is calculated using the size of the region of interest, and the readability of the source concept is calculated using a Flesch-Kincaid readability score method. |
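The characterizing step of the claim attaches three metadata values to a source concept. The Python sketch below shows one way to compute them under stated assumptions. DISCO itself is not called: `familiarity_of` is a hypothetical stand-in for whatever word-level familiarity score a DISCO word space would supply, and familiarity is taken as the mean score over the concept's words. Length is the word count of the region-of-interest text, following the claim. The claim does not say which Flesch-Kincaid variant is meant, so this sketch assumes the grade-level formula, applies it to the region-of-interest text, and uses a rough vowel-group syllable heuristic.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceConceptMetadata:
    familiarity: float
    length: int
    readability: float


def count_syllables(word: str) -> int:
    """Rough vowel-group heuristic; dictionary-based counters are more accurate."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in endings such as "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def characterize(
    source_concept: str,
    roi_text: str,
    familiarity_of: Callable[[str], float],  # hypothetical stand-in for a DISCO lookup
) -> SourceConceptMetadata:
    words = re.findall(r"[A-Za-z']+", source_concept.lower())
    familiarity = sum(familiarity_of(w) for w in words) / len(words) if words else 0.0
    length = len(roi_text.split())  # length from the size of the region of interest
    return SourceConceptMetadata(familiarity, length, flesch_kincaid_grade(roi_text))
```

Passing the familiarity lookup in as a function keeps the sketch independent of any particular corpus; a real system would back it with a DISCO word space or another frequency resource.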
Address |
Mumbai, Maharashtra IN |