Title of invention |
AUTOMATED TERM EXTRACTION |
Abstract |
A device may obtain a document. The device may identify a skip value for the document. The skip value may relate to a quantity of words or a quantity of characters that are to be skipped in an n-gram. The device may determine one or more skip n-grams using the skip value for the document. A skip n-gram, of the one or more skip n-grams, may include a sequence of one or more words or one or more characters with a set of occurrences in the document. The sequence of one or more words or one or more characters may include a skip value quantity of words or characters within the sequence. The device may extract one or more terms from the document based on the one or more skip n-grams. The device may provide information identifying the one or more terms. |
Publication number |
US2017060842(A1) |
Publication date |
2017.03.02 |
Application number |
US201615247341 |
Filing date |
2016.08.25 |
Applicant |
Accenture Global Services Limited |
Inventors |
DWARAKANATH Anurag;Priyadarshi Aditya;Anand Bhanu;Tummalapalli Bindu Madhav;Jayaraman Bargav;Ramachandra Nisha;Chandran Anitha;Raghavan Parvathy Vijay;Chaudhari Shalini;Dubash Neville;Podder Sanjay |
Classification |
G06F17/27 |
Main classification |
G06F17/27 |
Agency |
|
Agent |
|
Main claim |
1. A device, comprising:
one or more processors to:
obtain a document,
the document including a set of words or a set of characters;
identify a skip value for the document,
the skip value relating to a quantity of words or a quantity of characters that are to be skipped in an n-gram;
determine one or more skip n-grams using the skip value for the document,
a skip n-gram, of the one or more skip n-grams, including a sequence of one or more words or one or more characters with a plurality of occurrences in the document,
the sequence of one or more words or one or more characters including a skip value quantity of words or characters within the sequence;
extract one or more terms from the document based on the one or more skip n-grams,
a term associated with the skip n-gram corresponding to the skip value quantity of words or characters within the sequence; and
provide information identifying the one or more terms. |
Address |
Dublin IE |
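
The skip n-gram step described in the abstract and in claim 1 can be illustrated with a minimal word-level sketch. This is one possible reading of the claim language, not the applicant's implementation. It assumes that a skip n-gram is a pair of fixed context words separated by exactly `skip` skipped words, that "a plurality of occurrences" means the same context pair appears at least twice, and that the extracted terms are the word runs filling the skipped positions. The names `skip_ngrams`, `extract_terms`, and `min_occurrences` are hypothetical and do not appear in the record.

```python
from collections import defaultdict


def skip_ngrams(words, skip):
    """Group the word runs of length `skip` by the words on either side.

    Returns a dict mapping (left_word, right_word) to the list of
    skipped-word tuples observed between that pair in the document.
    """
    fillers = defaultdict(list)
    # The right context word sits at index i + skip + 1, so stop early
    # enough for that index to stay inside the list.
    for i in range(len(words) - skip - 1):
        left = words[i]
        right = words[i + skip + 1]
        gap = tuple(words[i + 1 : i + 1 + skip])
        fillers[(left, right)].append(gap)
    return fillers


def extract_terms(text, skip, min_occurrences=2):
    """Return candidate terms: gap fillers of contexts that recur.

    Assumption: a context pair must occur at least `min_occurrences`
    times for its skipped words to count as terms.
    """
    words = text.lower().split()
    terms = set()
    for _context, gaps in skip_ngrams(words, skip).items():
        if len(gaps) >= min_occurrences:
            terms.update(" ".join(gap) for gap in gaps)
    return terms


if __name__ == "__main__":
    document = (
        "the billing system shall store data and "
        "the payment gateway shall log errors"
    )
    # The context ("the", "shall") occurs twice with two words skipped.
    print(sorted(extract_terms(document, skip=2)))
    # ['billing system', 'payment gateway']
```

With a skip value of 2, the only context pair that recurs in the sample sentence is "the ... shall", so the two-word runs it encloses are reported as terms. A character-level variant would apply the same windowing to a string rather than to a list of words.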