Title of invention: PATTERN MATCHING BASED CHARACTER STRING RETRIEVAL
Abstract: Embodiments relate to generating a retrieval condition for retrieving a target character string from texts by pattern matching. An aspect includes dividing a first text into words. Another aspect includes generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string. Another aspect includes replacing at least one character of the target character string. Another aspect includes generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string based on a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string is less than or equal to a reference frequency.
Publication number: US2015242537(A1); Publication date: 2015.08.27
Application number: US201514629589; Filing date: 2015.02.24
Applicant: International Business Machines Corporation; Inventors: Takeuchi Emiko; Takuma Daisuke; Toyoshima Hirobumi
Classification: G06F17/30; Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A computer-implemented method for generating a retrieval condition for retrieving a target character string from texts by pattern matching, the method comprising: dividing a first text into words; generating a converted character string by performing at least one of appending at least one character in at least either one of previous and subsequent positions of the target character string; replacing at least one character of the target character string; and generating the retrieval condition for retrieval candidates in the words of the first text, the retrieval condition comprising determining that a retrieval candidate matches the target character string and does not match the converted character string based on a ratio of a part of the retrieval candidate which matches the converted character string and corresponds to the target character string is less than or equal to a reference frequency.
Address: Armonk, NY, US
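
The method in claim 1 can be illustrated with a minimal Python sketch. This is one possible reading of the claim, not the applicant's implementation. It assumes that the first text arrives already divided into words (whitespace stands in for a morphological analyzer), that converted character strings are formed only by appending a single character before or after the target (the replacement variant is not covered), and that the "ratio" is the share of a converted string's occurrences in which the target part is itself a whole word. The function names `divide_into_words`, `word_spans`, and `build_retrieval_condition`, and the sample text, are hypothetical.

```python
import re
from collections import defaultdict


def divide_into_words(segmented_text):
    # Assumption: words are pre-separated by spaces; a real system would
    # use a morphological analyzer for text without word boundaries.
    return segmented_text.split()


def word_spans(words):
    # (start, end) offsets that each word occupies in the joined text.
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def build_retrieval_condition(segmented_text, target, reference_frequency):
    words = divide_into_words(segmented_text)
    raw = "".join(words)
    spans = word_spans(words)

    totals = defaultdict(int)   # occurrences of each converted string
    as_word = defaultdict(int)  # occurrences where the target part is a word

    start = raw.find(target)
    while start != -1:
        end = start + len(target)
        is_word = (start, end) in spans
        contexts = []
        if start > 0:
            contexts.append(("before", raw[start - 1]))
        if end < len(raw):
            contexts.append(("after", raw[end]))
        for key in contexts:
            totals[key] += 1
            as_word[key] += is_word
        start = raw.find(target, start + 1)

    # Exclude a converted string when its ratio is at or below the threshold.
    excluded = {"before": [], "after": []}
    for (side, ch), count in totals.items():
        if as_word[(side, ch)] / count <= reference_frequency:
            excluded[side].append(ch)

    pattern = ""
    if excluded["before"]:
        pattern += "(?<![" + re.escape("".join(sorted(excluded["before"]))) + "])"
    pattern += re.escape(target)
    if excluded["after"]:
        pattern += "(?![" + re.escape("".join(sorted(excluded["after"]))) + "])"
    return re.compile(pattern)


# Hypothetical first text, pre-divided into words.
first_text = "東京都 に 行く 京都 に 行く 東京都 に 住む"
condition = build_retrieval_condition(first_text, "京都", 0.1)
print([m.start() for m in condition.finditer("東京都と京都")])  # [4]
```

In this sample, the converted string "東京都" never contains "京都" as a separate word, so its ratio is 0 and the generated condition rejects matches preceded by "東". Encoding the condition as a regular expression with lookbehind and lookahead is a design choice made for this sketch; the claim itself does not prescribe a representation.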