Title of invention |
Rule development for natural language processing of text |
Abstract |
In a computing device that defines a rule for natural language processing of text, annotated text is selected from a first document of a plurality of annotated documents. An entity rule type is selected from a plurality of entity rule types. An argument of the selected entity rule type is identified. A value for the identified argument is randomly selected based on the selected annotated text to generate a rule instance. The generated rule instance is applied to remaining documents of the plurality of annotated documents. A rule performance measure is computed based on application of the generated rule instance. The generated rule instance and the computed rule performance measure are stored for application to other documents. |
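The abstract does not say which rule performance measure is computed. As a minimal sketch, assuming the measure is a precision/recall/F1 comparison between the spans a rule instance extracts and the spans the annotators marked, it might look like the following. The `Span` type and the `score_rule` function are hypothetical names introduced for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotated span: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def score_rule(predicted, gold):
    """Return (precision, recall, f1) for predicted spans against gold spans.

    Assumption: a prediction counts as correct only when offsets and label
    match a gold annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Exact-match scoring is the strictest choice; a partial-overlap criterion would be an equally plausible reading of "performance measure".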
Publication number |
US9460071(B2) |
Publication date |
2016.10.04 |
Application number |
US201514692333 |
Filing date |
2015.04.21 |
Applicant |
SAS Institute Inc. |
Inventors |
Avasarala Viswanath;Styles David;Tetterton James;Crowell Richard;Sethi Saratendu |
Classification codes |
G06F17/27;G06F17/20;G06F17/24;G06F17/30 |
Main classification code |
G06F17/27 |
Agency |
Bell & Manning, LLC |
Agent |
Bell & Manning, LLC |
Principal claim |
1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a processor of a computing device cause the computing device to:
(a) select annotated text from a first document of a plurality of annotated documents;
(b) select an entity rule type from a plurality of entity rule types;
(c) identify an argument of the selected entity rule type;
(d) randomly select a value for the identified argument based on the selected annotated text to generate a rule instance;
(e) apply the generated rule instance to remaining documents of the plurality of annotated documents;
(f) compute a rule performance measure based on application of the generated rule instance;
(g) store the generated rule instance and the computed rule performance measure;
(h) repeat (a) to (g) with each remaining document of the plurality of annotated documents as the first document to define a plurality of rules;
select a number of rules from the defined plurality of rules based on the stored, computed rule performance measure; and
store each rule of the selected number of rules to the non-transitory computer-readable medium as a basis for a rules model that automatically identifies an entity or a relationship in non-annotated text. |
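Steps (a) to (h) of the claim, followed by the final selection, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the two rule types in `RULE_TYPES`, the choice of regular expressions as the rule language, the random choice of rule type in step (b), the use of F1 as the performance measure, and the names `Rule`, `f1_score`, and `develop_rules`. The claim itself does not fix any of these.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical entity rule types. Each takes one argument, "value" (a token
# drawn from the annotated text), and builds a regular expression from it.
RULE_TYPES = {
    "exact_token": lambda value: r"\b" + re.escape(value) + r"\b",
    "token_prefix": lambda value: r"\b" + re.escape(value) + r"\w*",
}


@dataclass(frozen=True)
class Rule:
    rule_type: str
    value: str

    def matches(self, text):
        """Return the set of strings this rule instance extracts from text."""
        pattern = RULE_TYPES[self.rule_type](self.value)
        return {m.group(0) for m in re.finditer(pattern, text)}


def f1_score(predicted, gold):
    """Assumed performance measure: F1 of extracted versus annotated items."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def develop_rules(docs, top_k, seed=0):
    """docs is a list of (text, set_of_annotated_strings) pairs.

    Annotations are assumed to be non-empty strings. Returns the top_k
    (score, rule) pairs, best first.
    """
    rng = random.Random(seed)
    scored = []
    for i, (_, annotations) in enumerate(docs):      # (h) each doc takes a turn
        if not annotations:
            continue                                 # nothing to learn from
        annotated = rng.choice(sorted(annotations))  # (a) pick annotated text
        rule_type = rng.choice(sorted(RULE_TYPES))   # (b) pick a rule type
        # (c) each hypothetical rule type has the single argument "value"
        value = rng.choice(annotated.split())        # (d) random argument value
        rule = Rule(rule_type, value)
        remaining = docs[:i] + docs[i + 1:]          # (e) apply to other docs
        predicted, gold = set(), set()
        for j, (other_text, other_annotations) in enumerate(remaining):
            predicted |= {(j, m) for m in rule.matches(other_text)}
            gold |= {(j, a) for a in other_annotations}
        scored.append((f1_score(predicted, gold), rule))  # (f) and (g)
    # Final selection: keep the best-scoring rules as the basis for a model.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Holding out the "first document" and scoring on the rest means a rule is rewarded only if it generalizes beyond the text it was generated from, which is one plausible reading of why the claim applies each rule instance to the remaining documents.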
Address |
Cary NC US |