Title of invention: Rule development for natural language processing of text
Abstract: In a computing device that defines a rule for natural language processing of text, annotated text is selected from a first document of a plurality of annotated documents. An entity rule type is selected from a plurality of entity rule types. An argument of the selected entity rule type is identified. A value for the identified argument is randomly selected based on the selected annotated text to generate a rule instance. The generated rule instance is applied to remaining documents of the plurality of annotated documents. A rule performance measure is computed based on application of the generated rule instance. The generated rule instance and the computed rule performance measure are stored for application to other documents.
Publication number: US9460071(B2)  Publication date: 2016.10.04
Application number: US201514692333  Filing date: 2015.04.21
Applicant: SAS Institute Inc.  Inventors: Avasarala Viswanath; Styles David; Tetterton James; Crowell Richard; Sethi Saratendu
Classification: G06F17/27; G06F17/20; G06F17/24; G06F17/30  Main classification: G06F17/27
Agency: Bell & Manning, LLC  Agent: Bell & Manning, LLC
Principal claim: 1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a processor of a computing device cause the computing device to: (a) select annotated text from a first document of a plurality of annotated documents; (b) select an entity rule type from a plurality of entity rule types; (c) identify an argument of the selected entity rule type; (d) randomly select a value for the identified argument based on the selected annotated text to generate a rule instance; (e) apply the generated rule instance to remaining documents of the plurality of annotated documents; (f) compute a rule performance measure based on application of the generated rule instance; (g) store the generated rule instance and the computed rule performance measure; (h) repeat (a) to (g) with each remaining document of the plurality of annotated documents as the first document to define a plurality of rules; select a number of rules from the defined plurality of rules based on the stored, computed rule performance measure; and store each rule of the selected number of rules to the non-transitory computer-readable medium as a basis for a rules model that automatically identifies an entity or a relationship in non-annotated text.
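Steps (a) to (h) of the claim can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the `AnnotatedDoc` and `Rule` classes, the two rule types (`literal` and `after_word`), their argument names, the regular-expression matching, and the use of F1 as the "rule performance measure" (the claim does not name a specific measure).

```python
import random
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedDoc:
    text: str
    entities: tuple  # annotated entity strings (hypothetical representation)


@dataclass(frozen=True)
class Rule:
    rule_type: str
    argument: str
    value: str

    def matches(self, text):
        """Return the set of strings this rule extracts from text."""
        if self.rule_type == "literal":
            # Match the value itself as a whole word.
            return set(re.findall(r"\b" + re.escape(self.value) + r"\b", text))
        # "after_word": capture the capitalized token that follows the value.
        return set(re.findall(r"\b" + re.escape(self.value) + r"\s+([A-Z]\w*)", text))


# Hypothetical rule types, each mapped to its single argument name.
RULE_TYPES = {"literal": "term", "after_word": "left_context"}


def candidate_values(doc, entity, argument):
    """Values for an argument, derived from the selected annotated text."""
    if argument == "term":
        return [entity]
    # left_context: words that immediately precede the entity in the text.
    return re.findall(r"(\w+)\s+" + re.escape(entity), doc.text)


def f1_score(rule, docs):
    """Assumed performance measure: F1 of rule matches against annotations."""
    tp = fp = fn = 0
    for doc in docs:
        found, gold = rule.matches(doc.text), set(doc.entities)
        tp += len(found & gold)
        fp += len(found - gold)
        fn += len(gold - found)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def develop_rules(docs, top_k=2, seed=0):
    rng = random.Random(seed)
    scored = {}
    for i, first in enumerate(docs):                    # (h) each doc in turn
        if not first.entities:
            continue
        entity = rng.choice(first.entities)             # (a) annotated text
        rule_type = rng.choice(sorted(RULE_TYPES))      # (b) rule type
        argument = RULE_TYPES[rule_type]                # (c) its argument
        values = candidate_values(first, entity, argument)
        if not values:
            continue
        rule = Rule(rule_type, argument, rng.choice(values))  # (d) random value
        remaining = docs[:i] + docs[i + 1:]             # (e) apply to the rest
        scored[rule] = f1_score(rule, remaining)        # (f), (g) score and store
    # Final step: keep the best-scoring rules as the basis for a rules model.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


SAMPLE_DOCS = [
    AnnotatedDoc("Doctor Smith met Doctor Jones.", ("Smith", "Jones")),
    AnnotatedDoc("Yesterday Doctor Smith called.", ("Smith",)),
    AnnotatedDoc("Doctor Jones replied to Doctor Smith.", ("Jones", "Smith")),
]

for rule, score in develop_rules(SAMPLE_DOCS):
    print(rule, round(score, 2))
```

In this toy setting a context rule such as "capitalized word after Doctor" generalizes across documents, while a literal rule only recovers the one name it was built from. That difference is what the stored performance measure is meant to expose when rules are selected.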
Address: Cary NC US