Title of Invention: ADVANCED FIELD EXTRACTOR WITH MULTIPLE POSITIVE EXAMPLES
Abstract: The technology disclosed relates to formulating and refining field extraction rules that are applied at query time to raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets that are not organized into relational structures and have not been processed by standard extraction or transformation methods. Working from sample events, primary and secondary example events help formulate either a single extraction rule spanning multiple data formats or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules and to identify negative examples that avoid mistaken value selection. The extraction rules can be saved for query-time use and can be incorporated into a data model for sets and subsets of event data.
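Query-time (late-binding) application of an extraction rule, and the way a negative example catches a mistaken selection, can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a rule is a regular expression with a named group and that events are stored as unparsed strings; the abstract does not say how rules are represented, and the names `Event`, `extract_at_query_time` and `check_examples` are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float  # each event is identified by a time stamp
    raw: str          # raw data stored as-is; no schema is applied at ingest

def extract_at_query_time(events, rule):
    """Late binding: apply the rule to the raw text only when a query runs."""
    results = []
    for event in events:
        m = rule.search(event.raw)
        results.append(m.groupdict() if m else {})
    return results

def check_examples(rule, positives, negatives, field):
    """True if the rule extracts every positive value and no negative value."""
    for raw, expected in positives:
        m = rule.search(raw)
        if m is None or m.group(field) != expected:
            return False
    for raw, rejected in negatives:
        m = rule.search(raw)
        if m is not None and m.group(field) == rejected:
            return False
    return True

events = [Event(1.0, "GET /index.html status=200"), Event(2.0, "POST /login status=403")]
anchored = re.compile(r"status=(?P<status>\d+)")
loose = re.compile(r"(?P<status>\d{3})")  # also matches digits inside a URL path

positives = [("GET /index.html status=200", "200")]
negatives = [("GET /v2/123 status=404", "123")]  # hypothetical: user marked "123" as a wrong pick

print(extract_at_query_time(events, anchored))  # [{'status': '200'}, {'status': '403'}]
print(check_examples(anchored, positives, negatives, "status"))  # True
print(check_examples(loose, positives, negatives, "status"))     # False
```

The loose rule satisfies the positive example but picks up the digits in the URL path, which the negative example flags; the anchored rule passes both checks. Because nothing is parsed until `extract_at_query_time` runs, a revised rule can be applied to the same stored events without re-ingesting them.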
Application Publication No.: US2015149879(A1)  Publication Date: 2015.05.28
Application No.: US201514610668  Filing Date: 2015.01.30
Applicant: Splunk Inc.  Inventors: Miller Jesse; Delfino Micah James; Robichaud Marc; Hanson Catherine Anne; Carasso David
Classification: G06F17/24  Main Classification: G06F17/24
Main Claim: 1. A computer-implemented method comprising:
accessing in memory a set of events, each event identified by an associated time stamp, wherein each event in the set of events includes a portion of raw data;
causing display of a first user interface including a plurality of events;
receiving data indicating selection of a first event from among the plurality of events;
causing display of a second user interface presenting the first event to be used to define field extraction;
receiving data indicating a selection of one or more portions of text within the first event to be extracted as one or more fields;
automatically determining at least one field extraction rule that extracts one or more values for the one or more fields from the respective selections of the portions of text within the events when the extraction rule is applied to the events;
causing display of a third user interface including an annotated version of the plurality of events, wherein the annotated version indicates the portions of text within the plurality of events extracted by the field extraction rule, and presenting a second event to be used to refine field extraction; and
receiving further data indicating a selection of at least one portion of text within the second event to be extracted into at least one of the fields by at least one updated field extraction rule.
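The claimed loop (select text in a first event, derive a rule, annotate all events, then select text in a second event to update the rule) can be sketched in Python as follows. The rule-learning heuristic is a hypothetical one chosen for brevity and is not the method the patent discloses: each selection is generalized into the literal token preceding it plus a character class for the value, and selections from differently formatted events are merged into one rule by alternating their anchors. All function names are illustrative.

```python
from __future__ import annotations
import re

def char_class(value: str) -> str:
    """Hypothetical heuristic: generalize a selected value to a character class."""
    if value.isdigit():
        return r"\d+"
    if re.fullmatch(r"[A-Za-z]+", value):
        return r"[A-Za-z]+"
    return r"\S+"

def anchor_before(raw: str, start: int) -> str:
    """Hypothetical heuristic: last non-space token (plus trailing spaces) before the selection."""
    m = re.search(r"(\S+\s*)$", raw[:start])
    return m.group(1) if m else ""

def learn_rule(field: str, selections: list[tuple[str, int, int]]) -> re.Pattern:
    """Build one rule from (raw_event, start, end) selections made in one or more events."""
    anchors: list[str] = []
    classes: set[str] = set()
    for raw, start, end in selections:
        anchor = anchor_before(raw, start)
        if anchor not in anchors:
            anchors.append(anchor)
        classes.add(char_class(raw[start:end]))
    # Keep the shared character class; fall back to non-whitespace if selections disagree.
    value_class = classes.pop() if len(classes) == 1 else r"\S+"
    alternation = "|".join(re.escape(a) for a in anchors)
    return re.compile(f"(?:{alternation})(?P<{field}>{value_class})")

def annotate(rule: re.Pattern, events: list[str], field: str) -> list:
    """Return the value the rule extracts from each event, or None if it misses."""
    values = []
    for raw in events:
        m = rule.search(raw)
        values.append(m.group(field) if m else None)
    return values

events = [
    "GET /index.html status=200",
    "POST /login status=403",
    "2015-01-30 code: 500 upstream",
]

# Step 1: the user selects "200" in the first event.
first = events[0]
s = first.index("200")
rule = learn_rule("status", [(first, s, s + 3)])
print(annotate(rule, events, "status"))  # ['200', '403', None]

# Step 2: the annotated view shows the third event is missed, so the user
# selects "500" in it as a second positive example and the rule is updated.
second = events[2]
t = second.index("500")
refined = learn_rule("status", [(first, s, s + 3), (second, t, t + 3)])
print(annotate(refined, events, "status"))  # ['200', '403', '500']
```

Merging the anchors into one alternation mirrors the abstract's single rule spanning multiple data formats; keeping one compiled rule per anchor would correspond to multiple rules directed to distinct formats.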
Address: San Francisco, CA, US