Title of invention APPARATUS AND METHOD FOR EXECUTING AN AUTOMATED ANALYSIS OF DATA, IN PARTICULAR SOCIAL MEDIA DATA, FOR PRODUCT FAILURE DETECTION
Abstract An apparatus and a method for executing an automated analysis of analysis input data (e.g. social media data and/or On-Board-Diagnosis data) for product failure detection are proposed. Data analysis processing is performed, including: word count processing to determine word count numbers indicative of occurrence frequencies for keywords of a database in user-created text documents of the social media data; correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient; correlation-link identification processing to identify correlation-linked keyword pairs for which the determined correlation coefficient exceeds a correlation threshold; and correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs. If one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords is output.
Publication number US2017091289(A1) Publication date 2017.03.30
Application number US201615265971 Filing date 2016.09.15
Applicant Hitachi, Ltd. Inventors OHAZULIKE Anthony Emeka;TOMATIS Andrea;LIN Lan
Classification G06F17/30;G06F7/20;G06F17/27 Main classification G06F17/30
Agency Agent
Main claim 1. A method for an automated data analysis, comprising: providing one or more databases indicative of a plurality of keywords; providing analysis input data obtained from one or more data sources, and pre-processing the analysis input data to generate pre-processed analysis input data available for data analysis processing, the analysis input data including a plurality of text documents respectively being associated with at least one of a plurality of data samples; performing data analysis processing of the pre-processed analysis input data, including: word count processing to determine word count numbers indicative of occurrence frequencies for keywords of the one or more databases in the text documents of the pre-processed analysis input data for each of the plurality of data samples, correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient being associated with the respective keyword pair, the respective correlation coefficient being indicative of a quantitative measure of correlation between the determined word count numbers of the keywords of the respective keyword pair for the plurality of data samples, correlation-link identification processing to identify correlation-linked keyword pairs, wherein keywords of a keyword pair are determined to be correlation-linked to each other based on a correlation criteria, the correlation criteria including a criteria whether the determined correlation coefficient associated with the respective keyword pair exceeds a correlation threshold, and correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs, each correlation group including keywords of at least one correlation-linked keyword pair and, for each keyword included in the respective correlation group, the respective correlation group further includes the other keywords identified to be correlation-linked to the respective keyword; and outputting, if one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords.
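The four processing steps recited in the claim (word count, correlation determination, correlation-link identification, correlation group identification) can be illustrated with a minimal sketch. The record does not specify the correlation measure or the grouping procedure, so the sketch assumes a Pearson coefficient and reads the grouping rule as connected components of the link graph. All function names, the keyword list, the sample texts, and the threshold of 0.9 are hypothetical and are not taken from the patent.

```python
import re
from itertools import combinations
from math import sqrt


def word_counts(samples, keywords):
    """Return counts[k][i]: occurrences of keyword k in the documents of sample i.

    Assumption: keywords are single lowercase words and documents are
    tokenised on runs of letters and digits.
    """
    counts = {k: [] for k in keywords}
    for docs in samples:
        tokens = [t for doc in docs for t in re.findall(r"[a-z0-9]+", doc.lower())]
        for k in keywords:
            counts[k].append(tokens.count(k))
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length series (assumed measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


def correlation_groups(counts, threshold):
    """Link keyword pairs whose coefficient exceeds the threshold, then
    collect each set of transitively linked keywords into one group."""
    links = {k: set() for k in counts}
    for a, b in combinations(counts, 2):
        if pearson(counts[a], counts[b]) > threshold:
            links[a].add(b)
            links[b].add(a)

    groups, seen = [], set()
    for start in links:
        if start in seen or not links[start]:
            continue  # already grouped, or not part of any linked pair
        group, stack = set(), [start]
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            stack.extend(links[cur] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical input: four data samples (e.g. time windows), each a list of posts.
KEYWORDS = ["brake", "noise", "radio"]
SAMPLES = [
    ["brake noise again", "love the radio"],
    ["brake brake noise noise", "radio"],
    ["nothing wrong"],
    ["brake noise brake noise brake noise", "radio ok"],
]

if __name__ == "__main__":
    counts = word_counts(SAMPLES, KEYWORDS)
    print(correlation_groups(counts, threshold=0.9))
```

In this toy input the counts of "brake" and "noise" rise and fall together across the samples, while "radio" stays nearly flat, so only the first two keywords end up in a group. Keywords that belong to no linked pair are left out, matching the claim's requirement that each group contain at least one correlation-linked pair.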
Address Tokyo JP