Abstract |
An apparatus and a method for executing an automated analysis of analysis input data (e.g. social media data and/or On-Board-Diagnosis data) for product failure detection are proposed. Data analysis processing is performed, including: word count processing to determine word count numbers indicative of occurrence frequencies for keywords of a database in user-created text documents of the social media data; correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient; correlation-link identification processing to identify correlation-linked keyword pairs for which the determined correlation coefficient exceeds a correlation threshold; and correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs. If one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords is output. |
Independent claim |
1. A method for an automated data analysis, comprising:
providing one or more databases indicative of a plurality of keywords;
providing analysis input data obtained from one or more data sources, and pre-processing the analysis input data to generate pre-processed analysis input data available for data analysis processing, the analysis input data including a plurality of text documents respectively being associated with at least one of a plurality of data samples;
performing data analysis processing of the pre-processed analysis input data, including:
word count processing to determine word count numbers indicative of occurrence frequencies for keywords of the one or more databases in the text documents of the pre-processed analysis input data for each of the plurality of data samples,
correlation determination processing to determine, for each of a plurality of keyword pairs, a respective correlation coefficient being associated with the respective keyword pair, the respective correlation coefficient being indicative of a quantitative measure of correlation between the determined word count numbers of the keywords of the respective keyword pair for the plurality of data samples,
correlation-link identification processing to identify correlation-linked keyword pairs, wherein keywords of a keyword pair are determined to be correlation-linked to each other based on a correlation criterion, the correlation criterion including a criterion of whether the determined correlation coefficient associated with the respective keyword pair exceeds a correlation threshold, and
correlation group identification processing to identify correlation groups of keywords based on the identified correlation-linked keyword pairs, each correlation group including keywords of at least one correlation-linked keyword pair and, for each keyword included in the respective correlation group, the respective correlation group further includes the other keywords identified to be correlation-linked to the respective keyword; and
outputting, if one or more correlation groups of keywords are identified, analysis result data indicative of at least one of the one or more identified correlation groups of keywords. |
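The four processing steps recited in the claim can be illustrated with a minimal Python sketch. It is one possible reading and not the claimed implementation. Several choices are assumptions that the claim does not fix: the Pearson coefficient is used as the "quantitative measure of correlation", keywords are single lowercase tokens, each data sample is a list of text documents, and a correlation group is taken to be a connected component of the graph whose edges are the correlation-linked pairs. All function names, the example keywords, the example documents and the threshold value 0.8 are hypothetical.

```python
import math
import re
from itertools import combinations


def word_counts(samples, keywords):
    """Count keyword occurrences per data sample.

    samples: list of data samples, each a list of text documents (assumed layout).
    Returns {keyword: [count in sample 0, count in sample 1, ...]}.
    """
    counts = {k: [] for k in keywords}
    for docs in samples:
        tokens = [t for doc in docs for t in re.findall(r"[a-z0-9]+", doc.lower())]
        for k in keywords:
            counts[k].append(tokens.count(k))
    return counts


def pearson(x, y):
    """Pearson correlation coefficient (assumed measure); 0.0 if a series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_links(counts, threshold):
    """Keyword pairs whose coefficient exceeds the correlation threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(counts), 2)
        if pearson(counts[a], counts[b]) > threshold
    ]


def correlation_groups(links):
    """Group keywords by following correlation links (connected components)."""
    adjacent = {}
    for a, b in links:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adjacent):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            keyword = stack.pop()
            if keyword in group:
                continue
            group.add(keyword)
            stack.extend(adjacent[keyword] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical keyword database and four data samples (e.g. four time periods).
keywords = ["brake", "noise", "radio"]
samples = [
    ["brake noise when stopping", "brake makes noise"],
    ["radio is fine"],
    ["brake noise again", "radio static"],
    ["brake brake brake noise noise noise"],
]

counts = word_counts(samples, keywords)
links = correlation_links(counts, threshold=0.8)
groups = correlation_groups(links)
if groups:  # output only if at least one correlation group was identified
    print(groups)
```

In this toy input the counts of "brake" and "noise" rise and fall together across the samples while "radio" does not, so only the first two keywords end up in a group. A different correlation measure or a stricter grouping rule would fit the claim wording equally well; the sketch only fixes one combination to make the data flow concrete.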