摘要 |
Methods for Markov boundary discovery are important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Currently there exist two major local method families for identification of Markov boundaries from data: methods that directly implement the definition of the Markov boundary and newer compositional Markov boundary methods that are more sample efficient and thus often more accurate in practical applications. However, in the datasets with hidden (i.e., unmeasured or unobserved) variables compositional Markov boundary methods may miss some Markov boundary members. The present invention circumvents this limitation of the compositional Markov boundary methods and proposes a new method that can discover Markov boundaries from the datasets with hidden variables and do so in a much more sample efficient manner than methods that directly implement the definition of the Markov boundary. In general, the inventive method transforms a dataset with many variables into a minimal reduced dataset where all variables are needed for optimal prediction of some response variable. The power of the invention was empirically demonstrated with data generated by Bayesian networks and with 13 real datasets from a diversity of application domains. |