CCEA and DNFEA
The conjunctive clause evolutionary algorithm (CCEA) and the disjunctive normal form evolutionary algorithm (DNFEA) were created to find complex feature interactions in real-world data with nominal, and possibly ordinal, outputs. Both perform supervised learning to find complex, multivariate correlations with a specific target outcome (e.g., disease). The CCEA can find feature (epistatic) interactions in datasets that have noise, missing data, and/or multiple data types (i.e., continuous, ordinal, and nominal). It can also use a feature sensitivity function to help prevent the archiving of overfit feature interactions. The DNFEA is run after the CCEA to find heterogeneous combinations of conjunctive clauses that may have a stronger correlation with an output category than any single conjunctive clause. Both the CCEA and DNFEA use the hypergeometric probability mass function as a fitness function.
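To make the fitness idea concrete, here is a minimal Python sketch of scoring one conjunctive clause with the hypergeometric probability mass function. It is not the toolbox's MATLAB code. The function names, the clause encoding (a range tuple for continuous or ordinal features, a set of allowed values for nominal ones), the toy data, and the choice to skip rows with a missing value in any clause feature are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n rows without replacement from N rows,
    K of which belong to the target class."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def matches(row, clause):
    """True/False if the row satisfies every condition of the clause,
    or None if any clause feature is missing (assumed handling)."""
    result = True
    for feature, allowed in clause.items():
        value = row.get(feature)
        if value is None:
            return None
        if isinstance(allowed, tuple):      # (low, high) range
            low, high = allowed
            ok = low <= value <= high
        else:                               # set of allowed nominal values
            ok = value in allowed
        result = result and ok
    return result


def dnf_matches(row, clauses):
    """A DNF matches when at least one of its conjunctive clauses matches."""
    return any(matches(row, c) is True for c in clauses)


def clause_fitness(rows, labels, clause, target):
    """Return the PMF value and the counts (k, N, K, n) behind it.
    Smaller values mean the association is less likely due to chance."""
    N = K = n = k = 0
    for row, label in zip(rows, labels):
        m = matches(row, clause)
        if m is None:                       # skip rows with missing data
            continue
        is_target = label == target
        N += 1
        K += is_target
        if m:
            n += 1
            k += is_target
    return hypergeom_pmf(k, N, K, n), (k, N, K, n)


# Hypothetical toy data: one continuous and one nominal feature.
rows = [
    {"age": 30, "smoker": "yes"},
    {"age": 45, "smoker": "yes"},
    {"age": 50, "smoker": "no"},
    {"age": 62, "smoker": "yes"},
    {"age": None, "smoker": "yes"},
    {"age": 25, "smoker": "no"},
]
labels = ["healthy", "disease", "healthy", "disease", "disease", "healthy"]
clause = {"age": (40, 70), "smoker": {"yes"}}

p, counts = clause_fitness(rows, labels, clause, "disease")
```

In this toy run the row with a missing age is dropped, so the clause is scored on five rows, two of which it matches, and both matches fall in the target class. A DNF could be scored the same way by substituting `dnf_matches` for `matches`, although how the DNFEA treats missing data across clauses is not specified here.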
Cite As
Hanley, J.P., Rizzo, D.M., Buzas, J.S., and Eppstein, M.J. "A Tandem Evolutionary Algorithm for Identifying Causal Rules from Complex Data." Evolutionary Computation, accepted subject to final editorial review, 2019.
Abstract: We propose a new evolutionary approach for discovering causal rules in complex classification problems from batch data. Key aspects include (a) the use of a hypergeometric probability mass function as a principled statistic for assessing fitness that quantifies the probability that the observed association between a given clause and target class is due to chance, taking into account the size of the dataset, the amount of missing data, and the distribution of outcome categories, (b) tandem age-layered evolutionary algorithms for evolving parsimonious archives of conjunctive clauses, and disjunctions of these conjunctions, each of which have probabilistically significant associations with outcome classes, and (c) separate archive bins for clauses of different orders, with dynamically-adjusted order-specific thresholds. The method is validated on majority-on and multiplexer benchmark problems exhibiting various combinations of heterogeneity, epistasis, overlap, noise in class associations, missing data, extraneous features, and imbalanced classes. We also validate on a more realistic synthetic genome dataset with heterogeneity, epistasis, extraneous features, and noise. In all synthetic epistatic benchmarks, we consistently recover the true causal rule sets used to generate the data. Finally, we discuss an application to a complex real-world survey dataset designed to inform possible ecohealth interventions for Chagas disease.
Platform Compatibility
Windows macOS Linux
Version | Release Notes |
---|---|
1.0.5 | Fixed a couple of small bugs in the DNFreducepop function. |
1.0.4 | Small bugs were fixed in CCreducepop, FeatureSensitivity, and CCSensitivity functions. |
1.0.3 | Uploaded an image picture for MathWorks website. |
1.0.2 | A Read_Me text file was added. Also, more information was added to the example problems such as how one could plot the results and how to convert interesting DNFs into a more readable format. |
1.0.1 | Updated example problems. |
1.0.0 | |