Experiments on Controlling Overfitting in Genetic Programming



One of the most important goals of any Machine Learning approach is to find solutions that perform well not only on the cases used for learning but also on cases never seen before. This ability is known as generalization, and the failure to generalize is called overfitting. In Genetic Programming this issue has not yet been given the attention it deserves, although the number of publications on the subject has been increasing in the past few years. Here we perform several experiments on a small yet difficult toy problem specifically designed for this work, where perfectly fitting the training data inevitably results in poor generalization on the unseen test data. The results show that, on this problem, a Random Sampling Technique with parameter settings that maximize the variation between generations can significantly reduce overfitting when compared to a standard GP approach. We also report the results of some techniques that failed to achieve better generalization.


Genetic Programming, Overfitting, Generalization


Evolutionary Computation, Genetic Programming

Related Project

EnviGP - Improving Genetic Programming for the Environment and Other Applications


15th Portuguese Conference on Artificial Intelligence (EPIA 2011), October 2011

Cited by

Year 2015 : 3 citations

 Žegklitz, Jan, and Petr Pošík. "Model Selection and Overfitting in Genetic Programming: Empirical Study." Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference. ACM, 2015.

 Davidsen, Søren Atmakuri, E. Sreedevi, and M. Padmavathamma. "Local and Global Genetic Fuzzy Pattern Classifiers." Machine Learning and Data Mining in Pattern Recognition.

 Žegklitz, Jan, and Petr Pošík. "Model Selection and Overfitting in Genetic Programming: Empirical Study." arXiv preprint arXiv:1504.08168 (2015).

Year 2014 : 3 citations

 Fitzgerald, Jeannie. "Bias and Variance Reduction Strategies for Improving Generalisation Performance of Genetic Programming on Binary Classification Tasks." PhD diss., University of Limerick, 2014.

 Urbano, Paulo, Enrique Naredo, and Leonardo Trujillo. "Generalization in Maze Navigation Using Grammatical Evolution and Novelty Search." In Theory and Practice of Natural Computing, pp. 35-46. Springer International Publishing, 2014.

 Martínez, Yuliana, Leonardo Trujillo, Enrique Naredo, and Pierrick Legrand. "A Comparison of Fitness-Case Sampling Methods for Symbolic Regression with Genetic Programming." EVOLVE-A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V (2014): 201-212.

Year 2012 : 1 citation

 Martinez-Arellano, Giovanna, Lars Nolle, and John Bland. "Improving WRF-ARW Wind Speed Predictions using Genetic Programming." In Research and Development in Intelligent Systems XXIX, pp. 347-360. Springer London, 2012.