Random Sampling Technique for Overfitting Control in Genetic Programming



Generalization, the ability of a solution to perform well on unseen cases, has received less research attention in Genetic Programming (GP) than in other Machine Learning methods. It is one of the most important goals of any Machine Learning method, yet only recently has it started to receive more attention in GP. In this work we perform a comparative analysis of a particularly interesting configuration of the Random Sampling Technique (RST) against the Standard GP approach. Experiments are conducted on three multidimensional real-world symbolic regression datasets, the first two from the pharmacokinetics domain and the third from the forestry domain. The results show that the RST decreases overfitting on all three datasets and improves testing fitness on two of them. Furthermore, it does so while producing considerably smaller and less complex solutions. We discuss the possible reasons for the good performance of the RST, as well as its possible limitations.


Genetic Programming, Overfitting, Generalization


Evolutionary Computation, Genetic Programming

Related Project

EnviGP - Improving Genetic Programming for the Environment and Other Applications


15th European Conference on Genetic Programming (EuroGP 2012), April 2012



Cited by

Year 2015 : 5 citations

 Žegklitz, Jan, and Petr Pošík. "Model Selection and Overfitting in Genetic Programming: Empirical Study." Proceedings of the Companion Publication of the 2015 on Genetic and Evolutionary Computation Conference. ACM, 2015.

 Garg, Akhil, and Kang Tai. "Evolving genetic programming models of higher generalization ability in modelling of turning process." Engineering Computations 32, no. 8 (2015).

 Kommenda, Michael, Michael Affenzeller, Gabriel Kronberger, Bogdan Burlacu, and Stephan Winkler. "Multi-Population Genetic Programming with Data Migration for Symbolic Regression." In Computational Intelligence and Efficiency in Engineering Systems, pp. 75-87. Springer International Publishing, 2015.

 Villar, José R., A. Enrique, Javier Sedano, and Marco A. García Tamargo. "Simple heuristics for enhancing GP learning." Logic Journal of IGPL (2015): jzv003.

 Žegklitz, Jan, and Petr Pošík. "Model Selection and Overfitting in Genetic Programming: Empirical Study." arXiv preprint arXiv:1504.08168 (2015).

Year 2014 : 7 citations

 Garg, Akhil, Ankit Garg, K. Tai, and S. Sreedeep. "An integrated SRM-multi-gene genetic programming approach for prediction of factor of safety of 3-D soil nailed slopes." Engineering Applications of Artificial Intelligence (2014).

 Fitzgerald, Jeannie. "Bias and Variance Reduction Strategies for Improving Generalisation Performance of Genetic Programming on Binary Classification Tasks." PhD diss., University of Limerick, 2014.

 Garg, A., K. Tai, and M. M. Savalani. "Formulation of bead width model of an SLM prototype using modified multi-gene genetic programming approach." The International Journal of Advanced Manufacturing Technology (2014): 1-14.

 Haeri, Maryam Amir, Mohammad Mehdi Ebadzadeh, and Gianluigi Folino. "Improving GP generalization: a variance-based layered learning approach." Genetic Programming and Evolvable Machines (2014): 1-29.

 Garg, Ankit, Akhil Garg, K. Tai, S. Barontini, and A. Stokes. "A Computational Intelligence-Based Genetic Programming Approach for the Simulation of Soil Water Retention Curves." Transport in Porous Media (2014): 1-17.

 Garg, Akhil, and Kang Tai. "An Improved Multi-Gene Genetic Programming Approach for the Evolution of Generalized Model in Modelling of Rapid Prototyping Process." In Modern Advances in Applied Intelligence, pp. 218-226. Springer International Publishing, 2014.

 Kommenda, Michael, Michael Affenzeller, Bogdan Burlacu, Gabriel Kronberger, and Stephan M. Winkler. "Genetic programming with data migration for symbolic regression." In Proceedings of the 2014 conference companion on Genetic and evolutionary computation companion, pp. 1361-1366. ACM, 2014.

Year 2013 : 5 citations

 Fitzgerald, Jeannie, R. Azad, and Conor Ryan. "A bootstrapping approach to reduce over-fitting in genetic programming." In Proceeding of the fifteenth annual conference companion on Genetic and evolutionary computation conference companion, pp. 1113-1120. ACM, 2013.

 Park, Namyong, Kangil Kim, and R. I. McKay. "Cutting evaluation costs: An investigation into early termination in genetic programming." In Evolutionary Computation (CEC), 2013 IEEE Congress on, pp. 3291-3298. IEEE, 2013.

 Kommenda, Michael, Gabriel Kronberger, Stephan Winkler, Michael Affenzeller, and Stefan Wagner. "Effects of constant optimization by nonlinear least squares minimization in symbolic regression." In Proceeding of the fifteenth annual conference companion on Genetic and evolutionary computation conference companion, pp. 1121-1128. ACM, 2013.

 Goldstein, E. B., G. Coco, A. B. Murray, and M. O. Green. "Data driven components in a model of inner shelf sorted bedforms: a new hybrid model." Earth Surface Dynamics Discussions 1, no. 1 (2013): 531-569.

 Goldstein, Evan B., Giovanni Coco, and A. Brad Murray. "Prediction of wave ripple characteristics using genetic programming." Continental Shelf Research (2013).