Controlling Bloat: Individual and Population Based Approaches in Genetic Programming



Genetic Programming (GP) is the automated learning of computer programs. Basically a search process, it is capable of solving complex problems by evolving populations of computer programs, using Darwinian evolution and Mendelian genetics as inspiration. Theoretically, GP can solve any problem whose candidate solutions can be measured and compared, making it a widely applicable technique. Furthermore, the solutions found by GP are usually provided in a format that users can understand and modify to their needs. But its high versatility is also the cause of some difficulties. The search space of GP is virtually unlimited and programs tend to grow in size during the evolutionary process. Code growth is a healthy result of genetic operators in search of better solutions, but it also permits the appearance of pieces of redundant code that increase the size of programs without improving their fitness. Besides consuming precious time in an already computationally intensive process, redundant code may start growing rapidly, a phenomenon known as bloat. Bloat can be defined as an excess of code growth without a corresponding improvement in fitness. This is a serious problem in GP, often leading to the stagnation of the evolutionary process. Although many bloat control methods have been proposed so far, a definitive solution is yet to be found.

This thesis provides a comprehensive description of all the bloat theories proposed so far, and a detailed taxonomy of available bloat control methods. Then it proposes two novel bloat control approaches, Dynamic Limits and Resource-Limited GP, both implemented on the GPLAB software package, also developed in the context of this thesis. Dynamic Limits restricts the size or depth allowed at the individual level, while Resource-Limited GP imposes restrictions only at the population level, regardless of the particularities of the individuals within. Four different problems were used as a benchmark to study the efficiency of both Dynamic Limits and Resource-Limited GP. They represent a varied selection of problems in terms of bloat dynamics and response to different bloat control techniques: Symbolic Regression of the quartic polynomial, Artificial Ant on the Santa Fe food trail, 5-Bit Even Parity, and 11-Bit Boolean Multiplexer. The results of exhaustive experiments have shown that, although Dynamic Limits was a more efficient bloat control method than Resource-Limited GP across the set of problems studied, both approaches successfully performed the task they were designed to. Without adding any parameters to the search process, it was possible to match the performance of some of the best state-of-the-art methods available so far.


genetic programming, bloat, dynamic limits, limited resources


Genetic Programming

PhD Thesis

Controlling Bloat: Individual and Population Based Approaches in Genetic Programming, April 2008

PDF File

Cited by

Year 2012 : 3 citations

 Dabhi VK, Chaudhary S (2012). A Survey on Techniques of Improving Generalization Ability of Genetic Programming Solutions. arXiv preprint arXiv:1211.1119.

 Allaeys F (2012). Ontwikkeling van evolutionaire compacte robotsturingen en hun morfologie. MSc Thesis. University of Ghent, Belgium.

 Jing X, Qiu-Wang W, Min Z (2012). Improvement of Genetic Programming Symbolic Regression and its Application in Heat Exchangers. Journal of Engineering Thermophysics 33(8): 1415-1418.

Year 2011 : 7 citations

 Fitzgerald J, Ryan C (2011). Validation Sets for Evolutionary Curtailment with Improved Generalisation. In Convergence and Hybrid Information Technology, 282-289.

 Hien NT, Hoai NX, McKay B (2011). A Study on Genetic Programming with Layered Learning and Incremental Sampling. In IEEE Congress on Evolutionary Computation (CEC 2011), 1179 – 1185.

 Okhovat A, Mousavi SM (2011). Modeling of arsenic, chromium and cadmium removal by nanofiltration process using genetic programming. Applied Soft Computing 12(2): 793–799.

 Poli R (2011). Covariant Tarpeian Method for Bloat Control in Genetic Programming. In Genetic Programming Theory and Practice VIII, 71-89.

 Stokic I (2011). Primjena Genetskog Programiranja u Strojnom Ucenju. DIPLOMSKI RAD br. 213. Sveuciliste u Zagrebu, Fakultet Elektrotehnike I Racunarstva, Zagreb, Croatia.

 Trujillo L (2011). Genetic programming with one-point crossover and subtree mutation for effective problem solving and bloat control. Soft Computing - A Fusion of Foundations, Methodologies and Applications 15(8): 1551–1567.

 Xie H, Zhang M (2011). Depth-control strategies for crossover in tree-based genetic programming. Soft Computing - A Fusion of Foundations, Methodologies and Applications 15(9): 1865-1878.

Year 2010 : 1 citations

 Pappa GL, Freitas AA (2010). Automating the design of data mining algorithms. An evolutionary computation approach. Springer.

Year 2009 : 2 citations

 Xie H, Zhang M (2009). An Analysis and Evaluation of the Saving Capability and Feasibility of Backward-Chaining Evolutionary Algorithms. In Artificial Life: Borrowing from Biology, 63-72.

 Beadle LCJ (2009). Semantic and structural analysis of genetic programming. PhD Thesis, University of Kent, UK.