Complexity International  
 

 

ISSN 1320-0682


Source:   http://www.complexity.org.au/ci/vol05/gao/grateGao.html   Received: 1/5/1998
Vol 5:   Copyright 1998   Accepted for publication: 1/6/1998

An Upper Bound on the Convergence Rates of Canonical Genetic Algorithms

Yong Gao
Department of Computing Science, University of Alberta,
Edmonton, Alberta, Canada T6G 2H1
Email: ygao@cs.ualberta.ca
 
 

Abstract

An upper bound on the rates of convergence to stationary distributions of canonical genetic algorithms is established in terms of the algorithms' control parameters including the population size, the encoding length, and the mutation probability. The upper bound, in conjunction with results in the literature on the performance of GA, indicates that there is a tradeoff between the convergence rate and the long-term performance in genetic algorithms. To achieve this tradeoff, we propose a general framework of GAs in which structural parameters of GAs, such as the population size and the encoding length, can be modified dynamically.
 
 

1. Introduction

Genetic algorithm(GAs) are robust and efficient search and optimization techniques inspired by Darwin's theory of natural evolution. Since the pioneering work of Holland (1975) and De Jong (1975), GAs have been successfully applied to many practical fields such as function optimization, adaptation control, machine learning, and the training of artificial neural networks and fuzzy systems. Recently, much attention has been paid to the theoretical analysis of GAs in an effort to establish their validity and to present guidance to improve their performances. The theory of Markov chains has been demonstrated to be a very powerful tool for the theoretical analysis of GAs. There are mainly two approaches to modeling GAs as Markov chains. The first approach, called population Markov Chain model, views the sequence of population in GAs as finite Markov chains on population space( Eiben, Aarts, and Hee (1991), Fogel (1994), Rudolph (1994)), Leung, Gao and Xu (1997)), while the second approach models the GAs by identifying the states of the population with probability vectors over the individual space (Reynolds and Gomatam (1996), Vose (1996)).

Using these Markov chain models, the limit states or the steady distributions of GAs have been characterized and conditions for GAs to converge to these limit states have been discussed. Though interesting from a theoretical viewpoint, these results on the limit behavior (or long-term behavior) of GAs are of little use in providing guidance to the design of more efficient GAs since in practical problems GAs will invariably terminate within finite generations due to the limited computational resources. Therefore, from a practical point of view, results concerning the short-term behavior of GAs must be more pleasant and pertinent. This is particularly true if one can explicitly estimate the convergence rates of GAs in terms of the control parameters of GAs. However, to the best of our knowledge, the only result on the explicit estimation of the convergence rates for GAs is that of Suzuki (1995) where the convergence rate for a genetic algorithm with elitist selection was upper bounded in terms of the mutation probability.

In this paper, we study the canonical genetic algorithms(CGAs) with proportional selection, one-point crossover, and bit mutation. We establish an upper bound on the rate of convergence to the stationary distribution of the population Markov chain. Our upper bound is explicitly given by CGA parameters, including the population size, the encoding length, and the mutation probability. The result, in conjunction with the results in the literature on the performance of GA, indicates that there is a tradeoff between the convergence rate and the performance of CGAs. A general framework of GAs that can achieve this tradeoff is proposed.
 
 

2. Canonical Genetic Algorithms and the Population Markov Chains

We consider the GAs with binary string representations of the encoding length l and the fixed population size N. We further assume that the GAs use proportional selection, one-point crossover, and bit mutation. The set of individuals(encoded feasible solutions) is denoted by tex2html_wrap_inline567 and is called the individual space. The set of populations with size N is denoted by tex2html_wrap_inline571. Particularly, we call tex2html_wrap_inline573 the parents space. For convenience, we write the population tex2html_wrap_inline575 in both the vector and matrix forms as follows.
displaymath577
where tex2html_wrap_inline579 is the ith individual of tex2html_wrap_inline583, while tex2html_wrap_inline585 is the jth component of tex2html_wrap_inline589. The fitness function tex2html_wrap_inline591 can be derived from the objective function of the optimization problem by a certain decoding rule.

The canonical genetic algorithm (CGA) considered throughout this paper is given as follows.

CGA

Step 1:
Set k=0 and generate initial population tex2html_wrap_inline595;
Step 2:
Independently select N pairs of individuals from the current population for reproduction;
Step 3:
Independently perform crossover to the N pairs of individuals to generate N new intermediate individuals;
Step 4:
Independently mutate the N intermediate individuals to get the next generation

displaymath605
Step 5:
Stop if some stopping criterion is met. Else, set k=k and go to Step 2.

From the mathematical point of view, the operators are random mappings between the spaces tex2html_wrap_inline571tex2html_wrap_inline609, and S. These operators can be formally defined as follows.

(A) The proportional selection operator, tex2html_wrap_inline613, selects a pair of parents from the given population for reproduction. Given the population tex2html_wrap_inline583, the probability of selecting tex2html_wrap_inline617 as the parents is
equation73

(B) The crossover operator, tex2html_wrap_inline619, generates a new individual from the selected parents. Given the parents tex2html_wrap_inline621 tex2html_wrap_inline623 generates a new individual by first choosing a random position, and then substituting, with the so-called crossover probability tex2html_wrap_inline625, the gene segment after the chosen position in the first individual by the gene segment after the chosen position in the second individual.

(C) The mutation operator, tex2html_wrap_inline627, operates on the individual by independently perturbing each bit string in a probabilistic manner and can be specified as follows:
equation82
where tex2html_wrap_inline629 is the mutation probability.

Based on the genetic operators defined above, the CGA can be represented as the following iteration of populations:
equation86
where tex2html_wrap_inline631, are independent versions of tex2html_wrap_inline633. Since state of the next generation only depends on the current population, and the genetic operators do not change over time, the sequence of populations  tex2html_wrap_inline635 is a time-homogeneous Markov chain with the state space  tex2html_wrap_inline571 (henceforth it is called the population Markov chain). Similar to Rudolph (1994), it is not hard to get the following theorem on the limit behavior of the population Markov chain  tex2html_wrap_inline635.

THEOREM 2.1: Let  tex2html_wrap_inline635 be the population Markov chain of a CGA with nonzero mutation probability, and denote the probability distribution of  tex2html_wrap_inline643 by  tex2html_wrap_inline645 where  tex2html_wrap_inline647. Then there exists a unique probability distribution  tex2html_wrap_inline649 on tex2html_wrap_inline571 such that for any initial population  tex2html_wrap_inline595,
equation100
where tex2html_wrap_inline655 is the total variation distance between tex2html_wrap_inline657 and tex2html_wrap_inline649. Moreover, tex2html_wrap_inline649 is also called the stationary distribution of tex2html_wrap_inline635 in the sense that if tex2html_wrap_inline665 for some tex2html_wrap_inline667, then tex2html_wrap_inline669 for all tex2html_wrap_inline671.

Theorem 2.1 shows that the population of a CGA with fixed nonzero mutation probability converges to equilibrium as the number of the generations tends to infinity. From theorem 2.1, we can also conclude that when a CGA achieves its equilibrium state at a certain generation, nothing will change thereafter from the statistical viewpoints. It is then natural to ask the questions that ``how quickly the populations will converge'' and ``how the CGA parameters affect the rate of the convergence''.

The method of eigenvalue analysis is a popular approach to the study of the convergence rates for Markov chains (Iosifescu (1980) ), and has been used by Suzuki (1995) to bound the convergence rate of a special class of GAs-- GAs with elitist selection. However, eigenvalue analysis is not quite suitable to investigate the population Markov chains of CGAs, since it is very difficult to even get a sensible upper bound on the magnitude of the second largest eigenvalue of the transition probability matrix. In this paper, we investigate the convergence rates for CGAs by a more powerful method, called the method of minorization conditions, which has been widely used in the study of the convergence rates for Markov Chain Monte Carlo algorithms in statistics ( Rosenthal (1995b)).
 
 

3. An Upper Bound on the Convergence Rates of CGAs

Consider the population Markov chain tex2html_wrap_inline673 with the transition probability matrix tex2html_wrap_inline675 defined as
equation112
A minorization condition (Rosenthal 1995a ) for the Markov chain tex2html_wrap_inline673 is a pair tex2html_wrap_inline679 such that
equation124
where tex2html_wrap_inline681 is a positive real number and tex2html_wrap_inline683 is a probability distribution on tex2html_wrap_inline571.

The following result gives an upper bound on the convergence rate for a Markov chain in terms of its minorization condition.

THEOREM 3.1(Rosenthal,1995a,): Suppose tex2html_wrap_inline679 is a minorization condition for a Markov chain tex2html_wrap_inline673 with the stationary distribution tex2html_wrap_inline649. Then given any initial distribution, we have
equation130

In order to use the above result to study the convergence rates for Markov chains that arise from practical problems, one needs to construct specific forms of tex2html_wrap_inline681 based on the nature of the problems so that quantitative upper bounds on the convergence rate can be obtained.

In the following, we will construct a specific minorization condition for the population Markov chain of the CGA. In this way, the convergence rate for the CGA can be upper bounded in terms of GA parameters including the population size, the encoding length, and the mutation probability. To proceed, we first present a lemma on the transition probability of a CGA, which is interesting in itself.

LEMMA 3.1: Consider the CGA with the fixed population size tex2html_wrap_inline695, encoding length l>1, and mutation probability tex2html_wrap_inline699. Let tex2html_wrap_inline673 be the population Markov chain. Then the transition probability matrix tex2html_wrap_inline675 of the CGA population Markov chain satisfies
equation133

PROOF OF LEMMA 3.1: By the definition of the mutation operator, we have, for any tex2html_wrap_inline705,
eqnarray142
where tex2html_wrap_inline707 denotes the Hamming distance between the individuals tex2html_wrap_inline589 and tex2html_wrap_inline711. For any population tex2html_wrap_inline583 and tex2html_wrap_inline715, let tex2html_wrap_inline717 be the population generated by selection and crossover operators from tex2html_wrap_inline583, we have
eqnarray155
Thus, to prove the lemma, we only need to show that there exists at least two populations tex2html_wrap_inline721, such that
equation180
Consider the two homogeneous population tex2html_wrap_inline723 and tex2html_wrap_inline725, where X and Y are two individuals with contrary values at each component, that is, |X-Y|=l. It follows from the definitions of selection and crossover operators that
displaymath733
Therefore, we have
eqnarray191
This proves (10) and the lemma follows.

Using Lemma 3.1 and Theorem 3.1, we can now get our main results, which establishes an upper bound on the convergence rate for the CGA.

THEOREM 3.2: Consider the CGA with the fixed population size tex2html_wrap_inline695, encoding length l>1, and mutation probability tex2html_wrap_inline699. Let tex2html_wrap_inline673 be the population Markov chain, tex2html_wrap_inline657 be the probability distribution of the kth generation tex2html_wrap_inline643, and tex2html_wrap_inline649 be the stationary distribution. Then we have
equation206

PROOF OF THEOREM 3.2: It is obvious that the number of populations in tex2html_wrap_inline571 is tex2html_wrap_inline753. Let tex2html_wrap_inline755, and define a probability measure tex2html_wrap_inline757 on tex2html_wrap_inline571 by
displaymath761
where |A| denotes the cardinality of the set A. By lemma 3.1, it can be verified that, for and tex2html_wrap_inline767,
displaymath769
Thus, tex2html_wrap_inline679 defined above is a minorization conditions for the population Markov chain. The theorem then follows from (7) of theorem 3.1.

From theorem 3.2, we can observe the following facts about the effects of CGA parameters on the convergence rate of CGA: (1) The larger the mutation probability, the faster the CGA converges to equilibrium; and (2) The larger the population size and the encoding length, the slower the CGA converges. These two facts lead us to propose in the next section, a new kind of GAs, called Variable Structure GA.
 
 

4. The Implications of the Upper Bound: A Variable Structure GA

Theorem 3.2 tells us that the larger the mutation probability is and the smaller the population size N and the encoding length l are, the more quickly the CGA converges. On the other hand, it is well recognized that these GA parameters have almost the contrary effects on the long-term performance of the CGA, i.e., the smaller the mutation probability and the larger the population size and the encoding length, the better the performance of the CGA (De Jong, 1975, Leung, Gao, & Xu (1997), Robertson (1988) ; Schaffer, Caruana, Eshelman, & Das (1989); Schraudolph and Belew (1992)). There are also theoretical results stating that when the population size tends to infinity, or when the population size is large enough and the mutation probability tends to zero, the CGA will converge in probability to the global optimum ( Cerf (1996), Peck and Dhawan (1995)). These theoretical and empirical results imply that there is a tradeoff between the convergence rate and the long-term performance of the algorithms.

By the theoretical results presented in this paper, we can propose a general framework for GAs that can achieve this rate-performance tradeoff in a graceful way. The basic idea is to adopt a sequence of GA parameter settings tex2html_wrap_inline777 satisfying
displaymath779
For each parameter setting, the population are iterated in the same way as in the CGA. First, the CGA with the parameter tex2html_wrap_inline781 is run so that the equilibrium is achieved quickly (since tex2html_wrap_inline783 and tex2html_wrap_inline785 are small and tex2html_wrap_inline787 is large). Denote the near equilibrium population by tex2html_wrap_inline789. Then, using a certain scheme to modify the population tex2html_wrap_inline789 into tex2html_wrap_inline793 so that it can be used as the initial population of the GA with the second parameter setting tex2html_wrap_inline795, and run the CGA with the this parameter setting and the initial population tex2html_wrap_inline793 until equilibrium. In this way, the algorithm gradually increases the population sizes and the encoding length and decreases the mutation probability so that its ultimate performance is good. Similar to the observations in the practice of homogeneous simulated annealing (Graffigen, 1992)), we can expect that the above updating scheme of the parameter settings will speed up the convergence because

(1) At the beginning, the algorithm converges fast since N and l are small and tex2html_wrap_inline629 is large;

(2) If the difference between the two successive parameter settings tex2html_wrap_inline805 and tex2html_wrap_inline807 is little, the CGA with the parameter tex2html_wrap_inline809 will also converge quickly since the distribution of its initial population tex2html_wrap_inline811, which is a modification of the equilibrium population of the CGA with the parameter tex2html_wrap_inline805, is very near to its stationary distribution.

An algorithm using the above idea can be called a variable structure GA (VSGA) since, contrary to the CGA, its population structure changes over time. A general framework of VSGA is given as follows.

VSGA


5. Conclusions and Future Work

In this paper, an upper bound on the rate of convergence for the CGA was established in terms of the CGA parameters including the population size, the encoding length, and the mutation probability. The results depict how the GA parameters affect the convergence rate of the algorithms. The results also indicate that there is a tradeoff between the convergence rate and the performance of the CGA. A general framework for GAs that can achieve this tradeoff was proposed.

The upper bound presented in this paper can be improved when more powerful minorization conditions for the population Markov chains are established. With regard to the VSGA, several practical issues remained to be further explored. First, it is at present unclear how to judge if a CGA Markov chain has converged approximately to the corresponding stationary distribution in Step 2 of VSGA. This is crucial to the design of VSGAs for practical problems, and the work on diagnosing convergence in Markov chain Monte Carlo algorithms (Roberts (1992)) may be a promising way. Second, in Step 3 of VSGA, a concrete method needs to be deviced to modify the near equilibrium population tex2html_wrap_inline825 into the initial population tex2html_wrap_inline811 with the parameters tex2html_wrap_inline809. For the modification of the encoding length, the method of dynamic parameter encoding of Schraudolph Belew (1992) may be of help. Third, the updating scheme of the parameter settings needs to be carefully chosen. In the future, we would like to further study these problems from both theoretical and practical perspectives. In particular, we hope to empirically study the problem of convergence rate to verify our theoretical results and to adopt some heuristics to design a concrete version of VSGA for practical optimization problems.
 
 

Acknowledgements

The Author would like to express his appreciation to the anonymous reviewers for their helpful suggestions.
 
 

References

[1] Cerf, R.(1996).
The dynamics of mutation-selection algorithms with large population size. Annales de l'Institut Henri Poincare- Probabilites et Statistiques, 32(4), 455-508.
[2] De Jong, K.A.(1975).
An analysis of the behavior of a class of genetic adaptive systems. Ph.D. dissertation, University of Michigen, Ann Arbor.
[3] Eiben, A.E., Aarts, E.H.L., & Hee, K.M.Van.(1991).
Global convergence of genetic algorithms : A Markov chain analysis. In H.-P.Schwefel & R.Mtex2html_wrap_inline841nner (Eds.), Parallel problem solving from nature(pp.4-12). Berlin and Heideberg: Springer.
[4] Fogel, D.B.(1994).
Asymptotic convergence properties of genetic algorithms and evolutionary programming: Analysis and experiments. Cybernetics and Systems, 25(3), 389-407.
[5] Graffigne, C.(1992).
Parallel annealing by periodically interacting multiple searches: an experimental study. In R.Azencott(Ed.), Simulated annealing: Parallelization techniques, (pp.47-99). New York: John Wiley& Sons.
[6] Holland, J.H.(1975).
Adaptation in natural and artificial systems. Ann Arbor, MI: The University of Michigan Press.
[7] Iosifescu, M.(1980).
Finite Markov processes and their applications. New York: John Wiley& Sons.
[8] Leung, Y., Gao, Y., & Xu, Z.B.(1996).
Degree of population diversity: A perspective on premature convergence in genetic algorithms and its Markov chain analysis. IEEE Transactions on Neural Networks, 8(5), 1165-1176.
[9] Peck, C.C., & Dhawan, A.P.(1995).
Genetic algorithms as global random search methods: An alternative perspective. Evolutionary Computation, 3(1), 39-80.
[10] Reynolds, D., & Gomatam, J.(1996).
Stochastic modelling of genetic algorithms. Artificial Intelligence, 82, 303-330.
[11] Roberts, G.O.(1992).
Convergence diagnostics of Gibbs sampler. In J.M.Bernardo, J.O.Berger, A.P.Dawid, & A.F.M.Smith (Eds.), Bayesian Statistics-4(pp.775-782), UK: Oxford University Press.
[12] Robertson, G.G.(1988).
Population size in classifier systems. In Proceedings of the 5th International Conference on Machine Learning(pp.142-152). CA: Morgan Kaufmann.
[13] Rosenthal, J.S.(1995a).
Convergence rates for Markov chains. SIAM Review, 37(3), 387-405.
[14] Rosenthal, J.S.(1995b).
Minorization conditions and convergence rates for Markov chain Monte Carlo. Journal of the American Statistical Association, 90(430), 558-566.
[15] Rudolph, G.(1994).
Convergence analysis of canonical genetic algorithms. IEEE Transactions on Neural Networks, 5(1), 96-101.
[16]Schaffer, J.D., Caruana, R.A., Eshelman, L.J., & Das, R.(1989).
A study of control parameters affecting online performance of genetic algorithms for function optimization. In J.D.Schaffer(Ed.), Proceedings of the third International Conference on Genetic Algorithms (pp.51-60), San Mateo, CA: Morgan Kaufmann.
[17] Schraudolpl, N.N., & Belew, R.K.(1992).
Dynamic parameter encoding for genetic algorithms. Machine Learning, 9, 9-21.
[18] Suzuki, J.(1995).
A Markov chain analysis on simple genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 25(4), 655-659.
[19] Vose, M.D.(1996).
Modeling simple genetic algorithms. Evolutionary Computation, 3(4), 453-472.


       

EMail Contact:  Complexity International Editor