![]() |
|
|
ISSN 1320-0682 |
| Source: | http://www.complexity.org.au/ci/vol05/gao/grateGao.html | Received: | 1/5/1998 | ||
| Vol 5: | Copyright 1998 | Accepted for publication: | 1/6/1998 |
Using these Markov chain models, the limit states or the steady distributions of GAs have been characterized and conditions for GAs to converge to these limit states have been discussed. Though interesting from a theoretical viewpoint, these results on the limit behavior (or long-term behavior) of GAs are of little use in providing guidance to the design of more efficient GAs since in practical problems GAs will invariably terminate within finite generations due to the limited computational resources. Therefore, from a practical point of view, results concerning the short-term behavior of GAs must be more pleasant and pertinent. This is particularly true if one can explicitly estimate the convergence rates of GAs in terms of the control parameters of GAs. However, to the best of our knowledge, the only result on the explicit estimation of the convergence rates for GAs is that of Suzuki (1995) where the convergence rate for a genetic algorithm with elitist selection was upper bounded in terms of the mutation probability.
In this paper, we study the canonical genetic algorithms(CGAs) with
proportional selection, one-point crossover, and bit mutation. We establish
an upper bound on the rate of convergence to the stationary distribution
of the population Markov chain. Our upper bound is explicitly given by
CGA parameters, including the population size, the encoding length, and
the mutation probability. The result, in conjunction with the results in
the literature on the performance of GA, indicates that there is a tradeoff
between the convergence rate and the performance of CGAs. A general framework
of GAs that can achieve this tradeoff is proposed.
We consider the GAs with binary string representations of the encoding
length l and the fixed population size N. We further assume
that the GAs use proportional selection, one-point crossover, and bit mutation.
The set of individuals(encoded feasible solutions) is denoted by
and is called the individual space. The set of populations with
size N is denoted by
.
Particularly, we call
the parents space. For convenience, we write the population
in both the vector and matrix forms as follows.
where
is the ith individual of
,
while
is the jth component of
.
The fitness function
can be derived from the objective function of the optimization problem
by a certain decoding rule.
The canonical genetic algorithm (CGA) considered throughout this paper is given as follows.
CGA
From the mathematical point of view, the operators are random mappings
between the spaces
,
,
and S. These operators can be formally defined as follows.
(A) The proportional selection operator,
,
selects a pair of parents from the given population for reproduction. Given
the population
,
the probability of selecting
as the parents is
(B) The crossover operator,
,
generates a new individual from the selected parents. Given the parents
generates a new individual by first choosing a random position, and then
substituting, with the so-called crossover probability
,
the gene segment after the chosen position in the first individual by the
gene segment after the chosen position in the second individual.
(C) The mutation operator,
,
operates on the individual by independently perturbing each bit string
in a probabilistic manner and can be specified as follows:
where
is the mutation probability.
Based on the genetic operators defined above, the CGA can be represented
as the following iteration of populations:
where
,
are independent versions of
.
Since state of the next generation only depends on the current population, and
the genetic operators do not change over time, the sequence of populations
is a time-homogeneous Markov chain with the state space
(henceforth it is called the population Markov chain). Similar to
Rudolph (1994), it is not hard to get the following
theorem on the limit behavior of the population Markov chain
.
THEOREM 2.1: Let
be the population Markov chain of a CGA with nonzero mutation probability,
and denote the probability distribution of
by
where
.
Then there exists a unique probability distribution
on
such that for any initial population
,
where
is the total variation distance between
and
.
Moreover,
is also called the stationary distribution of
in the sense that if
for some
,
then
for all
.
Theorem 2.1 shows that the population of a CGA with fixed nonzero mutation probability converges to equilibrium as the number of the generations tends to infinity. From theorem 2.1, we can also conclude that when a CGA achieves its equilibrium state at a certain generation, nothing will change thereafter from the statistical viewpoints. It is then natural to ask the questions that ``how quickly the populations will converge'' and ``how the CGA parameters affect the rate of the convergence''.
The method of eigenvalue analysis is a popular approach to the study
of the convergence rates for Markov chains (Iosifescu
(1980) ), and has been used by Suzuki (1995)
to bound the convergence rate of a special class of GAs-- GAs with elitist
selection. However, eigenvalue analysis is not quite suitable to investigate
the population Markov chains of CGAs, since it is very difficult to even
get a sensible upper bound on the magnitude of the second largest eigenvalue
of the transition probability matrix. In this paper, we investigate the
convergence rates for CGAs by a more powerful method, called the method
of minorization conditions, which has been widely used in the study of
the convergence rates for Markov Chain Monte Carlo algorithms in statistics
( Rosenthal (1995b)).
The following result gives an upper bound on the convergence rate for a Markov chain in terms of its minorization condition.
THEOREM 3.1(Rosenthal,1995a,): Suppose
is a minorization condition for a Markov chain
with the stationary distribution
.
Then given any initial distribution, we have
In order to use the above result to study the convergence rates for
Markov chains that arise from practical problems, one needs to construct
specific forms of
based on the nature of the problems so that quantitative upper bounds on
the convergence rate can be obtained.
In the following, we will construct a specific minorization condition for the population Markov chain of the CGA. In this way, the convergence rate for the CGA can be upper bounded in terms of GA parameters including the population size, the encoding length, and the mutation probability. To proceed, we first present a lemma on the transition probability of a CGA, which is interesting in itself.
LEMMA 3.1: Consider the CGA with the fixed population size
,
encoding length l>1, and mutation probability
.
Let
be the population Markov chain. Then the transition probability matrix
of the CGA population Markov chain satisfies
PROOF OF LEMMA 3.1: By the definition of the mutation operator, we have,
for any
,
where
denotes the Hamming distance between the individuals
and
.
For any population
and
,
let
be the population generated by selection and crossover operators from
,
we have
Thus, to prove the lemma, we only need to show that there exists at
least two populations
,
such that
Consider the two homogeneous population
and
,
where X and Y are two individuals with contrary values at
each component, that is, |X-Y|=l. It follows from
the definitions of selection and crossover operators that
Therefore, we have
This proves (10) and the lemma follows.
Using Lemma 3.1 and Theorem 3.1, we can now get our main results, which establishes an upper bound on the convergence rate for the CGA.
THEOREM 3.2: Consider the CGA with the fixed population size
,
encoding length l>1, and mutation probability
.
Let
be the population Markov chain,
be the probability distribution of the kth generation
,
and
be the stationary distribution. Then we have
PROOF OF THEOREM 3.2: It is obvious that the number of populations in
is
.
Let
,
and define a probability measure
on
by
where |A| denotes the cardinality of the set A. By lemma
3.1, it can be verified that, for and
,
Thus,
defined above is a minorization conditions for the population Markov chain.
The theorem then follows from (7) of theorem 3.1.
From theorem 3.2, we can observe the following facts about the effects
of CGA parameters on the convergence rate of CGA: (1) The larger the mutation
probability, the faster the CGA converges to equilibrium; and (2) The larger
the population size and the encoding length, the slower the CGA converges.
These two facts lead us to propose in the next section, a new kind of GAs,
called Variable Structure GA.
Theorem 3.2 tells us that the larger the mutation probability is and the smaller the population size N and the encoding length l are, the more quickly the CGA converges. On the other hand, it is well recognized that these GA parameters have almost the contrary effects on the long-term performance of the CGA, i.e., the smaller the mutation probability and the larger the population size and the encoding length, the better the performance of the CGA (De Jong, 1975, Leung, Gao, & Xu (1997), Robertson (1988) ; Schaffer, Caruana, Eshelman, & Das (1989); Schraudolph and Belew (1992)). There are also theoretical results stating that when the population size tends to infinity, or when the population size is large enough and the mutation probability tends to zero, the CGA will converge in probability to the global optimum ( Cerf (1996), Peck and Dhawan (1995)). These theoretical and empirical results imply that there is a tradeoff between the convergence rate and the long-term performance of the algorithms.
By the theoretical results presented in this paper, we can propose a
general framework for GAs that can achieve this rate-performance tradeoff
in a graceful way. The basic idea is to adopt a sequence of GA parameter
settings
satisfying
For each parameter setting, the population are iterated in the same
way as in the CGA. First, the CGA with the parameter
is run so that the equilibrium is achieved quickly (since
and
are small and
is large). Denote the near equilibrium population by
.
Then, using a certain scheme to modify the population
into
so that it can be used as the initial population of the GA with the second
parameter setting
,
and run the CGA with the this parameter setting and the initial population
until equilibrium. In this way, the algorithm gradually increases the population
sizes and the encoding length and decreases the mutation probability so
that its ultimate performance is good. Similar to the observations in the
practice of homogeneous simulated annealing (Graffigen,
1992)), we can expect that the above updating scheme of the parameter
settings will speed up the convergence because
(1) At the beginning, the algorithm converges fast since N and
l are small and
is large;
(2) If the difference between the two successive parameter settings
and
is little, the CGA with the parameter
will also converge quickly since the distribution of its initial population
,
which is a modification of the equilibrium population of the CGA with the
parameter
,
is very near to its stationary distribution.
An algorithm using the above idea can be called a variable structure GA (VSGA) since, contrary to the CGA, its population structure changes over time. A general framework of VSGA is given as follows.
VSGA
In this paper, an upper bound on the rate of convergence for the CGA was established in terms of the CGA parameters including the population size, the encoding length, and the mutation probability. The results depict how the GA parameters affect the convergence rate of the algorithms. The results also indicate that there is a tradeoff between the convergence rate and the performance of the CGA. A general framework for GAs that can achieve this tradeoff was proposed.
The upper bound presented in this paper can be improved when more powerful
minorization conditions for the population Markov chains are established.
With regard to the VSGA, several practical issues remained to be further
explored. First, it is at present unclear how to judge if a CGA Markov
chain has converged approximately to the corresponding stationary distribution
in Step 2 of VSGA. This is crucial to the design of VSGAs for practical
problems, and the work on diagnosing convergence in Markov chain Monte
Carlo algorithms (Roberts (1992)) may be a promising
way. Second, in Step 3 of VSGA, a concrete method needs to be deviced to
modify the near equilibrium population
into the initial population
with the parameters
.
For the modification of the encoding length, the method of dynamic parameter
encoding of Schraudolph Belew (1992) may be of help.
Third, the updating scheme of the parameter settings needs to be carefully
chosen. In the future, we would like to further study these problems from
both theoretical and practical perspectives. In particular, we hope to
empirically study the problem of convergence rate to verify our theoretical
results and to adopt some heuristics to design a concrete version of VSGA
for practical optimization problems.
The Author would like to express his appreciation to the anonymous
reviewers for their helpful suggestions.