|
/vol01/blanct01/ | © Copyright 1994 | |||
| Volume 1 | Received: Accepted: |
00/00/1994 00/00/1994 |
|||
Jacques Blanc-Talon
C.S.I.R.O. Division of Information Technology
GPO Box 664, Canberra A.C.T. 2601, Australia
Email: blanc@csis.dit.csiro.au, blanc@etca.fr
DOL-systems are too restrictive to describe actual fractal curves. A transition from DOL-systems to Context-Free Grammars is proposed. Regular Languages are shown not to be powerful enough to generate fractals.
The problem of inferring a Context-Free Grammar containing DOL-like and free recursive structures is discussed.
In the past, grammars have proven to be an efficient tool both for generating and recognising patterns simultaneously, and since fractals are just particular sets it seems natural to extend these techniques to this new application domain. Given a fractal curve, the first step is to encode it as a sequence of symbols under a number of conditions that depend on the desired accuracy of the coding. The size of this sequence (or word) increases as the accuracy of the encoding increases and, according to a continuity principle, it is legitimate to think of the fractal curve as the image of the limit of the sequence of words [1]. This limit can be mathematically described as the adherence of some language [2]. The second step consists of describing this language of words by using a grammar, which makes sense only if the initial curve has some self-similarity properties (a grammar can generate only recursively enumerable sets of sentences); this condition will be assumed in the following.
A lot of studies have been performed on the generation of self-similar sets by using simple rewriting systems (DOL- or L-systems [3, 4]), and one can also find some work on grammar-based systems. Generally, the former show a strong theoretical framework, which is not the case in the latter. The grammatical methods used have two main drawbacks: either they do not take the fractal nature of the sets into account, processing them as ordinary sets, or they use attribute grammars of a new type, in which case it is not possible to determine the topological properties of the generated patterns. As a consequence there can be uncertainty about the language of patterns that can be recognised.
Our goal is to set up the basic framework for grammatical pattern recognition of fractal curves.
In the next section, we recall the main results about DOL-systems, their topological adherence, and the space
;
in the following section on Fractals and Languages, we generalise these results to any sequence of
words, and then to languages; and in the section on Fractals and Grammars,
we propose a transition from DOL-systems to Context-free grammars. In the last section we describe a grammatical inference process
for 2D fractal curves that uses quad-tree segmentation.
We outline now the major definitions about fractals and languages we use hereafter.
Let a
-cover
of a given set A be any collection of subsets of
the union of
which is A and such that the diameter of every
is less than or equal to
.
For a given subset A, let us define
over all (countable)
-covers
,
then the Hausdorff s-dimensional outer-measure of A is defined as:
.
In particular, the Hausdorff dimension D(A) is the particular value D of s which makes its outer-measure finite and non-null
(see [5, 6] for a more rigorous approach).
Practical computations of the dimension can be achieved by several methods, depending on the problem.
One of them is the box-counting method, which consists in covering A with a regular grid, so as to replace
with
the number of square boxes intersecting with A. More formally,
given
, we write
for the smallest
covering of A with closed balls of radius
;
is assumed to be a geometric sequence
. D(A) is then proved [7] to be equal to:
Any language is a subset of
. Concatenation is naturally generalised from words to languages.
The first example of a fractal curve generated by a DOL-system in
is the following.
Let us consider:
,
being defined as:
The sequence
converges to the infinite word:
Let us define a "Geometrical Mapping" (GM)
whose initial values are set to
;
in the following, we shall call any mapping f such as
a GM, without further discussion.
In order to draw the curve, we use a map K:
such as:
i)
, ii)
,
and iii)
(notations from [3]).
Then the mapping of
is the Jordan curve
, where L is a normalisation factor - that is, a diagonal endomorphism.
In our case, the number of occurrences of each letter in
is given by application of the Toeplitz matrix
where
and each
is a circular shift of
; the eigenvalues of this matrix are
,
and, after computation,
.
We can see that
is the triadic von Koch curve by construction
(with dilation ratios equal to
);
we shall call the 8-symbols von Koch sequence w. Our first result holds in:
Proof: By using the projection
defined as:
if i is odd,
if i is even, one can define
as:
for i = 0, 1 and
for the other indices. Since
,
one can conclude that
.
The Thue-Morse sequence is a projection of the von Koch sequence; the converse makes one think of a possible generalisation of these sequences:

The limit of the related curves (
) is made of arcs of circle - that is, is not fractal;
several cases are shown in Figure 1.
The infinite words previously mentioned have been expressed as the adherence values of some DOL-languages;
we focus now our attention on this notion, an infinite word being essentially a mapping
.
The set of infinite (respectively, possibly infinite) words on X is written
(respectively,
).
We say that u is a finite left-factor of x if there exists v such that x = u v;
clearly, the set LF(w) of all finite left-factors of a word w is
.
This notion is extended to languages and leads to the notion of the adherence
of a language
and to the centre
of a language which is the left-factor set of its adherence:
In other words,
is the set of all infinite strings which are the limits of some
infinite increasing sequence of
.
Therefore, one can see
as the union of these sequences as well as, considering any sequence separately,
the prefixes possibly "missing" between two consecutive values;
moreover, the passage from
to
deletes every isolated word.
Obviously,
and the adherence of a finite set is empty.
In [10], Berstel proved that if an iterated morphism generates an infinite word,
then this word is the only element of the adherence of the DOL-language, and conjectured that
the centre of this language is co-CFL (proved in [11]).
The condition of existence of the adherence is:
The most important result is that, with the previous topology of left-factor sets,
is metrisable:
the following metric over
induces a topology which is the same as the one defined by the closure of
.
Given
, let us define the similarity distance
of two words
as:
Then
is an ultrametric space and since in such a space every converging sequence
has only one limit, we have rediscovered the result of [10].
Let x be a string, and
Then the Levenstein distance between the strings
The extension of this definition to languages is done as usual.
It is quite obvious that
The mapping of any sequence
Proof: Since
According to the definition of K:
(same thing for
We have seen that
Sketch of the proof: Let us consider
Due to the equal length of the strings, the sequence
Let us suppose now that
Given
Proof: We use one by one the regularity properties of the centre, of the neighbourhoods of the centre (each basic operation used in the
Levenstein distance modifies the related automaton, without changing its finiteness), and of a finite union of regular languages;
same proof for context-free languages.
The two previous propositions lead to:
Conversely, we shall say that a language
Proof: Let us consider
However, considering fractals as adherence values have their own limitation, mainly because the concatenation is meaningless in
Given two increasing sequences
With the previous definitions, the concatenation of languages leads to:
Proof: The result is obvious if the sequences are concomitant; if they are not, the result holds by considering
Let
The augmented language generated by a non-terminal symbol
The meaning of an infinite derivation is the following.
An infinite derivation of G is a sequence
There are several ways to define a mapping from the space of words to
with
where
One must notice that the same word with different depth vectors addresses two different curves; this will happen if the grammar G is ambiguous; that is, if the word is the result of two different derivation trees.
In fact, it is generally possible to transform an ambiguous grammar into a non-ambiguous one;
but it is also known that some languages are naturally ambiguous [14]. We can state:
Proof: The following result (see [14] for instance) is necessary to prove this property:
On the one hand, K is obviously injective (or into). According to the fundamental lemma and since G is non-ambiguous, there is
unicity of the depth vector whatever the order of the rules invoked in the derivation.
By considering grammars whose rules have a fixed length, it is also possible to define a mapping which avoids the
re-computation of the figure at every step: new curves are drawn just by replacing old parts by more accurate new ones.
Another significant difference is that the curves drawn can be unconnected, whereas the previous mapping draws only
Jordan curves.
The transition from DOL-systems to grammars can be defined as follows.
Given a DOL-system D = (X, h, s), we consider an auxiliary set N and a one-to-one map
Let us give an example. We consider the grammar
One could hope that the grammar describing a curve was the embedding of the DOL-system one gets by interpolating the curve;
in fact, it is often too general. The following proposition shows the limitation of the embedding:
In particular,
We consider fractal pictures which are recursively segmented into quad-trees according to a correlation criterion;
quad-tree decomposition is the easiest to implement, but the discussion is still valid for other recursive segmentations.
At a given recursion step,
a sub-picture is compared to the elements of the dictionary (the terminal set), and split into smaller sub-pictures if no correlation is found,
as long as allowed by the accuracy; if any relationship is computed, it induces a structure over the leaves of the tree.
This structural information can be added to the rules, so as to lead the parsing step, but can also be considered as a constraint to
the grammatical inference process.
In its current version, the algorithm looks for two major types of structures.
Let G denote a grammar G = (X, N, S, P) with a set of rules P, and x the word being processed.
In that case, the pair of rules
Figure 2 exemplifies the segmentation of a von Koch curve.
An analysis of these results on various types of data will be reported in a future paper.
At the moment, we are focussing on the properties of the inferred quad-tree grammar computed with
the segmentation algorithm discussed above. We expect also to extend the notion of convergence to another
class of fractals which are fractal textures.
I would like to thank the CSIRO Division of Information Technology for supporting this work.
Fractals and Languages
Adherence values of languages, if they exist, might be a powerful tool to describe actual fractal curves;
the generalisation of the previous concept is done in two steps.
Let us consider
converging to u,
being a fractal curve. Obviously, LF(u) includes
every possible sequence converging to u; this is the first step.
We must now consider some "small" distortions upon the original curve.
This can be done by defining a metric between strings, so as to consider finite neighbourhoods of u, whose elements will be the distorted curves.
The Levenstein distance is defined by using three basic operations,
(-etion),
(-ertion) and
(-titution):
the sets built recursively according to:
and
is:
defines a distance.
In [12], the main properties of the induced topology (and others) are detailed
(this is the discrete topology, words are clopens -- closed and open sets simultaneously,
is complete but neither compact nor connected).
of
is realised as previously, except that the endomorphism L is replaced by a
linear function of
written L(n).
is bounded by p,
and
differ by
symbols; that means,
,
with
.
).
One can notice that if
then
whose norm is less than or equal to
,
where m' (resp. m) denotes the index of the letter
in
(resp.
in
); moreover,
denotes
and so is bounded.
Since
and due to the union property of K, there exists r such that
(same thing if one considers right symbols
).
Since
is bounded, then
, and we have:
is
not connected; as a consequence, one can wonder if there exists a straight relationship between the
adherence values of two "close" sequences.
converging to u,v; according to the convergence criterion,
any
is a left-factor of u (resp.
of v) and one can write
(resp.
).
is any possible sequence of the three basic operations used to transform
into
during
the computation of
;
this transformation will be written
briefly (more formally
).
with
is increasing and then there exists m such that
.
For n > m, we consider two cases for
. If
, then
and obviously
;
more generally, this result still stands for a given m' > m if
and
consists only of substitution operations.
.
Since we have
,
this implies that
is transformed either into a prefix of
, or
into a string longer than
plus a prefix of
;
let us suppose the first case (which is not restrictive).
This means that the string
is adjusted to the string
in a number of operations greater than or equal to
;
the operations involved are different from those used in the transformation
,
which implies that
some new letters between the words
and
are found to be equal.
Nevertheless, the condition
must be held, so there is a minimal number
of
operations which must be dedicated to the
transformation of
.
For the same reason, there is also a minimal number
of
operations dedicated to the adjustment of
to the rest
of the string. But the number of operations is bounded by p, and one can see by iterating
this assumption to
that there is a T such that
, either
or
.
The first case means that
, the
second one that
and
.
and
(p finite), we define the
-transformation of order p of
as:
recognises a curve
provided that there exists an increasing sequence in
its
-transform whose mapping converges to
in the Hausdorff metric. In the following, we shall deal only
with languages recognising a finite number of fractal curves - more precisely, l
anguages whose
adherence is finite. The dimension of the language
is then the supremum of the dimensions of the curves recognised by
.
In that case:
a rational language.
Firstly, if
is finite, then its adherence is empty and the dimension is 0; so, let us suppose
to be infinite.
Let
be an element of
such that
.
Since
is rational, then so are its adherence and its centre
; let
be an increasing sequence of words in
whose unique limit is
.
This rational sequence is generated by a p-state automaton, and, according to the star lemma,
.
According to our previous definition of the GM, we can write:
with, in particular:
which can be simplified by writing
.
We get the result by using box-counting.
:
if
and
, then it is not true that
;
one consequence of this is Head's lemma for DOL-systems [13]:
given two DOL-systems
, if
then
.
Very often, this property is not taken into account - for example, for closed curves (see [3, §4-2 and §4-17,]).
It would be interesting to introduce the combination of adherence values so as to supersede this limitation.
, they are concomitant if
;
they are strictly concomitant if for every k,
.
We state:
.
Fractals and Grammars
A grammar is a 4-uple G = (X, N, S, P) where X and N are finite disjoint sets called terminal set
(or alphabet) and non-terminal set, and P is a finite set of re-writing rules,
; a special element
is called the axiom, or starting symbol.
In the following, the usual definition of a derivation is slightly modified in order to map conveniently the words of the language to
.
be
, where
is the cartesian product of n copies of
(with
).
Any generated word x is fitted with a vector
of
; the values of this latter are the depths of the related symbols - that is, the number of rules invoked from the axiom to generate them.
More concisely, a word
is said to be rewritten in
if
all in
, and such that
.
is often written
; successive derivations from
to
are summarised by writing
where the "
" can be also replaced by the number k of steps.
Since
, then
, which are the depth vectors of each factor of x;
if
, then
, with every value of
equal to n+1 and
. For the sake of simplicity,
is omitted when it is not necessary.
is the (finite or infinite) set
,
while the language of that same symbol (that is, the terminal language) is
.
Finally, the language of G equals
and is written
; its dimension is also the dimension of the grammar D(G).
satisfying for every
:
; it is successful if
the length of the longest terminal left-factor of
tends to infinity
with k.
According to [10], there is a unique word u which is the adherence
of
; this will be denoted by:
.
. In particular, it is interesting to show how
some parts of the desired curve are more accurate than others, which means that they are the result of more derivations. The following mapping
is defined from the augmented language of S to
:
denotes the harmonic average of
- that is,
.
so as
to embed D into the grammar
as:
(von Koch) which generates a word
mapped to
; I is the ordered set of rules able to produce w, the polygon map f is such that
.
According to a fundamental lemma, if
, then we can consider v and w separately.
At each derivation step, either
or
can be invoked with the same
probability. It is easy to compute the distribution of the lengths of the curve according to the derivation probabilities, but
one can notice that the maximum length is reached for
- that is, in the case where the rules
are never used. Since the curve has no loops, D(G) is unchanged (
).
(since
).
Grammatical Inference
As a consequence of the previous section, it is necessary to develop a grammatical inference algorithm
based upon the fractal nature of the curves to be processed.
and
, then
is a DOL-structure if:
,
such that
.
and
describes the previous structure.
with
, then (x, u) is a FR-structure if:
,
such that
, with in particular
not in ]l, m[.
Figure 2: Segmentation of a von Koch curve.
Conclusion
A lot of questions arise, some of them theoretical, others more practical. For example,
what is the actual relationship between a DOL-system and the grammar which recognises the same language?
How can these studies on DOL-systems be generalised to TAG-systems or DTOL systems (see for instance [15])?
What is the class of Geometrical Mappings
which leave both the basic properties of the mapping (connectivity) and its fractal dimension invariant?
Is it possible to generate multi-fractal sets by combinations of "mono"-fractal languages?
Acknowledgements