Complexity International
    ISSN 1320-0682     
Volume 4 1997
 

On an Efficient Implementation of Tierra

Russell K. Standish
ACSU,
Division of Information Services
The University of New South Wales
Sydney 2052, Australia

Email: R.Standish@unsw.edu.au
URL: http://parallel.acsu.unsw.edu.au/rks

Abstract:

In a project with which the author is involved, the run-time efficiency of Tierra has become of paramount importance. Whilst some optimisations can be applied to the Tierra code itself, the speedup is at most two to three times, with no benefits from vectorisation or parallelisation being realised due to the structure of the code. A new formulation of Tierra is proposed which gains the computational efficiency of a population model without sacrificing the advantages of an individual model. This new formulation can take advantage of parallelism when available. In some preliminary tests, the new code is up to three orders of magnitude faster than Tierra.


Introduction

Since its inception, Tierra [2] has been of great interest to Artificial Life researchers. It is now being developed to run in a distributed fashion across the Internet [1]. Given the amount of computing power that is being attempted to be harnessed, the efficiency of the code should be considered.

The author is interested in using Tierra as a testbed for measuring embryologies [3]. This involves running Tierra repeatedly with mutation turned off - each time seeded with different pairs of organisms - in order to analyse the ecological interaction of those organisms. The efficiency of Tierra becomes crucial to getting results from this project as the analysis of all organisms generated from a 24-hour Tierra run was terminated before completion after two months of run-time. This paper looks at techniques to improve the run-time efficiency of Tierra.


Simple Optimisations to the Existing Code

Tierran organisms live in the memory of a virtual computer called the soup. Tierra, as implemented, allows for an arbitrary soup size, specified as SoupSize. As the soup is defined with periodic boundary conditions, this implies that every memory operation must be performed modulo SoupSize, which is typically an expensive operation. By restricting SoupSize to a power of two, it is possible to replace the modulo function by a bit-masking operation, which can be done simply by altering the definition of the ad macro as follows:

#define ad(A) (A&(SoupSize-1))
/* #define ad(A) ((A) >=0 ? ((A)%SoupSize) \
               : ((SoupSize-(-(A)%SoupSize))%SoupSize)) */

This change improves Tierra's efficiency by about 50%.

On profiling Tierra, it becomes clear that the code is dominated by the time required to match templates. The algorithm employed in Tierra is to search sequentially through memory locations (either forwards or backwards, or in both directions, depending on the instruction being decoded) up to a distance given by the run-time parameter SearchLimit. This can be improved dramatically by recording the location of templates at the time of cell division. This changes the search operation into a lookup operation involving a few de-references. Table 1 shows the lookup table data structure for the 0080aaa ancestral creature listed in appendixA.

   table21
Table 1     Lookup table structure for the templates in 0080aaa. Size 1 templates have been omitted for clarity

 

Note that not only are templates of size 4 recorded, but so are all subtemplates, to cover the partial matching cases. The storage of subtemplates can be avoided at the cost of a more complex matching algorithm.

The effort of implementing this template-matching algorithm is large, so this has not been done. When profiled on a Silicon Graphics Power Challenge for a SearchLimit of 5, about 50% of the execution time is spent in the template-matching code. Assuming that the proposed algorithm has negligible cost, then the best improvement one could expect from this algorithm is a doubling of performance. Clearly, as the SearchLimit is increased, the advantage gained also increases. This allows the possibility of exploring the effect of higher SearchLimits.

This begs the question of the actual cost to the organism of matching templates. In Tierra, if the matching template is not found within SearchLimit, the instruction is treated as a nop (no operation), and the following instruction is read and executed. In all probability, this means that the organism cannot reproduce in this copy cycle, so on average this is a cost to its reproductive success. Another possibility is that the instruction is retried starting where it left off. This costs the organism a number of instruction cycles given by the distance between its search starting value (the template address in instruction set no. 1) and the matching template, divided by SearchLimit. This latter possibility is physically more appealing, although one could argue that maybe the cost should be quadratic in search distance to model the diffusive nature of chemical templates. This latter possibility can be readily implemented using the new matching algorithm with no additional simulation cost.

 

   figure46
Figure 1      Growth curve for 0080aaa, running on Tierra, and also on miniTierra, with SoupSize=8388608


miniTierra

In Tierra, each distinct organism executes identical code with identical initial conditions (data). This immediately raises the possibility of optimising the calculation by executing a single archetype and keeping track of the population count. Effectively, this would be replacing an individual model with a population model.

 

   figure56

Figure 2     Schematic of matching a template (located at the program counter) to its complementary template

In the Eco-Tierra project [3], a tournament between two Tierran organisms is held to determine the interaction between them. All forms of mutation are switched off so that purely ecological effects are studied. This is an easier situation to implement than a full Tierra implementation as the number of threads remain constant at two provided that all operations are uniquely determined by the CPU state. The only Tierra instructions that are not completely determined by the CPU's state are the template match operations (such as jmp, call, adrf etc.). Consider matching a template (see Figure 2), in the following cases:

In all other cases, the result is indeterminate as it may match multiple templates.

For this work, I have redefined the matching operation to ensure complete determinancy.

This approach eliminates the distinction made on search direction. However, it generates the behaviour correctly for most viable Tierra organisms (which do not have duplicated templates and which have the search direction correctly set to find the matching template in their own code).

As the two organism case is the bottleneck in my project, I have only implemented this simpler case, which I call miniTierra. I have used instruction set 0 for this implementation, although changing the instruction is not difficult as the code is small and well contained. The code for miniTierra is available from http://parallel.acsu.unsw.edu.au/rks/software/miniTierra.tar.gz.

 

   figure100

Figure 3     0080aaa versus 0044aab run on Tierra
 

   figure105

Figure 4     0080aaa versus 0044aab run on miniTierra
 

   figure110

Figure 5     0080aaa versus 0044aab run on miniTierra with smoothing

Figure 1 shows the growth curve for the Tierran ancestral organism 0080aaa. Tierra has been modified so that the reaper will start killing cells before the soup is full [3]. The general shapes of the curves are the same; the difference between miniTierra and Tierra relate to the effect of update order. In Tierra, organisms are being born and are dying all the time so, to some extent, the behaviour can be approximated by the differential equation tex2html_wrap_inline283 . In the case of miniTierra, all the updates are done synchronously, when the code reaches the divide instruction. In this case, the behaviour is more properly described by a difference equation tex2html_wrap_inline285 . During the initial stages of growth, the behaviour is exponential and so the difference between a differential equation and a difference equation is amplified. During the latter stages, the behaviour is dissipative and so the difference is reduced. Both dynamics have the same limit point.

Figure 3 shows the interplay between 0080aaa and 0044aab running Tierra. Figure 4 shows the organisms on miniTierra. The nomenclature is arbitrary, but both organisms are listed in the Appendices. miniTierra does reproduce the limit cycle behaviour seen in Tierra. However, the "sawtooth"-like graph in Figure 4 is caused by births and deaths occurring synchronously rather than uniformly through time. The results of a method to smooth the effects of this synchronisation is shown in Figure 5. Instead of all births and deaths taking place when the divide instruction is called, the net population increase (births-deaths) is divided by the number of instructions executed between divide instructions (the generation time). This value is added to the population every time an instruction is executed. The remainder of the increase is then divided by half the reproduction; this value is then added every second executed instruction. This is repeated until there are no further amounts to be added. For example, consider a population increase of 2013, and a generation time of 817 instructions. Table 2 shows the increments that are applied.

   table123

Table 2     Increments applied for a population increase of 2013

The big difference is time between these two runs. With Tierra (using the ad optimisation), the run took 3834 s, a little more than 1 hour on a 486/75 running Linux. The miniTierra run took 360 ms.


Generalising miniTierra to Arbitrary Numbers of Organisms

When only two organism types are run, the results of a template match is deterministic, with three possible courses of action:

When more than two organisms are run, there arises the additional possibility that more than one match to the template exists and that the match will occur probabilistically according to the proportion of each organism. In effect, the simulator must "fork" the execution thread. The worst case scenario is a thread for each individual in the simulation; this is just the individual model with which we started. A more important issue is the importance attached to the update order as the dynamics depend on this. If the issue is just to have a system that evolves, then the update order presumably doesn't matter much, as an adaptive system will make use of the "physics and chemistry" it has at hand. However, if the issue is to model the existing Tierra accurately, then the population model will not do.

In miniTierra, the emphasis has been on running the organisms exactly to determine ecological parameters. More generally, one would want to determine the effect mutation and flaws (that is, random changes to memory and random execution mistakes). Each of these situations also corresponds to a fork in the execution thread.

Whilst it might be theoretically possible to rejoin a forked thread when two CPUs become identical, in practice this would probably be impossible. Threads could only be reclaimed once an organism becomes extinct.


Listing of 0080aaa

 

format: 3  bits: 2156009669  EXsh    TCsh    TPs     MFs     MTd     MBh     
genotype: 0080aaa  parent genotype: 0666god
1st_daughter:  flags: 0  inst: 827  mov_daught: 80          breed_true: 1
2nd_daughter:  flags: 0  inst: 809  mov_daught: 80          breed_true: 1
Origin: InstExe: 0,0  clock: 0  Thu Jan 01 -5:00:00 1970
MaxPropPop: 0.8306  MaxPropInst: 0.4239 mpp_time: 0,0 
ploidy: 1  track: 0
comments: the ancestor, written by a human, mother of all other creatures.

track 0:

nop1    ; 110 01   0 beginning marker
nop1    ; 110 01   1 beginning marker
nop1    ; 110 01   2 beginning marker
nop1    ; 110 01   3 beginning marker
zero    ; 110 04   4 put zero in cx
not0    ; 110 02   5 put 1 in first bit of cx
shl     ; 110 03   6 shift left cx (cx = 2)
shl     ; 110 03   7 shift left cx (cx = 4)
movDC   ; 110 18   8 move cx to dx (dx = 4)
adrb    ; 110 1c   9 get (backward) address of beginning marker -> ax
nop0    ; 100 00  10 complement to beginning marker
nop0    ; 100 00  11 complement to beginning marker
nop0    ; 100 00  12 complement to beginning marker
nop0    ; 100 00  13 complement to beginning marker
subAAC  ; 110 07  14 subtract cx from ax, result in ax
movBA   ; 110 19  15 move ax to bx, bx now contains start address of mother
adrf    ; 110 1d  16 get (forward) address of end marker -> ax
nop0    ; 100 00  17 complement to end marker
nop0    ; 100 00  18 complement to end marker
nop0    ; 100 00  19 complement to end marker
nop1    ; 100 01  20 complement to end marker
incA    ; 110 08  21 increment ax, to include dummy instruction at end
subCAB  ; 110 06  22 subtract bx from ax to get size, result in cx
nop1    ; 110 01  23 reproduction loop marker
nop1    ; 110 01  24 reproduction loop marker
nop0    ; 110 00  25 reproduction loop marker
nop1    ; 110 01  26 reproduction loop marker
mal     ; 110 1e  27 allocate space (cx) for daughter, address to ax
call    ; 110 16  28 call template below (copy procedure)
nop0    ; 100 00  29 copy procedure complement
nop0    ; 100 00  30 copy procedure complement
nop1    ; 100 01  31 copy procedure complement
nop1    ; 100 01  32 copy procedure complement
divide  ; 110 1f  33 create independent daughter cell
jmpo    ; 110 14  34 jump to template below (reproduction loop)
nop0    ; 100 00  35 reproduction loop complement
nop0    ; 100 00  36 reproduction loop complement
nop1    ; 100 01  37 reproduction loop complement
nop0    ; 100 00  38 reproduction loop complement
ifz     ; 000 05  39 dummy instruction to separate templates
nop1    ; 110 01  40 copy procedure template
nop1    ; 110 01  41 copy procedure template
nop0    ; 110 00  42 copy procedure template
nop0    ; 110 00  43 copy procedure template
pushA   ; 110 0c  44 push ax onto stack
pushB   ; 110 0d  45 push bx onto stack
pushC   ; 110 0e  46 push cx onto stack
nop1    ; 110 01  47 copy loop template
nop0    ; 110 00  48 copy loop template
nop1    ; 110 01  49 copy loop template
nop0    ; 110 00  50 copy loop template
movii   ; 110 1a  51 move contents of [bx] to [ax] (copy one instruction)
decC    ; 110 0a  52 decrement cx (size)
ifz     ; 110 05  53 if cx == 0 perform next instruction, otherwise skip it
jmpo    ; 110 14  54 jump to template below (copy procedure exit)
nop0    ; 110 00  55 copy procedure exit complement
nop1    ; 110 01  56 copy procedure exit complement
nop0    ; 110 00  57 copy procedure exit complement
nop0    ; 110 00  58 copy procedure exit complement
incA    ; 110 08  59 increment ax (address in daughter to copy to)
incB    ; 110 09  60 increment bx (address in mother to copy from)
jmpo    ; 110 14  61 bidirectional jump to template below (copy loop)
nop0    ; 100 00  62 copy loop complement
nop1    ; 100 01  63 copy loop complement
nop0    ; 100 00  64 copy loop complement
nop1    ; 100 01  65 copy loop complement
ifz     ; 000 05  66 this is a dummy instruction to separate templates
nop1    ; 110 01  67 copy procedure exit template
nop0    ; 110 00  68 copy procedure exit template
nop1    ; 110 01  69 copy procedure exit template
nop1    ; 110 01  70 copy procedure exit template
popC    ; 110 12  71 pop cx off stack (size)
popB    ; 110 11  72 pop bx off stack (start address of mother)
popA    ; 110 10  73 pop ax off stack (start address of daughter)
ret     ; 110 17  74 return from copy procedure
nop1    ; 100 01  75 end template
nop1    ; 100 01  76 end template
nop1    ; 100 01  77 end template
nop0    ; 100 00  78 end template
ifz     ; 000 05  79 dummy instruction to separate creature

Listing of 0044aab

 

format: 3  bits: 1  EX      TC      TP      MF      MT      MB      
genotype: 0044aab  parent genotype: 0045aar
1st_daughter:  flags: 0  inst: 467  mov_daught: 44          breed_true: 1
2nd_daughter:  flags: 0  inst: 449  mov_daught: 44          breed_true: 1
Origin: InstExe: 37,275744  clock: 830995468  Thu May  2 10:04:28 1996
MaxPropPop: 0.8392  MaxPropInst: 0.1184 mpp_time: 9,998691 
ploidy: 1

track 0:

nop1    ; 000 01   0 beginning marker
nop1    ; 000 01   1 beginning marker
nop1    ; 000 01   2 beginning marker
nop1    ; 000 01   3 beginning marker
zero    ; 000 04   4 put zero in cx
not0    ; 000 02   5 put 1 in first bit of cx
shl     ; 000 03   6 shift left cx (cx = 2)
shl     ; 000 03   7 shift left cx (cx = 4)
movDC   ; 000 18   8 move cx to dx (dx = 4)
adrb    ; 000 1c   9 get (backward) address of beginning marker -> ax
nop0    ; 000 00  10 complement to beginning marker
nop0    ; 000 00  11 complement to beginning marker
nop0    ; 000 00  12 complement to beginning marker
nop0    ; 000 00  13 complement to beginning marker
subAAC  ; 000 07  14 subtract cx from ax, result in ax
movBA   ; 000 19  15 move ax to bx, bx now contains start address of mother
adrf    ; 000 1d  16 get (forward) address of end marker -> ax
nop0    ; 000 00  17 complement to end marker
nop0    ; 000 00  18 complement to end marker
nop0    ; 000 00  19 complement to end marker
nop1    ; 000 01  20 complement to end marker
incA    ; 000 08  21 increment ax, to include dummy instruction at end
subCAB  ; 000 06  22 subtract bx from ax to get size, result in cx
nop1    ; 000 01  23 reproduction loop marker
nop1    ; 000 01  24 reproduction loop marker
nop0    ; 000 00  25 reproduction loop marker
nop1    ; 000 01  26 reproduction loop marker
mal     ; 000 1e  27 allocate space (cx) for daughter, address to ax
call    ; 000 16  28 call template below (copy procedure in 0080aaa)
nop0    ; 000 00  29 copy procedure complement
nop0    ; 000 00  30 copy procedure complement
nop1    ; 000 01  31 copy procedure complement
nop1    ; 000 01  32 copy procedure complement
divide  ; 000 1f  33 create independent daughter cell
jmpo    ; 000 14  34 jump to template below (reproduction loop)
nop0    ; 000 00  35 reproduction loop complement (will match last)
nop1    ; 000 01  36 reproduction loop complement (three instrustions)
nop0    ; 000 00  37 reproduction loop complement (of repro loop)
ifz     ; 000 05  38 dummy instruction to separate templates
nop1    ; 000 01  39 end template
nop1    ; 000 01  40 end template
nop1    ; 000 01  41 end template
nop0    ; 000 00  42 end template
pushA   ; 000 0c  43 dummy instruction to separate creature

References

 

1
Ray, T. S.
URL: A proposal to create two biodiversity reserves: One digital and one organic.
See ftp://tierra.slhs.udel.edu/tierra/doc/res rves.tex, http://www.hip.atr.co.jp/~ray/pubs reserves/reserves.html
Also see New Scientist, 150(2034): 32-35.

 

2
Ray, T. S. (1991) An approach to the synthesis of life, in Artificial Life II, (eds) C. G. Langton, C. Taylor, J. D. Farmer & S. Rasmussen, p. 371, Addison-Wesley, New York.

 

3
Standish, R. K. (1996) Ecolab: Where to now? in Complex Systems: From Local Interaction to Global Phenomena, (eds) R. Stocker, H. Jelinek, B. Durnota & T. Bossomaier, IOS Press.
URL: Complexity International, vol. 3

 


Complexity International (1997) 4