|
ISSN 1320-0682 | ||||
| Volume 4 | 1997 | ||||
Since its inception, Tierra [2] has been of great interest to Artificial Life researchers. It is now being developed to run in a distributed fashion across the Internet [1]. Given the amount of computing power that is being attempted to be harnessed, the efficiency of the code should be considered.
The author is interested in using Tierra as a testbed for measuring embryologies [3]. This involves running Tierra repeatedly with mutation turned off - each time seeded with different pairs of organisms - in order to analyse the ecological interaction of those organisms. The efficiency of Tierra becomes crucial to getting results from this project as the analysis of all organisms generated from a 24-hour Tierra run was terminated before completion after two months of run-time. This paper looks at techniques to improve the run-time efficiency of Tierra.
Tierran organisms live in the memory of a virtual computer called the
soup. Tierra, as implemented, allows for an arbitrary soup size, specified
as SoupSize. As the soup is defined with periodic boundary
conditions, this implies that every memory operation must be performed modulo
SoupSize, which is typically an expensive operation. By restricting
SoupSize to a power of two, it is possible to replace the modulo
function by a bit-masking operation, which can be done simply by altering the
definition of the ad macro as follows:
#define ad(A) (A&(SoupSize-1))
/* #define ad(A) ((A) >=0 ? ((A)%SoupSize) \
: ((SoupSize-(-(A)%SoupSize))%SoupSize)) */
This change improves Tierra's efficiency by about 50%.
On profiling Tierra, it becomes clear that the code is dominated by the time
required to match templates. The algorithm employed in Tierra is to search
sequentially through memory locations (either forwards or backwards, or in both
directions, depending on the instruction being decoded) up to a distance given
by the run-time parameter SearchLimit. This can be improved
dramatically by recording the location of templates at the time of cell
division. This changes the search operation into a lookup operation involving a
few de-references. Table
1 shows the lookup table data structure for the 0080aaa ancestral creature
listed in appendixA.
Table 1 Lookup table structure for the
templates in 0080aaa. Size 1 templates have been omitted for clarity
Note that not only are templates of size 4 recorded, but so are all subtemplates, to cover the partial matching cases. The storage of subtemplates can be avoided at the cost of a more complex matching algorithm.
The effort of implementing this template-matching algorithm is large, so this
has not been done. When profiled on a Silicon Graphics Power Challenge for a
SearchLimit of 5, about 50% of the execution time is spent in the
template-matching code. Assuming that the proposed algorithm has negligible
cost, then the best improvement one could expect from this algorithm is a
doubling of performance. Clearly, as the SearchLimit is increased,
the advantage gained also increases. This allows the possibility of exploring
the effect of higher SearchLimits.
This begs the question of the actual cost to the organism of matching
templates. In Tierra, if the matching template is not found within
SearchLimit, the instruction is treated as a nop (no operation),
and the following instruction is read and executed. In all probability, this
means that the organism cannot reproduce in this copy cycle, so on average this
is a cost to its reproductive success. Another possibility is that the
instruction is retried starting where it left off. This costs the organism a
number of instruction cycles given by the distance between its search starting
value (the template address in instruction set no. 1) and the matching template,
divided by SearchLimit. This latter possibility is physically more
appealing, although one could argue that maybe the cost should be quadratic in
search distance to model the diffusive nature of chemical templates. This latter
possibility can be readily implemented using the new matching algorithm with no
additional simulation cost.
Figure 1 Growth curve for 0080aaa, running
on Tierra, and also on miniTierra, with SoupSize=8388608
In Tierra, each distinct organism executes identical code with identical initial conditions (data). This immediately raises the possibility of optimising the calculation by executing a single archetype and keeping track of the population count. Effectively, this would be replacing an individual model with a population model.
Figure 2 Schematic of matching a template
(located at the program counter) to its complementary template
In the Eco-Tierra project [3], a tournament between two Tierran organisms is held to determine the interaction between them. All forms of mutation are switched off so that purely ecological effects are studied. This is an easier situation to implement than a full Tierra implementation as the number of threads remain constant at two provided that all operations are uniquely determined by the CPU state. The only Tierra instructions that are not completely determined by the CPU's state are the template match operations (such as jmp, call, adrf etc.). Consider matching a template (see Figure 2), in the following cases:
In all other cases, the result is indeterminate as it may match multiple templates.
For this work, I have redefined the matching operation to ensure complete determinancy.
This approach eliminates the distinction made on search direction. However, it generates the behaviour correctly for most viable Tierra organisms (which do not have duplicated templates and which have the search direction correctly set to find the matching template in their own code).
As the two organism case is the bottleneck in my project, I have only
implemented this simpler case, which I call miniTierra. I have used
instruction set 0 for this implementation, although changing the instruction is
not difficult as the code is small and well contained. The code for miniTierra
is available from
http://parallel.acsu.unsw.edu.au/rks/software/miniTierra.tar.gz.
Figure 3 0080aaa versus 0044aab run
on Tierra
Figure 4 0080aaa versus 0044aab run
on miniTierra
Figure 5 0080aaa versus 0044aab run
on miniTierra with smoothing
Figure
1 shows the growth curve for the Tierran ancestral organism 0080aaa. Tierra
has been modified so that the reaper will start killing cells before the soup is
full [3].
The general shapes of the curves are the same; the difference between miniTierra
and Tierra relate to the effect of update order. In Tierra, organisms are being
born and are dying all the time so, to some extent, the behaviour can be
approximated by the differential equation
. In the case
of miniTierra, all the updates are done synchronously, when the code reaches the
divide instruction. In this case, the behaviour is more properly
described by a difference equation
. During the
initial stages of growth, the behaviour is exponential and so the difference
between a differential equation and a difference equation is amplified. During
the latter stages, the behaviour is dissipative and so the difference is
reduced. Both dynamics have the same limit point.
Figure 3 shows the interplay between 0080aaa and 0044aab running Tierra. Figure 4 shows the organisms on miniTierra. The nomenclature is arbitrary, but both organisms are listed in the Appendices. miniTierra does reproduce the limit cycle behaviour seen in Tierra. However, the "sawtooth"-like graph in Figure 4 is caused by births and deaths occurring synchronously rather than uniformly through time. The results of a method to smooth the effects of this synchronisation is shown in Figure 5. Instead of all births and deaths taking place when the divide instruction is called, the net population increase (births-deaths) is divided by the number of instructions executed between divide instructions (the generation time). This value is added to the population every time an instruction is executed. The remainder of the increase is then divided by half the reproduction; this value is then added every second executed instruction. This is repeated until there are no further amounts to be added. For example, consider a population increase of 2013, and a generation time of 817 instructions. Table 2 shows the increments that are applied.
Table 2 Increments applied for a population
increase of 2013
The big difference is time between these two runs. With Tierra (using the
ad optimisation), the run took 3834 s, a little more than 1 hour on
a 486/75 running Linux. The miniTierra run took 360 ms.
When only two organism types are run, the results of a template match is deterministic, with three possible courses of action:
When more than two organisms are run, there arises the additional possibility that more than one match to the template exists and that the match will occur probabilistically according to the proportion of each organism. In effect, the simulator must "fork" the execution thread. The worst case scenario is a thread for each individual in the simulation; this is just the individual model with which we started. A more important issue is the importance attached to the update order as the dynamics depend on this. If the issue is just to have a system that evolves, then the update order presumably doesn't matter much, as an adaptive system will make use of the "physics and chemistry" it has at hand. However, if the issue is to model the existing Tierra accurately, then the population model will not do.
In miniTierra, the emphasis has been on running the organisms exactly to determine ecological parameters. More generally, one would want to determine the effect mutation and flaws (that is, random changes to memory and random execution mistakes). Each of these situations also corresponds to a fork in the execution thread.
Whilst it might be theoretically possible to rejoin a forked thread when two CPUs become identical, in practice this would probably be impossible. Threads could only be reclaimed once an organism becomes extinct.
format: 3 bits: 2156009669 EXsh TCsh TPs MFs MTd MBh genotype: 0080aaa parent genotype: 0666god 1st_daughter: flags: 0 inst: 827 mov_daught: 80 breed_true: 1 2nd_daughter: flags: 0 inst: 809 mov_daught: 80 breed_true: 1 Origin: InstExe: 0,0 clock: 0 Thu Jan 01 -5:00:00 1970 MaxPropPop: 0.8306 MaxPropInst: 0.4239 mpp_time: 0,0 ploidy: 1 track: 0 comments: the ancestor, written by a human, mother of all other creatures. track 0: nop1 ; 110 01 0 beginning marker nop1 ; 110 01 1 beginning marker nop1 ; 110 01 2 beginning marker nop1 ; 110 01 3 beginning marker zero ; 110 04 4 put zero in cx not0 ; 110 02 5 put 1 in first bit of cx shl ; 110 03 6 shift left cx (cx = 2) shl ; 110 03 7 shift left cx (cx = 4) movDC ; 110 18 8 move cx to dx (dx = 4) adrb ; 110 1c 9 get (backward) address of beginning marker -> ax nop0 ; 100 00 10 complement to beginning marker nop0 ; 100 00 11 complement to beginning marker nop0 ; 100 00 12 complement to beginning marker nop0 ; 100 00 13 complement to beginning marker subAAC ; 110 07 14 subtract cx from ax, result in ax movBA ; 110 19 15 move ax to bx, bx now contains start address of mother adrf ; 110 1d 16 get (forward) address of end marker -> ax nop0 ; 100 00 17 complement to end marker nop0 ; 100 00 18 complement to end marker nop0 ; 100 00 19 complement to end marker nop1 ; 100 01 20 complement to end marker incA ; 110 08 21 increment ax, to include dummy instruction at end subCAB ; 110 06 22 subtract bx from ax to get size, result in cx nop1 ; 110 01 23 reproduction loop marker nop1 ; 110 01 24 reproduction loop marker nop0 ; 110 00 25 reproduction loop marker nop1 ; 110 01 26 reproduction loop marker mal ; 110 1e 27 allocate space (cx) for daughter, address to ax call ; 110 16 28 call template below (copy procedure) nop0 ; 100 00 29 copy procedure complement nop0 ; 100 00 30 copy procedure complement nop1 ; 100 01 31 copy procedure complement nop1 ; 100 01 32 copy procedure complement divide ; 110 1f 33 create independent daughter cell jmpo ; 110 14 34 jump to template below (reproduction loop) nop0 ; 100 00 35 reproduction loop complement nop0 ; 100 00 36 reproduction loop complement nop1 ; 100 01 37 reproduction loop complement nop0 ; 100 00 38 reproduction loop complement ifz ; 000 05 39 dummy instruction to separate templates nop1 ; 110 01 40 copy procedure template nop1 ; 110 01 41 copy procedure template nop0 ; 110 00 42 copy procedure template nop0 ; 110 00 43 copy procedure template pushA ; 110 0c 44 push ax onto stack pushB ; 110 0d 45 push bx onto stack pushC ; 110 0e 46 push cx onto stack nop1 ; 110 01 47 copy loop template nop0 ; 110 00 48 copy loop template nop1 ; 110 01 49 copy loop template nop0 ; 110 00 50 copy loop template movii ; 110 1a 51 move contents of [bx] to [ax] (copy one instruction) decC ; 110 0a 52 decrement cx (size) ifz ; 110 05 53 if cx == 0 perform next instruction, otherwise skip it jmpo ; 110 14 54 jump to template below (copy procedure exit) nop0 ; 110 00 55 copy procedure exit complement nop1 ; 110 01 56 copy procedure exit complement nop0 ; 110 00 57 copy procedure exit complement nop0 ; 110 00 58 copy procedure exit complement incA ; 110 08 59 increment ax (address in daughter to copy to) incB ; 110 09 60 increment bx (address in mother to copy from) jmpo ; 110 14 61 bidirectional jump to template below (copy loop) nop0 ; 100 00 62 copy loop complement nop1 ; 100 01 63 copy loop complement nop0 ; 100 00 64 copy loop complement nop1 ; 100 01 65 copy loop complement ifz ; 000 05 66 this is a dummy instruction to separate templates nop1 ; 110 01 67 copy procedure exit template nop0 ; 110 00 68 copy procedure exit template nop1 ; 110 01 69 copy procedure exit template nop1 ; 110 01 70 copy procedure exit template popC ; 110 12 71 pop cx off stack (size) popB ; 110 11 72 pop bx off stack (start address of mother) popA ; 110 10 73 pop ax off stack (start address of daughter) ret ; 110 17 74 return from copy procedure nop1 ; 100 01 75 end template nop1 ; 100 01 76 end template nop1 ; 100 01 77 end template nop0 ; 100 00 78 end template ifz ; 000 05 79 dummy instruction to separate creature
format: 3 bits: 1 EX TC TP MF MT MB genotype: 0044aab parent genotype: 0045aar 1st_daughter: flags: 0 inst: 467 mov_daught: 44 breed_true: 1 2nd_daughter: flags: 0 inst: 449 mov_daught: 44 breed_true: 1 Origin: InstExe: 37,275744 clock: 830995468 Thu May 2 10:04:28 1996 MaxPropPop: 0.8392 MaxPropInst: 0.1184 mpp_time: 9,998691 ploidy: 1 track 0: nop1 ; 000 01 0 beginning marker nop1 ; 000 01 1 beginning marker nop1 ; 000 01 2 beginning marker nop1 ; 000 01 3 beginning marker zero ; 000 04 4 put zero in cx not0 ; 000 02 5 put 1 in first bit of cx shl ; 000 03 6 shift left cx (cx = 2) shl ; 000 03 7 shift left cx (cx = 4) movDC ; 000 18 8 move cx to dx (dx = 4) adrb ; 000 1c 9 get (backward) address of beginning marker -> ax nop0 ; 000 00 10 complement to beginning marker nop0 ; 000 00 11 complement to beginning marker nop0 ; 000 00 12 complement to beginning marker nop0 ; 000 00 13 complement to beginning marker subAAC ; 000 07 14 subtract cx from ax, result in ax movBA ; 000 19 15 move ax to bx, bx now contains start address of mother adrf ; 000 1d 16 get (forward) address of end marker -> ax nop0 ; 000 00 17 complement to end marker nop0 ; 000 00 18 complement to end marker nop0 ; 000 00 19 complement to end marker nop1 ; 000 01 20 complement to end marker incA ; 000 08 21 increment ax, to include dummy instruction at end subCAB ; 000 06 22 subtract bx from ax to get size, result in cx nop1 ; 000 01 23 reproduction loop marker nop1 ; 000 01 24 reproduction loop marker nop0 ; 000 00 25 reproduction loop marker nop1 ; 000 01 26 reproduction loop marker mal ; 000 1e 27 allocate space (cx) for daughter, address to ax call ; 000 16 28 call template below (copy procedure in 0080aaa) nop0 ; 000 00 29 copy procedure complement nop0 ; 000 00 30 copy procedure complement nop1 ; 000 01 31 copy procedure complement nop1 ; 000 01 32 copy procedure complement divide ; 000 1f 33 create independent daughter cell jmpo ; 000 14 34 jump to template below (reproduction loop) nop0 ; 000 00 35 reproduction loop complement (will match last) nop1 ; 000 01 36 reproduction loop complement (three instrustions) nop0 ; 000 00 37 reproduction loop complement (of repro loop) ifz ; 000 05 38 dummy instruction to separate templates nop1 ; 000 01 39 end template nop1 ; 000 01 40 end template nop1 ; 000 01 41 end template nop0 ; 000 00 42 end template pushA ; 000 0c 43 dummy instruction to separate creature