Building Forth Structures

Jack Brien


Anton Ertl has made many contributions to Forth - GForth, the RAFTS native compiler technology and an object-oriented extension for ANS Forths which makes use of some words for building data structures. In a collaboration with Anton, Jack Brien has written about the data structures with an interesting example and some variations of his own.

Starting Simply

M any Forths do not have a feature like C's struct or Pascal's RECORD but it is simple to provide one if needed, and a good example of how Forth can be extended to provide facilities that are not standard. There are many approaches, depending on how much complexity your application needs. A very basic one, which I have used a lot, is:

: field  \ n ++ n ; declare n as an offset into a record
    dup constant ;

used as in

0 field A 5 chars +  \ size of field A
  field B 3 cells +  \ size of field B
  field C 2 cells +  \ size of field C

The words created by FIELD store their own offset, to which the same of the field is then added to calculate the next offset, leaving the total size of the record on the stack at the end. This can be stored in a constant or used to create a named record directly:

constant #FOO

#foo create FOO allot

The address of (for example) field B in FOO can be found with: foo b +.

Since we always use the offsets with +, using a defining word field that includes the + in the action of the defined word offers itself. (Notice how I have factored out the action of FIELD into a separate word. This is not strictly necessary, but can be useful, as we will see later):

: DOFIELD \ the action of a field 
DOES>     \ addr1 -- addr2 ; calculate address of field
    @ + ;

: FIELD \ # n ++ #'   \ define a field with offset # and size n
    create over , +    \ store offset, add the size to find new offset
    dofield ;         \ set the action of the newly created field 

0
5 chars field A
3 cells field B 
2 cells field C

create FOO allot

Instead of b +, we now simply write b so the address of field B in FOO is now found with: foo b .

Structures can be nested:

0
some fields
#foo field D
more fields

create BAR allot

And bar d b returns the address of field B in field D of BAR.

The Alignment Problem

Some Forths have alignment requirements for CELLS FLOATS and DFLOATS such as requiring CELL-based data to begin at even-numbered addresses. In this case FOO above is incorrect, since the cell data at offset B may be at an odd address. One way round this is to start the structure with the data that has the greatest alignment requirement, and ensure that the whole structure is aligned to suit it.

The Standard words that do this are: ALIGN FALIGN DFALIGN and SFALIGN which align the data space pointer correctly for cells, floats, double floats and single floats respectively. The related words ALIGNED FALIGNED DFALIGNED SFALIGNED each take a address and return the next address aligned to the required standard. You could use these to ensure your structures are correct by (for example) using aligned before the declaration of field B, but this could get tedious when dealing with complex structures.

Anton Ertl has devised this solution: each type descriptor includes its alignment requirement as well as its size.

1 chars     0         2constant STRUCT  \ used to start a  structure

1 aligned   1 cells   2constant CELL%   \ The % indicates an alignment:size
1 chars     1 chars   2constant CHAR%   \ pair passed on the Data Stack.
1 faligned  1 floats  2constant FLOAT%
1 dfaligned 1 dfloats 2constant DFLOAT%
1 sfaligned 1 sfloats 2constant SFLOAT%

cell% 2* 2constant DOUBLE%   

Use this technique to make descriptors for arrays. For a field containing a string of 10 chars, you might use:

char% 10 * 2constant STRING%

: NALIGN \ addr n -- addr' ; align address to standard n
  1- tuck + swap invert and ;

: ENDSTRUCT \ align size ++ ; declare a type descriptor for the structure
      over nalign \ pad size to full alignment
      2constant ;

: FIELD, \ align1 offset1 align size --  align2 offset2 ; 
          swap rot over nalign dup , \ compile aligned offset
         rot +                       \ add size to make new offset
       >r nalign r> ;                \ correct alignment requirement
  
: FIELD \ align1 offset1 align size ++  align2 offset2 ; 
  create field, dofield ;

FIELD has been changed to use the 2 values left by the type descriptor. STRUCT is used (instead of the zero we were using so far) to begin the declaration of a structure. Where we wrote constant #foo to save the size of the structure, we now use endstruct foo%. FOO% can then be used in a nested structure. It is even possible to have nested arrays of structures, e.g.

foo% 20 * field 20FOO-ARRAY

Anton also supplies foo% %alloc and foo% %allot as convenient ways to create a dataspace to hold structure FOO%.

: %ALLOT \ align size -- addr ; reserve dataspace for structure
      here rot nalign \ get aligned address
      dup here - rot + allot ;

: %ALLOC \ align size -- addr ; reserve heapspace
       nip allocate  throw ;    

Note that in ANS Forth the body of a created word is aligned but not necessarily faligned; therefore, if foo% requires greater alignment create foo foo% allot drop may not produce the desired effect. You need to do foo% %allot constant foo. The address returned by ALLOCATE is already aligned to the highest standard, so the same precaution is not needed there.

Anton's naming convention

The field names that come to (my) mind are often quite generic and, if used, would cause frequent name clashes. E.g. many structures probably contain a counter field. The structure names that come to (my) mind are often also the logical choice for the names of words that create such a structure.

Therefore, I have adopted the following naming conventions:

  • The names of fields are of the form struct-field, where struct is the basic name of the structure, and field is the basic name of the field. You can think about field words as converting the (address of the) structure into the (address of the) field.
  • The names of structures are of the form struct%, where struct is the basic name of the structure.

Do we need to name the first field?

The first field is at the base address of a structure and the word for this field actually does not change the address on the stack. You may be tempted to leave it out in the interest of run-time and space efficiency. However, it is possible to reduce the temptation:

: DONOTHING   \ makes the last created word do nothing
immediate     \ do it at compile-time, so no code is generated
DOES> drop ;  \ drop the address that create leaves

Anton makes field perform donothing (and therefore compile no code) if the offset on the stack is zero. A different approach is to use another word for the first field:

: FIRST-FIELD \ align size ++ align size ; declare the first field
  create donothing ;

STRUCT is now redundant, so we can rename ENDSTRUCT as STRUCT. e.g.

Create a cell-sized structure to hold a pointer:

cell% struct POINTER%

Use it to create a structure that holds two pointers:

pointer% first-field &BRANCH-LEFT
pointer% field  &BRANCH-RIGHT 
struct BRANCH%         

&name is simply my convention for naming a pointer.

Lists, Trees and other Generic Structures

BRANCH%es can be used to construct binary trees, but they are not much use without data. This is a characteristic of all generic structures such as lists, trees, stacks and queues. Sorted binary trees for example, can be built easily with any structure whose first field is a BRANCH%:

branch% 
cell% field CELL-BRANCH-DATA
struct CELL-BRANCH%

branch% 
foo%  field FOO-BRANCH-DATA
struct FOO-BRANCH%

In each case the field has the same function - to return the base address of the structure that the branch actually contains. CELL-BRANCH-DATA and FOO-BRANCH-DATA are identical apart from the name. You could equally well define the generic:

pointer% first-field &BRANCH-LEFT
pointer% field  &BRANCH-RIGHT  
cell 0 field BRANCH-DATA  \ just a placemarker
struct BRANCH%

Now, since BRANCH-DATA returns the offset as the following data field, we do not actually want to name the data field again. Instead we want a syntax like:

branch% foo% +struct FOO-BRANCH% \ to build a tree of foo%s

Which is easily defined:

: +STRUCT  \ generic% data% ++ ; 
      rot +     \ add the data sizes
      nip       \ use the generic alignment
      struct ;   

So you see that the placemarker in BRANCH% or any similar definition must have an alignment requirement suitable for whatever data type will follow. So to allow branches containing dfloats, the definition would be:

1 dfaligned 0 field BRANCH-DATA

The root of a tree is a pointer to a branch. &left-branch and &right-branch point to other branches, or else contain zero.

COMPARE-DATA \ branch-data1 branch-data2 -- f ; true if data1 < data2 
\ definition depends on the structure at branch-data


: ?L/R  \ branch f -- branch' ;
      IF &left-branch ELSE &right-branch THEN ; 

: TREE-INSERT \ branch tree  -- ;
\ insert branch at correct point in tree (ignoring duplicates) 
      over branch-data swap    \ branch1 branch-data1 &branch2
        BEGIN dup @ WHILE      \ not on leaf of tree
        @ 2dup branch-data compare-data ?l/r REPEAT 
        nip ! ;
   

This definition would work for any type of tree, were it not for the different versions of COMPARE-DATA required. COMPARE-DATA is what is an object-orientated language calls a method, and object-oriented code is basically a way of making each structure ensure that the correct method is called. That is more complicated than what we have seen so far, but not by very much.

References

The original simple version can be found in Forthwrite Issue 64.

Chris Jakeman presented a comprehensive set of Array and Record words in Forthwrite Issue 55, developed from code in Dick Pountain's book Object-Oriented Forth (1987), which can also be borrowed from the FIG UK Library

Anton's Object Package can be downloaded from http://www.complang.tuwien.ac.at/forth/objects.zip.
This contains a complete object-oriented system based on the principles given here.

I am indebted to Anton for his advice in writing this article.


Jack Brien has been writing Forth as a hobby for the last ten years, originally on an Amstrad PCW. The rest of his time is spent researching the history of Fermanagh. Reach him on jackbrien@zetnet.co.uk or 01365 388253.