The data field of a Forth word is contained in the data space.
Since code and/or word names may also be in the same physical space, the data field of each word must be thought of as discrete. The next unused data address is pointed to by the data space pointer (DSP), and returned by HERE. The smallest amount of memory that can be physically addressed is one address unit. The smallest amount that Forth addresses is one char, which may be as small as a byte (the usual size) or as large as a cell.
A char-aligned address (c-addr) is one at which a char can be stored.
An aligned address (a-addr) is one at which a whole cell can be stored.
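As a quick illustration of these units, here is a minimal sketch that can be typed into any Standard system. The names .UNITS and ALIGNED? are made up for this example, and the numbers printed depend on the system you run it on.

```forth
\ Hypothetical helper words, for illustration only.
: .UNITS  ( -- )               \ show char and cell sizes in address units
    ." char = " 1 CHARS .
    ." cell = " 1 CELLS . ;

: ALIGNED?  ( addr -- flag )   \ true if a whole cell can be stored at addr
    DUP ALIGNED = ;

.UNITS
HERE ALIGNED ALIGNED? .        \ an address passed through ALIGNED is an a-addr
```

Every a-addr is also a c-addr, but not the other way round.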
A few more instructions have had to be added to the pseudo-assembler:
    !       x a-addr -- ; store value x at a-addr
            POP PSP TO [TOS]
            POP PSP TO TOS
            NEXT

    +!      n a-addr -- ; increment value at a-addr by n
            POP PSP +TO [TOS]
            POP PSP TO TOS
            NEXT

    @       a-addr -- x ; fetch value at a-addr
            [TOS] TO TOS
            NEXT

    2!      x1 x2 a-addr -- ; store x2 at a-addr, x1 in next cell
            POP PSP TO [TOS]
            POP PSP TO [TOS CELL+]
            POP PSP TO TOS
            NEXT

    2@      a-addr -- x1 x2 ; fetch the pair at a-addr
            [TOS CELL+] PUSH PSP
            [TOS] TO TOS
            NEXT

    C!      char c-addr -- ; store char at c-addr
            POP PSP CTO [TOS]
            POP PSP TO TOS
            NEXT
            if char < cell, a different instruction than TO is needed

    C@      c-addr -- char ; fetch char at c-addr
            [TOS] (char) TO TOS
            NEXT
            (char) sits in the least significant bits of TOS
            All other bits are set to zero

    FILL    c-addr u char -- ; fill memory from c-addr with u chars
            POP PSP TIMES
                TOS CTO [PSP]
                CHAR +TO PSP
            LOOP
            POP PSP TO W
            POP PSP TO TOS
            NEXT

    CMOVE   c-addr1 c-addr2 u -- ; move u chars from c-addr1 to c-addr2
            POP PSP TO W
            TOS TIMES
                [PSP] (char) CTO W
                1 +TO [PSP]
                1 +TO W
            LOOP
            POP PSP TO W
            POP PSP TO TOS
            NEXT

    CMOVE>  c-addr1 c-addr2 u -- ; move u chars from c-addr1 to c-addr2
            POP PSP TO W
            TOS +TO [PSP]  TOS +TO W    \ start from end of block
            TOS TIMES
                [PSP] (char) CTO W
                -1 +TO [PSP]
                -1 +TO W
            LOOP
            POP PSP TO W
            POP PSP TO TOS
            NEXT

The optional String words CMOVE and CMOVE> work with chars and do not check for overlap. CMOVE starts copying at the start of the block, CMOVE> at the end. Sometimes overlap is what you want. The Core Word MOVE works with address units, not chars, and avoids overlap. In the common case where char=address unit it can be defined in high level thus:
    : MOVE  \ addr1 addr2 u -- ; copy backwards only if addr2 lies inside the source block
        >R 2DUP SWAP DUP R@ + WITHIN
        IF R> CMOVE> ELSE R> CMOVE THEN ;
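To see why the direction of copying matters, here is a small sketch, again assuming one char per address unit. BUF and RESET are hypothetical names. The same overlapping copy is done twice: once with MOVE, which preserves the data, and once with CMOVE, which propagates the first two chars through the buffer.

```forth
CREATE BUF 8 CHARS ALLOT

: RESET  ( -- )  S" abcdefgh" BUF SWAP CMOVE ;   \ refill the buffer

\ Shift "abcdef" up by two chars; MOVE copes with the overlap.
RESET  BUF  BUF 2 CHARS +  6 MOVE
BUF 8 TYPE CR        \ ababcdef

\ CMOVE copies from low addresses upwards, so "ab" is propagated.
RESET  BUF  BUF 2 CHARS +  6 CMOVE
BUF 8 TYPE CR        \ abababab
```

The second result is the kind of overlap that is sometimes what you want: it is a cheap way of filling memory with a repeating pattern.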
On 16-bit Forths that address more than 64K, it has been the custom to have variants of these words which take 32-bit (two stack items) addresses. These are suffixed with L, thus: @L, +!L, 2!L, MOVEL, etc.
The Data Space Pointer is a-aligned automatically after CREATE. Because code and data may share the same physical space, the Standard requires ; to a-align data space too. However, it is better to use ALIGN anyway, just in case.
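A minimal sketch of the "use ALIGN anyway" advice follows. REC and REC-VALUE are hypothetical names for a record holding one char followed by one cell.

```forth
CREATE REC
    1 C,            \ one char; HERE may no longer be cell-aligned
    ALIGN           \ pad data space up to the next aligned address
    HERE            \ remember where the cell field starts
    1234 ,          \ now it is safe to compile a whole cell
CONSTANT REC-VALUE  \ address of the cell field

REC-VALUE @ .       \ 1234
```

On a system with no alignment requirements ALIGN does nothing here, and the code still works unchanged.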
Many systems (e.g. 8-bit and 16-bit DOS) have no alignment requirements. You may still care to define ALIGN and ALIGNED as no-ops, in case you ever want to port code. Even without portability considerations, CELL+, CELLS and CHARS are useful for readability.
    : ALIGNED \ addr -- a-addr ; round up to the next cell boundary
        \ assumes address-units-per-cell is a power of two
        address-units-per-cell 1- +
        address-units-per-cell NEGATE AND ;

    : ALIGN
        HERE ALIGNED DSP ! ;

    : ALLOT \ n -- ; allot n address units of data space
        DSP +! ;

    : CELL+ \ a-addr -- a-addr' ; increment by one cell
        address-units-per-cell + ;

    : CELLS \ n -- n' ; the width of n cells
        address-units-per-cell * ;

    : CHAR+ \ c-addr -- c-addr' ; increment by one char
        address-units-per-char + ;

    : CHARS \ n -- n' ; the width of n chars
        address-units-per-char * ;

    : COUNT \ c-addr -- c-addr' char ; fetch char at c-addr and increment address
        DUP CHAR+ SWAP C@ ;
CELL+, CELLS, CHAR+, CHARS and COUNT would normally be primitives.
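Here is a short usage sketch for these words. TABLE, NTH-CELL and GREETING are names invented for the example. CELLS scales an index into a table of cells, and COUNT unpacks a counted string.

```forth
CREATE TABLE  10 , 20 , 30 ,                  \ three cells of data

: NTH-CELL  ( n a-addr -- a-addr' )           \ address of the n-th cell
    SWAP CELLS + ;

2 TABLE NTH-CELL @ .                          \ 30

\ A counted string: a length char followed by that many chars.
CREATE GREETING  5 C,
    CHAR H C,  CHAR e C,  CHAR l C,  CHAR l C,  CHAR o C,

GREETING COUNT TYPE                           \ Hello
```

Writing SWAP CELLS + rather than a literal multiplier keeps the code independent of the cell size.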
Stack manipulation is a major factor in Forth code, so these words, or at least the most commonly used ones, need to be fast. As mentioned before, caching the top value on the stack in a TOS register makes many primitive definitions faster. It's generally not worthwhile trying to cache more than one stack item, but with a highly optimised native code Forth it is possible to hold several stack items in registers. In that case the stack manipulating words become complex immediate words which decide at compile time exactly what code is compiled. That, however, is beyond my present scope.
In these and all further pseudocode definitions, I am assuming TOS is used. Here are all the Core Standard stack manipulation words. For simplicity, both stack pointer registers are assumed to have indexing capability and both stacks to grow downwards. Neither of these assumptions is vital.
    2DROP   (a b --)
            POP PSP TO W
            POP PSP TO TOS
            NEXT

    2DUP    (a b -- a b a b)
            [PSP] TO W
            TOS PUSH PSP
            W PUSH PSP
            NEXT

    2OVER   (a b c d -- a b c d a b)
            TOS PUSH PSP
            [PSP 3 CELLS +] PUSH PSP
            [PSP 3 CELLS +] TO TOS
            NEXT

    2SWAP   (a b c d -- c d a b)
            [PSP CELL +] TO W
            TOS TO [PSP CELL +]
            W TO TOS
            [PSP 2 CELLS +] TO W
            [PSP] TO [PSP 2 CELLS +]
            W TO [PSP]
            NEXT

    ?DUP    (x -- 0 | x x ) duplicate if not zero
            TOS 0<> IF TOS PUSH PSP
            NEXT

    DROP    (a -- )
            POP PSP TO TOS
            NEXT

    DUP     (a -- a a)
            TOS PUSH PSP
            NEXT

    OVER    (a b -- a b a)
            [PSP] TO W
            TOS PUSH PSP
            W TO TOS
            NEXT

    ROT     (a b c -- b c a)
            [PSP] TO W
            TOS TO [PSP]
            [PSP CELL+] TO TOS
            W TO [PSP CELL+]
            NEXT

    SWAP    (a b -- b a)
            POP PSP TO W
            TOS PUSH PSP
            W TO TOS
            NEXT
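If you want to check primitives like these against a reference, the same stack effects can be written in high-level Forth. This is only a sketch; the MY- prefix is there to avoid clashing with the system's own words.

```forth
: MY-2DROP  ( a b -- )               DROP DROP ;
: MY-2DUP   ( a b -- a b a b )       OVER OVER ;
: MY-?DUP   ( x -- 0 | x x )         DUP IF DUP THEN ;
: MY-2SWAP  ( a b c d -- c d a b )   ROT >R ROT R> ;

1 2 3 4 MY-2SWAP . . . .     \ 2 1 4 3
```

The output reads backwards because . prints the top of the stack first.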
    NIP     (a b -- b)
            POP PSP TO W
            NEXT

    TUCK    (a b -- b a b)
            POP PSP TO W
            TOS PUSH PSP
            W PUSH PSP
            NEXT

    PICK    ( xu ... x1 x0 u -- xu ... x1 x0 xu )
            [PSP TOS CELLS +] TO TOS
            NEXT

    ROLL    ( xu xu-1 ... x0 u -- xu-1 ... x0 xu )
ROLL is to PICK as SWAP is to OVER. It's an ugly beast both to code and to use. Leave it out unless you have an application that needs to treat the stack like an array.
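The relationship is easy to see by rebuilding OVER and ROT from them. The sketch below assumes the Standard's zero-based numbering, in which 0 PICK is DUP and 1 ROLL is SWAP; the MY- names are hypothetical.

```forth
: MY-OVER  ( a b -- a b a )    1 PICK ;    \ copy the item one below the top
: MY-ROT   ( a b c -- b c a )  2 ROLL ;    \ move the item two below the top

1 2 MY-OVER . . .      \ 1 2 1
1 2 3 MY-ROT . . .     \ 1 3 2
```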
    THIRD   (a b c -- a b c a)
            TOS PUSH PSP
            [PSP 2 CELLS +] TO TOS
            NEXT

    UNDER+  (a b c -- a+c b)
            TOS +TO [PSP CELL +]
            POP PSP TO TOS
            NEXT
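Neither of these two words is in the Standard, so a portable program has to supply them itself. One possible high-level version, assuming zero-based PICK, is sketched here; some systems already provide words with these names, in which case you will merely see a redefinition warning.

```forth
: THIRD   ( a b c -- a b c a )  2 PICK ;
: UNDER+  ( a b c -- a+c b )    ROT + SWAP ;

10 20 3 UNDER+ . .     \ 20 13
1 2 3 THIRD . . . .    \ 1 3 2 1
```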
    >R      TOS PUSH RSP
            POP PSP TO TOS
            NEXT

    R>      TOS PUSH PSP
            POP RSP TO TOS
            NEXT

    R@      TOS PUSH PSP
            [RSP] TO TOS
            NEXT
Manipulating the RSP can produce some strange and sometimes useful results. For example, an R> DROP that is not balanced by a preceding >R will cause the current word, when it returns, to skip the rest of the word which called it. That sort of behaviour is not portable. There is no Standard way of manipulating return addresses. It follows that R> etc. need not reference RSP at all - in fact, you may consider it safer if they don't. To avoid confusion, I shall refer to the stack used by the R* words as the "rack". These are the rules for portable use of the rack:
Your own private code may of course play fast and loose with these and any other portability rules.
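By contrast, here is a sketch of well-behaved rack use. The Standard requires that whatever a definition puts on the return stack is removed again before the definition exits, and this hypothetical word SUM-SQUARES does exactly that.

```forth
: SUM-SQUARES  ( a b -- a*a+b*b )
    DUP * >R       \ park b*b on the rack
    DUP *          \ a*a
    R> + ;         \ take b*b back before the word exits

3 4 SUM-SQUARES .  \ 25
```

The >R and its matching R> sit in the same definition and outside any DO loop, so the code makes no assumption about what else the system keeps there.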
    2>R     (= SWAP >R >R)
            POP PSP PUSH RSP
            TOS PUSH RSP
            POP PSP TO TOS
            NEXT

    2R>     (= R> R> SWAP)
            TOS PUSH PSP
            POP RSP TO W
            POP RSP PUSH PSP
            W TO TOS
            NEXT

    2R@     (= 2R> 2DUP 2>R)
            TOS PUSH PSP
            [RSP] TO W
            [RSP CELL +] PUSH PSP
            W TO TOS
            NEXT
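A typical use of the pair words is to get a double-cell item out of the way for a moment. In this sketch MY-2NIP is a hypothetical name; it drops the second pair on the stack while keeping the top pair in order.

```forth
: MY-2NIP  ( a b c d -- c d )
    2>R            \ park c d on the rack, order preserved
    2DROP          \ discard a b
    2R> ;          \ bring c d back

1 2 3 4 MY-2NIP . .    \ 4 3
```

Because 2>R and 2R> keep the pair in its original order, they are not the same as two separate >R or R> operations.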
Useful non-Standard words:

    DUP>R   TOS PUSH RSP
            NEXT

    R>DROP  POP RSP TO W
            NEXT

    RPICK   ( i -- x ; i'th item from return stack)
            [RSP TOS CELLS +] TO TOS
            NEXT