The Data Space

! +! @ 2! 2@ C! C@ FILL MOVE ALIGN ALIGNED ALLOT CELL+ CELLS CHAR+ CHARS COUNT

The data field of a Forth word is contained in the data space.

Since code and/or word names may also be in the same physical space, the data field of each word must be thought of as separate: there is no guarantee that one word's data field follows on directly from the previous one's. The next unused data address is pointed to by the data space pointer (DSP) and returned by HERE. The smallest amount of memory that can be physically addressed is one address unit. The smallest amount Forth addresses as a character is one CHAR, which may be as small as a byte (the usual size) or as large as a cell.

A char-aligned address (c-addr) is one at which a char can be stored.
An aligned address (a-addr) is one at which a whole cell can be stored.

A few more instructions have had to be added to the pseudo-assembler:

Data Space Words - fetching and storing

 
    !     x a-addr --; store value x at a-addr 
          POP PSP TO [TOS]   POP PSP TO TOS NEXT  
 
   +!     n a-addr --; increment value at a-addr by n 
          POP PSP +TO [TOS]  POP PSP TO TOS NEXT 
 
    @     a-addr -- x ; fetch value at a-addr 
          [TOS] TO TOS NEXT 
 
    2!    x1 x2 a-addr --; store x2 at a-addr, x1 in next cell 
          POP PSP TO [TOS]   POP PSP TO [TOS CELL+]   POP PSP TO TOS NEXT  
 
    2@    a-addr -- x1 x2 ; fetch the pair at a-addr  
          [TOS CELL+] PUSH PSP   [TOS] TO TOS  NEXT 
 
    C!    char c-addr --; store char at c-addr 
          POP PSP CTO [TOS]   POP PSP TO TOS NEXT  
          CTO stores a single char: if a char is smaller than a cell, TO would also overwrite the neighbouring chars 
 
    C@    c-addr -- char ; fetch char at c-addr 
         [TOS] (char) TO TOS NEXT        
         (char) sits in the least significant bits of TOS  
         All other bits are set to zero 
 
   FILL   c-addr u char -- ; fill memory from c-addr with u chars 
          [PSP CELL+] TO W                      \ W = c-addr
          POP PSP TIMES  TOS CTO [W]  CHAR +TO W  LOOP
          POP PSP TO W   POP PSP TO TOS NEXT    \ drop c-addr, reload TOS
 
   CMOVE  c-addr1 c-addr2 u -- ; move u chars from c-addr1 to c-addr2 
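          POP PSP TO W   TOS TIMES        \ W = c-addr2, [PSP] = c-addr1
            [[PSP]] (char) CTO [W]        \ char at address held in [PSP] to address in W
            1 +TO [PSP]  1 +TO W  LOOP    \ assumes 1 char = 1 address unit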
          POP PSP TO W  POP PSP TO TOS  NEXT 
 
   CMOVE> c-addr1 c-addr2 u -- ; move u chars from c-addr1 to c-addr2 
          POP PSP TO W
          TOS +TO [PSP]  TOS +TO W        \ point just past the end of each block
          TOS TIMES
            -1 +TO [PSP]  -1 +TO W        \ step back first, then copy
            [[PSP]] (char) CTO [W]
          LOOP
          POP PSP TO W  POP PSP TO TOS  NEXT 
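One way to check the stack effects of these primitives is to transcribe them into a small executable model. The Python sketch below is only that: it assumes 32-bit little-endian cells, one-byte chars, and a data stack held in a list that grows downwards below a cached TOS. The class and method names (`Machine`, `store`, `fetch`, `fill` and so on) are hypothetical. Each method follows the corresponding pseudocode step by step.

```python
CELL = 4                      # assumed: 32-bit cells
MASK = (1 << 8 * CELL) - 1    # keep values within one cell


class Machine:
    """Toy model of the TOS-cached fetch/store primitives (names hypothetical)."""

    def __init__(self, mem_size=64, depth=16):
        self.mem = bytearray(mem_size)   # data space; 1 char = 1 byte (assumed)
        self.stack = [0] * depth         # data stack below TOS, grows downwards
        self.psp = depth                 # PSP starts past the end (empty)
        self.tos = 0                     # cached top-of-stack register

    def push_psp(self, x):               # x PUSH PSP
        self.psp -= 1
        self.stack[self.psp] = x

    def pop_psp(self):                   # POP PSP
        x = self.stack[self.psp]
        self.psp += 1
        return x

    def push(self, x):                   # test helper: push a literal
        self.push_psp(self.tos)
        self.tos = x & MASK

    def cell_at(self, addr):             # [addr] as a whole cell, little-endian
        return int.from_bytes(self.mem[addr:addr + CELL], "little")

    def set_cell(self, addr, x):         # x TO [addr]
        self.mem[addr:addr + CELL] = (x & MASK).to_bytes(CELL, "little")

    def store(self):        # !    x a-addr --
        self.set_cell(self.tos, self.pop_psp())
        self.tos = self.pop_psp()

    def plus_store(self):   # +!   n a-addr --
        self.set_cell(self.tos, self.cell_at(self.tos) + self.pop_psp())
        self.tos = self.pop_psp()

    def fetch(self):        # @    a-addr -- x
        self.tos = self.cell_at(self.tos)

    def two_store(self):    # 2!   x1 x2 a-addr --
        self.set_cell(self.tos, self.pop_psp())         # x2 at a-addr
        self.set_cell(self.tos + CELL, self.pop_psp())  # x1 in the next cell
        self.tos = self.pop_psp()

    def two_fetch(self):    # 2@   a-addr -- x1 x2
        self.push_psp(self.cell_at(self.tos + CELL))    # x1 goes below
        self.tos = self.cell_at(self.tos)               # x2 ends up in TOS

    def c_store(self):      # C!   char c-addr --  (touches one byte only)
        self.mem[self.tos] = self.pop_psp() & 0xFF
        self.tos = self.pop_psp()

    def c_fetch(self):      # C@   c-addr -- char  (upper bits are zero)
        self.tos = self.mem[self.tos]

    def fill(self):         # FILL c-addr u char --
        w = self.stack[self.psp + 1]        # W = c-addr
        for _ in range(self.pop_psp()):     # pop u, repeat u times
            self.mem[w] = self.tos & 0xFF   # store char at W
            w += 1                          # advance by one char
        self.pop_psp()                      # drop c-addr
        self.tos = self.pop_psp()           # reload TOS
```

For example, pushing 1234 and 8 and calling `store()` leaves 1234 in the cell at address 8, and the stack as it was before the two pushes.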
 
 
The optional String words CMOVE and CMOVE> work with chars and do not check for overlap. CMOVE copies from the lowest address upwards, CMOVE> from the highest address downwards. Sometimes overlap is what you want: CMOVE from c-addr to c-addr+1, for example, propagates the first char through the whole block. The Core word MOVE works with address units, not chars, and copies correctly even when the two regions overlap. In the common case where char = address unit, it can be defined in high level thus:
 
 
   : MOVE  >R 2DUP SWAP R@ + WITHIN IF 
           R> CMOVE> ELSE R> CMOVE THEN ; 
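The effect of copy direction is easiest to see on a small buffer. This Python sketch is an illustration under assumptions: a `bytearray` stands in for data space with one char per address unit, and the function names `cmove`, `cmove_up`, `within` and `move` are hypothetical. `within` covers only the simple case lo <= hi, not Forth's wrap-around behaviour. `move` makes the same decision as the high-level MOVE above.

```python
def cmove(mem, src, dst, u):
    """CMOVE: copy u chars, lowest address first, no overlap check."""
    for i in range(u):
        mem[dst + i] = mem[src + i]


def cmove_up(mem, src, dst, u):
    """CMOVE>: copy u chars, highest address first, no overlap check."""
    for i in reversed(range(u)):
        mem[dst + i] = mem[src + i]


def within(n, lo, hi):
    """Simplified WITHIN (assumes lo <= hi): true if lo <= n < hi."""
    return lo <= n < hi


def move(mem, src, dst, u):
    """MOVE: copy top-down only when dst starts inside the source block."""
    if within(dst, src, src + u):
        cmove_up(mem, src, dst, u)
    else:
        cmove(mem, src, dst, u)
```

With `dst = src + 1`, `cmove` smears the first char along the block (b"abcd____" becomes b"aaaaa___"), which is the deliberate-overlap case. `move` chooses `cmove_up` there and shifts the block intact (b"aabcd___").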
    

On 16-bit Forths that address more than 64K, it has been customary to provide variants of these words that take 32-bit addresses (two stack items). These are suffixed with L: @L, +!L, 2!L, MOVEL, etc.

Data Space Words - alignment

The Data Space Pointer is a-aligned automatically after CREATE. Because code and data may share the same physical space, the Standard requires ; to a-align data space too. However, it is better to use ALIGN anyway, just in case.

Many systems (e.g. 8-bit systems and 16-bit DOS systems) have no alignment requirements. You may still care to define ALIGN and ALIGNED as no-ops, in case you ever want to port code. Even without portability considerations, CELL+, CELLS and CHARS are useful for readability.

 
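    : ALIGNED \ addr -- a-addr ; round addr up to a cell boundary 
        address-units-per-cell 1- + 
        address-units-per-cell NEGATE AND ;  \ cell size must be a power of 2 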
 
    : ALIGN  HERE ALIGNED DSP ! ; 
   
    : ALLOT \ n -- ; allot n address units of data space 
        DSP +! ; 
  
    : CELL+ \ a-addr -- a-addr'; increment by one cell 
        address-units-per-cell + ; 
    
    : CELLS \ n -- n' ; the width of n cells 
        address-units-per-cell * ; 
     
    : CHAR+ \ c-addr -- c-addr' ; increment by one char 
        address-units-per-char + ;    
 
    : CHARS \ n -- n' ; the width of n chars 
        address-units-per-char * ;    
 
    : COUNT \ c-addr -- c-addr' char ;  
    \ fetch char at c-addr and increment address 
        DUP CHAR+ SWAP C@ ; 
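ALIGNED has to round an address up to the next multiple of the cell size, leaving already-aligned addresses unchanged. A quick Python check of the add-then-mask approach follows. It assumes a power-of-two cell size of 4 address units; `CELL` and the `DataSpace` class are illustrative names, not part of any Forth.

```python
CELL = 4  # assumed address-units-per-cell; must be a power of two


def aligned(addr):
    """ALIGNED: add CELL-1, then clear the low bits (-CELL is ...11100)."""
    return (addr + CELL - 1) & -CELL


class DataSpace:
    def __init__(self):
        self.dsp = 0                      # data space pointer

    def here(self):                       # HERE
        return self.dsp

    def allot(self, n):                   # ALLOT: DSP +!
        self.dsp += n

    def align(self):                      # ALIGN: HERE ALIGNED DSP !
        self.dsp = aligned(self.dsp)
```

After `allot(3)` HERE is 3, and `align()` moves it on to 4. Simply adding `addr MOD cell` would not do this: it would turn 5 into 6 rather than 8.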
 
 

CELL+, CELLS, CHAR+, CHARS and COUNT would normally be primitives.


Stack Words

2DROP 2DUP 2OVER 2SWAP ?DUP DROP DUP OVER ROT SWAP

Stack manipulation is a major factor in Forth code, so these words, or at least the most commonly used ones, need to be fast. As mentioned before, caching the top stack item in a TOS register makes many primitive definitions faster. It is generally not worthwhile trying to cache more than one stack item, but a highly optimised native-code Forth can hold several stack items in registers. In that case the stack manipulation words become complex immediate words that decide at compile time exactly what code to compile. That, however, is beyond my present scope.

In these and all further pseudocode definitions, I am assuming TOS is used. Here are all the Core Standard stack manipulation words. For simplicity, both stack pointer registers are assumed to have indexing capability and both stacks to grow downwards. Neither of these assumptions is vital.

    2DROP (a b --) POP PSP TO W  POP PSP TO TOS NEXT
    2DUP  (a b -- a b a b) [PSP] TO W   TOS PUSH PSP  W PUSH PSP NEXT
    2OVER (a b c d -- a b c d a b)
          TOS PUSH PSP  [PSP 3 CELLS +] PUSH PSP 
          [PSP 3 CELLS +] TO TOS NEXT
    2SWAP (a b c d -- c d a b)
          [PSP CELL +] TO W   TOS TO [PSP CELL +] W TO TOS
          [PSP 2 CELLS +] TO W   [PSP] TO [PSP 2 CELLS +]  W TO [PSP] NEXT
    ?DUP  (x -- 0 | x x ) duplicate if not zero
          TOS 0<> IF TOS PUSH PSP THEN NEXT
    
    DROP  (a -- ) POP PSP TO TOS NEXT 
    
    DUP   (a -- a a) TOS PUSH PSP NEXT

    OVER  (a b -- a b a) [PSP] TO W  TOS PUSH PSP  W TO TOS NEXT

    ROT   (a b c -- b c a)
      [PSP] TO W  TOS TO [PSP]
      [PSP CELL+] TO TOS  W TO [PSP CELL+] NEXT

    SWAP  (a b -- b a) [PSP] TO W  TOS TO [PSP]  W TO TOS NEXT
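Stack juggling with a cached TOS is easy to get subtly wrong, so it can pay to transcribe the definitions into something executable and test them. This Python model is a sketch under assumptions: the memory part of the stack is a list indexed from PSP and growing downwards as in the text, `at(n)` and `put(n, x)` stand for reading and writing [PSP n CELLS +], and the class and method names are hypothetical.

```python
class DataStack:
    """TOS-cached data stack; the memory part grows downwards."""

    def __init__(self, depth=32):
        self.cells = [0] * depth   # memory part of the stack
        self.psp = depth           # PSP: index of the second stack item
        self.tos = 0               # cached top item (garbage when empty)

    def push_psp(self, x):         # x PUSH PSP
        self.psp -= 1
        self.cells[self.psp] = x

    def pop_psp(self):             # POP PSP
        x = self.cells[self.psp]
        self.psp += 1
        return x

    def at(self, n):               # [PSP n CELLS +]
        return self.cells[self.psp + n]

    def put(self, n, x):           # x TO [PSP n CELLS +]
        self.cells[self.psp + n] = x

    def push(self, *xs):           # test helper: push literals in order
        for x in xs:
            self.push_psp(self.tos)
            self.tos = x

    def items(self, n):            # top n items, deepest first
        return [self.at(i) for i in range(n - 2, -1, -1)] + [self.tos]

    def dup(self):                 # TOS PUSH PSP
        self.push_psp(self.tos)

    def drop(self):                # POP PSP TO TOS
        self.tos = self.pop_psp()

    def swap(self):                # [PSP] TO W  TOS TO [PSP]  W TO TOS
        w = self.at(0)
        self.put(0, self.tos)
        self.tos = w

    def over(self):                # [PSP] TO W  TOS PUSH PSP  W TO TOS
        w = self.at(0)
        self.push_psp(self.tos)
        self.tos = w

    def rot(self):                 # a b c -- b c a
        w = self.at(0)
        self.put(0, self.tos)
        self.tos = self.at(1)
        self.put(1, w)

    def qdup(self):                # duplicate only if non-zero
        if self.tos != 0:
            self.push_psp(self.tos)

    def two_drop(self):            # a b --
        self.pop_psp()
        self.tos = self.pop_psp()

    def two_dup(self):             # a b -- a b a b
        w = self.at(0)
        self.push_psp(self.tos)
        self.push_psp(w)

    def two_over(self):            # a b c d -- a b c d a b
        self.push_psp(self.tos)
        self.push_psp(self.at(3))
        self.tos = self.at(3)

    def two_swap(self):            # a b c d -- c d a b
        w = self.at(1)
        self.put(1, self.tos)
        self.tos = w
        w = self.at(2)
        self.put(2, self.at(0))
        self.put(0, w)
```

For instance, after `s.push(1, 2, 3, 4)` and `s.two_swap()`, `s.items(4)` returns `[3, 4, 1, 2]`.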


Some Standard words not in the Core:


   NIP    (a b -- b)  POP PSP TO W  NEXT

   TUCK   (a b -- b a b) POP PSP TO W  TOS PUSH PSP   W PUSH PSP NEXT

   PICK   ( xu ... x1 x0 u -- xu ... x1 x0 xu )  0 PICK = DUP, 1 PICK = OVER
          [PSP TOS CELLS +] TO TOS NEXT 

   ROLL   ( xu xu-1 ... x0 u -- xu-1 ... x0 xu )  1 ROLL = SWAP, 2 ROLL = ROT


ROLL is to PICK as SWAP is to OVER. It's an ugly beast both to code and to use. Leave it out unless you have an application that needs to treat the stack like an array.
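PICK is a single indexed fetch, but ROLL has to slide every item above xu one cell deeper, which is where the ugliness comes from. This Python sketch shows both on the same kind of TOS-cached, downward-growing stack model. It assumes the Standard numbering, where x0 is the item just below u. `DataStack`, `pick` and `roll` are hypothetical names.

```python
class DataStack:
    def __init__(self, depth=32):
        self.cells = [0] * depth   # memory part, grows downwards
        self.psp = depth
        self.tos = 0               # cached top item

    def push(self, *xs):           # test helper: push literals in order
        for x in xs:
            self.psp -= 1
            self.cells[self.psp] = self.tos
            self.tos = x

    def items(self, n):            # top n items, deepest first
        return [self.cells[self.psp + i] for i in range(n - 2, -1, -1)] + [self.tos]

    def pick(self):                # u PICK: [PSP TOS CELLS +] TO TOS
        self.tos = self.cells[self.psp + self.tos]

    def roll(self):                # u ROLL: xu xu-1 ... x0 u -- xu-1 ... x0 xu
        u = self.tos
        xu = self.cells[self.psp + u]
        # slide x(u-1) ... x0 one cell deeper, overwriting xu (deepest first)
        for i in range(u - 1, -1, -1):
            self.cells[self.psp + i + 1] = self.cells[self.psp + i]
        self.psp += 1              # u has gone, so the stack shrinks by one
        self.tos = xu
```

The loop in `roll` runs u times, so its cost grows with the depth being rolled, whereas `pick` is constant-time.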

Useful non-Standard words:


   THIRD  (a b c -- a b c a) TOS PUSH PSP [PSP 2 CELLS +] TO TOS NEXT

   UNDER+  (a b c -- a+c b) TOS +TO [PSP CELL +] POP PSP TO TOS NEXT

Return Stack Words

>R R> R@

 
    >R    ( x -- ) ( R: -- x )     TOS PUSH RSP  POP PSP TO TOS NEXT 
 
    R>    ( -- x ) ( R: x -- )     TOS PUSH PSP  POP RSP TO TOS NEXT 
 
    R@    ( -- x ) ( R: x -- x )   TOS PUSH PSP  [RSP] TO TOS NEXT 
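With TOS cached, >R and R> have to move the item between the TOS register and the other stack, not just between the two stacks in memory. This Python sketch models that with two downward-growing lists. The class, the method names and the depth of 16 are assumptions made for illustration.

```python
class Machine:
    def __init__(self, depth=16):
        self.cells, self.psp = [0] * depth, depth    # data stack below TOS
        self.rcells, self.rsp = [0] * depth, depth   # return stack ("rack")
        self.tos = 0                                 # cached top of data stack

    def push_psp(self, x):
        self.psp -= 1
        self.cells[self.psp] = x

    def pop_psp(self):
        self.psp += 1
        return self.cells[self.psp - 1]

    def push_rsp(self, x):
        self.rsp -= 1
        self.rcells[self.rsp] = x

    def pop_rsp(self):
        self.rsp += 1
        return self.rcells[self.rsp - 1]

    def push(self, *xs):           # test helper: push literals in order
        for x in xs:
            self.push_psp(self.tos)
            self.tos = x

    def to_r(self):                # >R  TOS PUSH RSP  POP PSP TO TOS
        self.push_rsp(self.tos)
        self.tos = self.pop_psp()

    def r_from(self):              # R>  TOS PUSH PSP  POP RSP TO TOS
        self.push_psp(self.tos)
        self.tos = self.pop_rsp()

    def r_fetch(self):             # R@  TOS PUSH PSP  [RSP] TO TOS
        self.push_psp(self.tos)
        self.tos = self.rcells[self.rsp]
```

Pushing 1 and 2, then calling `to_r()`, leaves 1 in TOS and 2 on the rack. A following `r_from()` puts the 2 back in TOS and leaves the rack empty again.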
  

A Note on the Return Stack

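Manipulating the RSP can produce some strange and sometimes useful results. For example, an R> DROP that is not balanced by a preceding >R will cause the current word, when it returns, to skip the rest of the word that called it. That sort of behaviour is not portable: there is no Standard way of manipulating return addresses. It follows that R> etc. need not reference RSP at all; in fact, you may consider it safer if they don't. To avoid confusion, I shall refer to the stack used by >R, R> and friends as the "rack". These are the rules for portable use of the rack, as laid down by the Standard:

1. Don't take anything off the rack that you didn't put there yourself with >R or 2>R.
2. Remove everything you put on the rack before the definition ends or executes EXIT.
3. Inside a DO loop, don't access items that were put on the rack before the loop was entered.
4. Inside a DO loop, remove everything you put on the rack before executing I, J, LOOP, +LOOP, UNLOOP or LEAVE.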

Your own private code may of course play fast and loose with these and any other portability rules.

Some Standard words not in the Core:

   2>R    (= SWAP >R >R)
          POP PSP PUSH RSP  TOS PUSH RSP  POP PSP TO TOS NEXT
   2R>    (= R> R> SWAP) 
          TOS PUSH PSP  POP RSP TO W  POP RSP PUSH PSP  W TO TOS NEXT  
 
   2R@    (= 2R> 2DUP 2>R)
          TOS PUSH PSP  [RSP] TO W [RSP CELL +] PUSH PSP W TO TOS NEXT 
 

Useful non-Standard words:

 
   DUP>R   TOS PUSH RSP NEXT 
 
   R>DROP  POP RSP TO W NEXT 
   RPICK   ( i -- x ; i'th item from return stack, 0 RPICK = R@ ) 
           [RSP TOS CELLS +] TO TOS NEXT
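The double-cell return stack words are defined so that x2 ends up on top of the rack, exactly as SWAP >R >R would leave it. This Python sketch checks that ordering and shows RPICK indexing the rack in whole cells, counting from 0. The two-stack model, the method names and the zero-based RPICK are assumptions made for illustration.

```python
class Machine:
    def __init__(self, depth=16):
        self.cells, self.psp = [0] * depth, depth    # data stack below TOS
        self.rcells, self.rsp = [0] * depth, depth   # return stack ("rack")
        self.tos = 0                                 # cached top of data stack

    def push_psp(self, x):
        self.psp -= 1
        self.cells[self.psp] = x

    def pop_psp(self):
        self.psp += 1
        return self.cells[self.psp - 1]

    def push_rsp(self, x):
        self.rsp -= 1
        self.rcells[self.rsp] = x

    def pop_rsp(self):
        self.rsp += 1
        return self.rcells[self.rsp - 1]

    def push(self, *xs):           # test helper: push literals in order
        for x in xs:
            self.push_psp(self.tos)
            self.tos = x

    def two_to_r(self):            # 2>R  x1 x2 --   R: -- x1 x2
        self.push_rsp(self.pop_psp())    # x1 first ...
        self.push_rsp(self.tos)          # ... so x2 ends up on top
        self.tos = self.pop_psp()

    def two_r_from(self):          # 2R>  -- x1 x2   R: x1 x2 --
        self.push_psp(self.tos)
        w = self.pop_rsp()               # x2
        self.push_psp(self.pop_rsp())    # x1
        self.tos = w

    def two_r_fetch(self):         # 2R@  -- x1 x2   R: x1 x2 -- x1 x2
        self.push_psp(self.tos)
        w = self.rcells[self.rsp]                 # x2
        self.push_psp(self.rcells[self.rsp + 1])  # x1
        self.tos = w

    def rpick(self):               # RPICK  i -- x  (assumed: 0 RPICK = R@)
        self.tos = self.rcells[self.rsp + self.tos]
```

After `push(1, 2)` and `two_to_r()`, the rack holds 2 on top of 1. `two_r_fetch()` copies them back as 1 2, and `two_r_from()` removes them, leaving the rack empty.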