What is Forth
Introduction The engineer's point of view can be a handy way to understand things. If somebody sees for the first time in a life a tennis racket, a tea filter or a stethoscope, she would probably have an hard time in guessing what these objects are for. But if the same person is told in advance about their function, the design turns out to be fairly simple. Observed from the vantage point of the engineer, the Forth language looks naturally and efficiently designed to implement just a single basic idea. We will proceed explaining this idea, then moving through its immediate and most important consequences: the stack and the inverse notation. At last, we give a small coding style example. The idea Suppose you want to create a programming language. One of your first choices is how to organize data. Any program works on data in input, behaves depending on data describing its internal status, and provides other data as output. Most programming languages are variable-oriented, that is, symbolic names like A, B, C, etc. are used for storing and retrieving data in each of these phases. To add two numbers in a variable-oriented language, we store the first one into a variable A, the second into B, then apply the operator "+" to A and B. At the end of the operation, we expect to have A and B unchanged, plus the sum output, usually assigned to a third variable C. Symbolic names are just a possible way (the most popular, in fact) to identify memory registers that contain the numbers we are interested in. An alternative way, the way of Forth, is to base your data representation on a completely different, but equally valid principle: a buffer. Data are collected into a common memory space, used as input, temporary space for operations, and output. Variables, in principle, are not needed at all. The addition between two numbers is achieved by putting them onto the buffer, then applying the "+" operator to the buffer, finding the result, again, on the buffer. The idea in itself can lead to effective and clean programming in some scenarios, but can be very messy in others. We will not discuss here if variables are better than buffers or vice-versa. The only point we want to make is the following: if we decide to go for a buffer-oriented language, most of its basic characteristics will follow quite straightforward from this initial choice. The stack Moving to a buffercentric world, we have to decide what kind of buffer we want. The buffer contains all our data, and we would like to access them efficiently, that is, with a minimum handling effort. Let's try to see how we typically access data in real life. When we calculate a function (which is, ultimately, a description of what a computer program does), we usually don't prepare a lot of data in anticipation, for accessing them all and get the result in a single big leap. Instead, what we do is quite the opposite: we start from a few inputs and go through a sequence of small elaboration steps, each of them using temporary results obtained from the previous one. Branches are possible, but they are somehow the exception: most of what we do is a top-down chain of computations, progressively transforming our input into the desired output. Also note that usually all of the intermediate results will be thrown away in the end because useless. |

To rephrase this concept, we can say that in most algorithms, functions to be computed are expressed as composition of simpler functions:

F(x)=G(H(x))

That is, we apply H to transform x into y, then apply G to transform y into z, throwing away y. The kind of buffer that better supports this mechanism is a LIFO (Last-In, First-Out), or stack. In any elaboration step, it makes available on top the last inserted number, ready to be used and automatically destroyed after use, so that the process will terminate with the result on top, in place of the initial input. This remains true if we have other data present on the stack before x, maybe from previous calculations. The key point is that since the latest data are the first made available, to use them we don't need to remember where they are, by setting up whatever position indexing.

If we used a FIFO (First-In, First-Out) or queue, the situation would look quite different. Our intermediate result y would be accessible to G only when all the other numbers that might be already present in the buffer, have been consumed. Which means that if we need y just after we calculate it, we have to know how long is the queue in order to locate and retrieve it.

Obviously, if we have only one number stored, there's no difference between FIFO and LIFO, but in real life this is not a typical situation.

The inverse notation

For telling a program that a function F shall be applied on a given input and the result stored, the familiar notation B=F(A) comes very handy, and is in fact adopted by most programming languages. But it requires variables, therefore it can't fit a stack-oriented language.

It is worth noticing that defining variables in Forth is actually possible, but it is against the whole concept of using a stack, therefore we will not consider this option.

At first glance, we could think the stack itself like a single big variable, an array S, and try to use the usual notation. If we assign increasing indexes to the elements of S from top to bottom (S1, ... , Sn), the application of a function F will produce a new value S0 on top, depending in general on each of the other stack elements:

S0=F(S1, ... , Sn)

But then, what's the point of having a stack? This notation is heavy, redundant (why writing the symbol S? To distinguish it from what?), not to mention the problem of destroying the elements actually used, and shifting the indexes of the remaining ones.

Searching for a simpler convention, let's proceed step by step by following the stack storing and retrieving sequence: this will suggest the notation itself. The process to calculate F on a given input a (the number a, not a variable symbol A) is:

The numeric result, say b, is then left on top of the stack. The parenthesis have been used to highlight that we don't need to refer the stack every time: in a stack-oriented language, all dataflow is by default from and to the stack. Summarizing, all we need to communicate to the interpreter in order to calculate F(a) is

a

F

or, on a single line

a F

A Forth line is in fact a dynamic description of a stack dataflow. Here it comes the needing for inverse notation.

The inverse notation applies easily to function composition. The expression

G(H(x))

turns out tox H G

which does not mean to reverse everything, but just to write operators in the order they are applied. For instance, the operation 5-(6/2) will be expressed by

which means: store 5, 6 and 2, then apply the operator "/", (which will act on the top values, leaving on the stack 5 and 3), and finally apply the operator "-", (which will act on 5 and 3, leaving on the stack the result 2 alone).

Note that, besides a natural reflection of the stack dataflow, the inverse notation also solves in an elegant way the problem of operator priorities, elsewhere solved by additional symbols like parenthesis, or by specific conventions.

A worked example

A major obstacle to effective Forth programming is to use it as a variable-oriented language, that is, to think the stack only as a temporary space to store and retrieve variables. It works, but that’s not the way this language should be used, and here we can see why.

Let's consider the function to compute a given term of the Fibonacci sequence. It takes an index as input and returns the corresponding term of the sequence as output.

This is how it might be coded by a variable-oriented programmer:

VARIABLE N \ Some VARIABLE A \ variables VARIABLE B \ shall be VARIABLE C \ declared : Fibonacci ( index -- value ) N ! \ Move index from stack to N 1 A ! \ Assign initial values 1 B ! \ to A and B N @ 0 DO \ Repeat N times A @ B @ + C ! \ C=A+B B @ A ! \ Copy B into A C @ B ! \ Copy C into B LOOP

C @ ; \ Stack the output

and here is the same function, coded without variables:

: Fibonacci ( index -- value ) 1 1 ROT \ Stack the initial values 0 DO \ Repeat index times TUCK + \ Copy from low stack and add LOOP NIP ; \ Discard low stack

To state the obvious, the second implementation is much more compact, readable, fast, and memory cheap. The main point of the example is just this: use the stack. Switching from variables to stack requires only the little effort of familiarizing with a few manipulators (like ROT, TUCK and NIP) which are the most precious worktools in Forth.