    Forth: The programming language that writes itself

    Charles H. Moore and the pursuit of simplicity.

    Author: Dave Gauer
    Created: 2023-02-02
    Updated: 2024-12-22

    Note: This page is my personal journey to discover Forth
    and put it in the context of computing history.
    It is adapted from my
    slides for a short talk.
    I’ve done everything in my power to make this page scale up and down
    for various screen sizes. I welcome suggestions and corrections for
    both the content and display of this page.
    Here’s my
    contact page.

    The Legend

    When I was a wee programmer, I would sit around the virtual Usenet campfires listening
    to the tall tales and legends of the elders.

    In the 1990s, Usenet
    newsgroups
    (wikipedia.org)
    were where it was at.
    For example, Linus Torvalds’s initial announcement of Linux was to
    comp.os.minix in 1991.

    The comp.*
    (wikipedia.org)
    groups and particularly comp.lang.* were great
    places to learn about and discuss programming.
    By the time I got there in the late 90s, Perl was a pretty hot topic,
    especially as it took a dominant role in the early Web as the
    dynamic page and form processing programming language via
    CGI
    (wikipedia.org).

    There were programming resources on the Web, but nothing like what’s
    available now!
    To actually learn to program, I bought books,
    and still do.

    Usenet was where the community and folklore lived.

    (The “Easter egg” in this drawing is alt.religion.kibology, which should
    get a chuckle from old timers. The rest of you can look it up.)

    I learned about magical languages with lots of (((((parentheses))))).

    Sharp-eyed Lisp-lovers and other mutants will perhaps recognize this thing
    as the Y combinator expressed with lambdas.

    The only time I understood this was when I completed
    the book The Little Schemer by Friedman and Felleisen, which
    walks you through creating it for yourself. It is a magical book and
    I implore you to try it.

    I listened, wide-eyed, to true tech tales like The Story of Mel (foldoc.org).

    Mel was real and the Royal McBee RPC-4000 was real. Look at that teletype
    (aka “teleprinter”). If typewriters and “Royal” together make a little bell
    in your head go “bing” as your mental carriage hits the end of the page,
    then you’re right: Royal McBee was a merger between the
    Royal
    Typewriter Company
    (wikipedia.org) and McBee, a manufacturer of accounting machines.

    For a while, Royal was owned by the Italian typewriter company, Olivetti,
    who also made some really interesting computers (wikipedia.org).

    And then…

    I heard tell of a programming language so flexible that you could
    change the values of integers.

    They said that language was called Forth and it was created
    by a mad wizard called Chuck Moore who could write any program in
    a couple screens of code.

    Years went by and I wrote a lot of PHP and JavaScript.
    I watched the Web evolve (and sometimes de-evolve).

    But I never forgot about the legend of Forth.

    The blog series
    “programming in the twenty-first century”
    (prog21.dadgum.com)
    by game developer James Hague gave me the final push.

    He made Forth a recurring theme and it just sounded so darned interesting.

    So I went on an adventure and now that I have returned, I think I have some
    answers.

    (Oh, and I confirmed the legend. I can make any integer
    equal anything I want. Stick around ’til the end to see that Forth magic
    trick.)


    “Voilà!”

    Forth uses postfix (RPN) notation

    At first, I thought this was what Forth was all about:

    3 4 +
    7
        

    Now begins my quest to understand Forth.

    Perhaps you’ve seen postfix or
    Reverse Polish Notation (RPN)
    (wikipedia.org)
    before? The principle is simple: Instead of the usual “infix” notation
    which puts operators between operands (3 + 4), RPN puts
    operators after the operands (3 4 +).

    RPN is one of the most visually obvious
    aspects of the Forth programming language. But it turns out, RPN is
    not what Forth is about or the reason Forth exists.
    As we’ll see, the situation is reversed.

    In fact, as you’ll see, my quest is mostly a series of incorrect
    assumptions I made by looking at the language without the context
    of history.

    By the way, the HP-35 calculator (wikipedia.org) pictured here is really interesting.
    In the early 1970s, HP had powerful desktop calculators.
    Actually,
    what they had were really programmable computers, but they still
    called them calculators (wikipedia.org) for sales reasons.
    But these were big desktop machines that ran off of wall current.

    Putting all of that power into a “shirt pocket” calculator was
    an astounding accomplishment at the time.
    Legend has it that the
    size of the HP-35 was based on the dimensions of Bill Hewlett’s
    actual shirt pocket.
    HP-35 calculators have been in space. They killed off the slide rule.

    HP calculators are famous for using RPN syntax. If it weren’t for
    these calculators, I suspect it’s likely that RPN syntax would be
    virtually unknown outside of computer science.

    RPN is considered to be highly efficient and,
    being somewhat inscrutable to outsiders, highly geeky.

    Let’s see a better example…

    Noob:

    $ bc
    bc 1.07.1
    Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006,
    2008, 2012-2017 Free Software Foundation, Inc.
    This is free software with ABSOLUTELY NO WARRANTY.
    For details type `warranty'.
    (3 * 4) + (5 * 6)
    42
        

    Pro:

    $ dc
    3 4 * 5 6 * + p
    42
        

    I’m being cheeky here. Users of bc are hardly
    noobs. But it is arguably even geekier to use the much
    older dc program. bc was once just an
    infix expression translator for dc to make it more
    palatable for people who didn’t want to use RPN. Thus the gentle
    teasing.

    Besides using RPN syntax,
    the dc calculator
    (wikipedia.org) is completely programmable. Oh, and it also happens to
    be one of the very first Unix programs and pre-dates the C programming
    language!

    Anyway, the point here is that RPN syntax lets you express
    nested expressions without requiring parentheses to get the order of
    operations the way you want them. This is one of the reasons RPN fans
    (including those HP calculator fans I alluded to) are so enamoured with it.

    In this example, we input 3, then 4. * multiplies them.
    Now we have the result (12) available. Next, we input 5 and 6 and
    multiply them with another * to also store that result (30).
    The final + adds both stored results (12 + 30) and
    stores that result (42).
    Unlike an HP calculator, dc doesn’t show us any of the
    stored results, not even the last one, until we “print” it with the
    p command.

    As is famously said of
    “ed, the standard text editor”
    (gnu.org), dc doesn’t
    waste your VALUABLE time (or teletype paper) with output you don’t need!

    So this relates to Forth how?

    Forth pro:

    3 4 * 5 6 * + .
    42
        

    As you can see, someone sitting at a Forth interpreter
    can perform this calculation exactly the same as with the dc
    calculator (or an HP calculator).

    Sharp-eyed readers will note that we print the result with a “.”
    command rather than “p”. But that’s the only difference.

    So Forth is like an RPN calculator? We input values and then
    operate on them?
    Well, that statement is not wrong.

    But does that mean we know what Forth is all about now?
    If we know how to enter things in postfix notation, we “get” Forth?
    No! Not even close…

    Forth absolutely uses postfix notation.

    But then I learned some more:

    Forth is stack-based

    The use of a data stack is probably the second most visible thing
    about the Forth programming language.

    A stack is a data structure often explained with a “stack of
    plates” analogy. You PUSH a plate on the stack and you POP
    a plate off the stack. The first item you put on the stack is
    the last item out of the stack.

    Above, we have an illustration of PUSH and two other common
    stack operations:

    • SWAP slides a plate out (very carefully) from the second
      position and puts it on top.
    • DUP takes the top plate and duplicates it using
      kitchen magic and puts the replica on the top of the stack (in
      this metaphor, I guess an equal amount of matter is removed
      somewhere else in the Universe, but we try not to worry too
      much about that).

    As you may have guessed, these stack operations also show up as
    Forth words: SWAP and DUP are exactly that, pushing happens whenever
    you enter a value, and POP's closest Forth counterpart is DROP.
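
    If you have a Forth handy (Gforth, for example), you can watch
    these words at work. The standard word .S prints the stack
    contents without disturbing them, which makes it handy for this
    kind of poking around:

        1 2 .S    \ <2> 1 2      ( 2 is on top )
        SWAP .S   \ <2> 2 1      ( top two items exchanged )
        DUP .S    \ <3> 2 1 1    ( top item duplicated )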

    Historical note 1: In the old days, people and computers just
    WENT ABOUT SHOUTING AT EACH OTHER ALL THE TIME IN ALL CAPS BECAUSE
    LOWERCASE LETTERS WERE TOO EXPENSIVE.

    Historical note 2: When a computer asks, “SHALL WE PLAY A
    GAME?” in all caps, you must answer NO, as we learned in 1983’s
    WarGames (wikipedia.org).

    Let’s see a stack in action:

    Op   The Stack
    --   ---------
    3     3
    4     3  4
    *     12
    5     12 5
    6     12 5  6
    *     12 30
    +     42
    .
        

    Let’s revisit our math problem from earlier. This is the
    Forth code on the left and the results on “the stack” on the right.

    Rather than being concerned with the syntax or notation, we’re
    now interested in what these operations are doing with our data
    stack.

    As you can see, entering a number puts it on the stack.
    The math operators take two values from the stack, do something
    with them, and put a new value back on the stack.

    The ‘.’ (DOT) operator is different since it only takes one
    value (to print it) and does not put anything back on the stack.
    As far as the stack is concerned, it is equivalent to DROP.
    As far as humans are concerned, it has the useful side-effect
    of letting us see the number.

    Now let’s see something you probably wouldn’t find
    on an HP calculator. Something non-numerical…

    This is valid Forth, assuming CAKE, HAVE, and EAT have been defined:

    CAKE DUP HAVE EAT 
        

    Getting the joke here will require knowing
    this English idiom (wikipedia.org).

    Actually, this isn’t just a silly example.
    Forth’s use of the stack can lead to a natural, if somewhat
    backward use of nouns and verbs. (Kind of like Yoda’s speech habits.
    “Cake you will dup, yes? Have it and eat it you will, hmmm?”)

    There can, indeed, be some object named CAKE that we have
    placed on the stack (probably a memory reference) which
    can be DUPed, and then HAVEd and EATen.

    It’s up to the Forth developer to make harmonious
    word choices. It can get far more clever or poetic than my example.
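
    Just to prove there’s no trick, here are some toy definitions
    (mine, invented for this page, not from any real program) that
    make the line above actually run in a standard Forth:

        : CAKE ( -- n ) 1 ;                    \ pretend the number 1 is a cake
        : HAVE ( n -- ) ." having cake #" . ;
        : EAT  ( n -- ) ." eating cake #" . ;
        CAKE DUP HAVE EAT    \ having cake #1 eating cake #1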

    Naming things is great.

    But sometimes not naming things is even better.

    The stack frees us from being forced to create explicit names for
    intermediate values.

    If I ask you to add these numbers:

    2 6 1 3 7
        

    Do you feel a need to give a name to each sum pair…or even the running total?

    (Hopefully your answer is “no” or the rhetorical question doesn’t work.)

    But it’s funny how our programming languages often require us
    to explicitly name intermediate results so that we can refer to them.
    On paper, we would never give these values names – we would just happily
    start working on the list.

    Imagine, if you will, a factory assembly line in which
    each person working the line is a hateful fussbudget who refuses to
    work on the part in front of them until you name it. And each time the
    part has been worked on it must be given a new name. Furthermore, they
    refuse to let you re-use a name you’ve already used.

    A lot of imperative languages are like that factory. As your
    values go down the line, you’ve got to come up with nonsense names
    like result2, or matched_part3.

    Does your programming language make you do this?

    (It’s almost as bad as file names used as a versioning
    system: my_doc_new_v5.4(copy)-final2…)

    Working without names (also known as implicit or
    tacit or point-free programming) is sometimes a more
    natural and less irritating way to compute. Getting rid of names can
    also lead to much more concise code. And less code is good code.
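
    Forth delivers exactly this: it lets us add that little list of
    numbers from a moment ago with no names in sight, just like we
    would on paper:

        2 6 1 3 7 + + + + .    \ prints 19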

    Great, so stacks can be a very elegant way to handle expressions.

    Have we “cracked” Forth yet? Now we know two things:
    it uses RPN syntax and it is stack-based.

    Well, Forth certainly does use a stack. It is definitely a stack-based
    language.

    But then I learned some more…

    Concatenative programming

    Ah, this must be it because it sounds fancy.

    On this journey of Forth discovery, you’ll inevitably run into
    the term “concatenative programming”.

    What’s that?

    An awesome resource for all things concatenative is
    The Concatenative Language Wiki
    (concatenative.org).
    It lists many concatenative languages and has a page about Forth,
    of course.

    For the term “concatenative programming” itself, the Factor
    programming language website has an excellent page defining the
    term:
    Factor documentation: Concatenative Languages
    (factorcode.org).
    And, of course, there’s the Wikipedia entry,
    Concatenative programming language
    (wikipedia.org).

    I understand the explanations on these websites now, but
    it took me a while to get there. Your journey may be shorter or longer.
    Probably shorter.

    Let’s see if I can stumble through it…

    Contrast with applicative language:

    eat(bake(prove(mix(ingredients))))
        

    Concatenative language:

    ingredients mix prove bake eat
        

    An applicative language has you apply a function to a value, which
    returns another value. Using familiar Algol-like (or “C-like”, or
    “Java-like”, or “JavaScript-like”) syntax, arguments are passed to
    functions within a pair of parentheses. In the above example, the
    parentheses end up deeply nested as we pass the output of one function
    to another.

    Unlike the math examples, where the infix notation looks more
    natural to most of us than the postfix notation, the concatenative
    example of this baking program looks more natural (at least in a
    human language sense) than the inside-out function application
    example, right?

    (Of course, if you’re a programmer used to years of something like C
    or Java or JavaScript, the inside-out parenthetical form will probably
    seem pretty natural too. Well, guess what? Your mind has been
    warped. It’s okay, mine has too.)

    The point here is that concatenative style has us “composing”
    functions (which you can think of as verbs) simply by putting them
    in sequence. Each function will be called in that sequence.
    The values that are produced at each step are passed along
    to be consumed as needed.

    No names (unless we want them), just nouns and verbs.
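
    In Forth, this composition-by-sequencing is so literal that naming
    a composition is just a colon definition containing the sequence.
    A tiny example of my own (using Forth’s definition syntax, which
    we’ll meet properly later):

        : double    ( n -- n*2 ) 2 * ;
        : increment ( n -- n+1 ) 1 + ;
        : double+1  ( n -- m )   double increment ;   \ compose by concatenating
        5 double+1 .    \ prints 11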

    But that’s just the surface. It turns out this “concatenative language”
    concept goes way past that…

    The canonical example of a concatenative language is Joy.

    Joy

    Manfred von Thun was inspired by Backus’s 1977 ACM Turing Award lecture:

    Can Programming Be Liberated from the von Neumann Style? (PDF) (worrydream.com)
    This paper is dense with notation and I haven’t personally
    attempted to wade through it, yet. I’m sure it contains
    many profound ideas.

    I know just enough to believe I understand this paragraph from
    the paper’s abstract:

    “An alternative functional style of programming is
    founded on the use of combining forms for creating
    programs. Functional programs deal with structured
    data, are often nonrepetitive and nonrecursive, are
    hierarchically constructed, do not name their
    arguments, and do not require the complex machinery of
    procedure declarations to become generally applicable.
    Combining forms can use high level programs to build
    still higher level ones in a style not possible in
    conventional languages.”

    Perhaps you’ve heard of “functional programming?” As you can
    see, that term was being used in 1977.

    “Concatenative programming” came after. In fact,
    Joy is where the “concatenative” description comes from!
    (von Thun specifically credits Billy Tanksley for creating the term
    “concatenative notation”.)

    Joy is kind of like starting with a Lisp

    …without variables

    …and without traditional control structures

    …and all functions are unary (have an “arity” of 1).

    Specifically, all functions take one stack as input and return
    one stack as output. The stack is not named, it is implied.

    A program is simply a list of functions that is read
    from left to right.

    I can’t describe Joy’s genesis better than the man himself.
    Here is von Thun in an interview about Joy:

    “Joy then evolved from this in an entirely haphazard way:
    First I restricted the binary relations to unary functions, and
    this of course was a dramatic change. Second, to allow the usual
    arithmetic operations with their two arguments, I needed a place
    from which the arguments were to come and where the result was to
    be put – and the obvious place was a stack with a few shuffling
    combinators, originally the four inspired by Quine. Third, it
    became obvious that all these combinators could be replaced by
    unary functions, with only function composition remaining. Finally
    the very different distinctively Joy combinators emerged, which
    take one or more quoted programs from the stack and execute them in
    a specific way. Along the way of course, lists had already been
    seen as just special cases of quoted programs. This meant that
    programs could be constructed using list operations and then passed
    on to a Joy combinator.”

    From A Conversation with Manfred von Thun (nsl.com), which is a really great read in its entirety.

    As you can see, combinators are crucial in Joy.
    Let’s take a moment to dive into those, because this is a pretty
    fascinating avenue of computer science…

    Combinators

    Combinators are any “higher-order” functions like map.

    “Higher-order” just means functions that take other
    functions as input and do things with them.

    You can even have functions that take functions that take functions
    and so on to do powerful things. But you’ll need to meditate on
    them every time you have to re-read that part of your code.

    map is one of the more common ones, so I’ll use it
    as an example.

    JavaScript:

    inc = function(n){ return n + 1; };
    
    bigger = [1, 2, 3, 4].map(inc);
    
    Result: [2,3,4,5]
        

    JavaScript using an “arrow function”:

    bigger = [1, 2, 3, 4].map(n => n + 1);
    
    Result: [2,3,4,5]
        

    (The second example with the arrow function syntax works exactly
    the same way, but more compactly. I included it to make the comparison
    with Joy a little more even-handed. Feel free to pick a favorite
    and ignore the other one.)

    In the first example, we have familiar Algol-like
    syntax with functions that take arguments in parentheses.

    Perhaps
    map() is familiar to you. But if not, just know that
    it takes two parameters like so: map(array, function).
    The first parameter is implicit in these JavaScript examples, but it’s
    there. The array object, [1, 2, 3, 4] calls its own
    map() method. The second parameter is a function
    (named inc in the first example and left anonymous in
    the second), which will be applied to every member of the list.

    The output of map() is a new list containing the
    result of each application.

    Notice how both JavaScript examples
    have variables such as the parameter n and the result
    bigger. This is an example of what I mentioned a moment
    ago when discussing the advantages of stacks: “Traditional”
    programming languages often make us name values before we can work with
    them.

    The same thing, but concatenatively in Joy:

    [1 2 3 4] [1 +] map
    
    Result: [2 3 4 5]
        

    The syntax here may require a little explanation.
    The square brackets ([]) are Joy’s
    quote mechanism. Quotations are a lot like lists, but they can contain
    programs as well as data.

    In this case, the first quotation is the number list,
    [1 2 3 4].

    The second quotation is a program, [1 +].

    As in the JavaScript examples, map takes two parameters.
    The first is the function (or “program” in Joy) to apply, and the second
    is the list to apply it to.

    (It’s kind of confusing to talk about “first” and “second,” though
    because that’s the opposite order in which we supply those
    arguments on the stack…)

    Note the lack of variables bigger or n.
    Intermediate values just exist.

    It looks pretty nice and neat, right?

    This “point-free” style can be a blessing…
    or a curse. Unlike computers, human brains have a hard time juggling too
    many things on the stack.

    There seems to be a happy medium between named and unnamed. Also,
    the point-free style seems to benefit greatly from short (even
    very short) definitions, which reduce mental juggling and improve
    composability.

    If you have the slightest interest in Joy, I highly recommend
    reading or skimming this delightful tutorial by Manfred von Thun
    himself:
    An informal tutorial on Joy
    (hypercubed.github.io).

    Note: I had a bit of a hard time actually running Joy to test out these
    examples. Thankfully, I eventually ran into
    Joypy (github.com),
    a Joy written in Python. My Linux distro comes with Python installed,
    so the whole process for me was:

    git clone https://github.com/calroc/joypy.git
    cd joypy
    python -m joy
    ...
    joy? [1 2 3] [1 +] map
            

    Okay, that’s a glimpse.

    But we’ve barely touched the conceptual power of combinators with our
    map examples. Let’s go a little deeper on
    this fascinating subject:

    Here’s something from my bookshelf. It’s To Mock a Mockingbird
    by mathematician and
    puzzle-maker Raymond Smullyan. It uses puzzles involving birds to solve
    logic problems and classify some well-known combinators.

    It would be impossible to write a complete catalog of
    combinators just as it would be impossible to write a complete
    catalog of integers. They’re both infinite lists.
    Nevertheless, some well-known combinators have been identified as
    having special properties. In the book above, many of these have
    been given the names of birds.

    Remember, combinators are just “higher-order”
    functions that take functions as input.
    Well, it turns out these are
    all you need to perform any computation. They can replace logical
    operators and even variables.

    What?!

    Yeah, you can re-work any expression into a combinatorial expression
    and completely replace everything, including the variables, with
    combinators.

    It’s kind of hard to imagine at first. But you can see it happen
    right before your very eyes.
    The mind-blowing tool on this page by Ben Lynn:
    Combinatory Logic
    (stanford.edu)
    takes a term expressed in lambda calculus and replaces everything
    with just two combinators, K and S.
    (We’ll talk more about those two in just a moment because they
    are super special.)

    (Ben Lynn’s whole website is full of neat stuff like this.
    If you’re looking to entertain yourself for any amount of time from
    an afternoon to the rest of your life, Lynn has you covered.)

    So combinators share something in common with lambda calculus and
    Turing machines. These systems provide all of the building blocks
    you need to perform any
    possible computation in the sense of the
    Church-Turing thesis (wikipedia.org)
    or “computability thesis”. (We’ve also discovered some problems,
    like “the halting problem,” that no system can compute,
    but these are pretty rare.)

    It turns out that computation is a fundamental feature of
    the Universe.
    As far as we can tell, any universal system of computation is equally
    capable of solving any computational problem. And once you realize how
    little is required, you can invent a universal computer yourself!

    Electronically speaking, this is the same principle
    that allows a NAND gate to simulate all other gates. NAND gates are
    a fundamental computational building block. You can make an
    entire computer with nothing but NAND gates and that computer can
    (slowly) solve any computable problem you can imagine.

    Anyway, when we use combinators, this particular flavor of universal
    computation is called
    combinatory logic (wikipedia.org).

    What do the building blocks of combinatory logic look like?

    Let’s start small:

    Identity

    (I x) = x
        

    The simplest of all combinators is I, the “identity combinator”.
    There are a ton of different ways to write this. In lambda calculus,
    it looks like this: I = λx.x

    The way to read "(I x) = x" is: “I applied
    to some object x results in…x.”

    We say “object x” rather than “value x” because, being a
    combinator, I could take a function as input as well as a
    value. In fact, “object” is intentionally very abstract, so
    x could contain a scalar value, or
    list, or function, or another combinator, or anything.
    Whatever that object is, I returns it.
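
    (Incidentally, in stack terms the identity combinator is just a
    word that leaves the stack alone. In Forth you could write it as
    an empty definition. I’ll call it ID rather than I, since Forth
    already uses I for loop indices:)

        : ID ( x -- x ) ;    \ does nothing; whatever is on the stack passes through
        42 ID .              \ prints 42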

    K and S

    (K x y) = x
    
    (S x y z) = (x z (y z))
        

    Both of these take more than one parameter of input.
    But if you’re used to Algol-like function syntax, the way this
    works may be surprising.

    Since it’s the simpler of the two, let’s use the K
    combinator as an example:

    The way to read “(K x y) = x” is:
    “K applied to x yields a combinator which, when applied
    to y, always evaluates to x.”

    (Programmers familiar with the concept of currying will see
    that this is like the partial application of a function, where
    a new function is “pre-baked” with the argument x. The
    term “currying” is named in honor of mathematician
    Haskell Curry
    (wikipedia.org),
    after whom the Haskell programming language is also named.)

    The result is that K makes a combinator that
    throws away any input and just returns
    x. Weird, right? But it turns out to be useful.

    K is super easy to write in a language like
    JavaScript, which is also a nice choice because you can play with
    it right in the browser console like I just did:

    K = function(x){
      return function(y){
        return x;
      }
    }
    
    K("hello")("bye")
    
    > "hello" 
            

    See how the result of K("hello") is a function that
    returns “hello” no matter what you give it as input?

    How about S? I’ll leave implementing that
    in JavaScript as an exercise for the reader.
    It’s clearly much more complicated since it has three levels of
    “function that yields a combinator” on the left, and the result
    applies x to z and then applies that result to (y z), which is
    y applied to z.

    (By the way, the y combinator above should not be
    confused with the Y combinator.
    Do you remember that arcane lambda calculus artifact projected
    over that head with the third eye way up near the beginning of this
    page? That thing was the Y combinator! It turns out, it’s
    all, like, connected, you know?)

    But the real point is this: S and K are
    special for one very interesting reason.
    Together with I, they form the “SKI calculus” and just
    these three combinators are all you need to perform
    any computation in the known universe.

    Actually, it’s even crazier than that. You don’t even need
    I because that, too, can be created with S
    and K.

    That’s right, the S and K definitions
    above are a complete system for universal computation.

    The book shown here is another from my bookshelf. It’s
    Combinators: A Centennial View by Stephen Wolfram.

    It starts with a (much too) terse introduction to the SKI combinator
    calculus and then launches into page after page of visualizations of S
    and K combinators being fed into each other. Like fractals or automata,
    simple inputs can produce patterns of surprising sophistication.

    Wolfram demonstrates combinators that keep producing different
    output for a gazillion iterations and then get stuck in a loop. Some of
    them produce regular patterns for a while and then start producing
    different patterns. Some just loop forever at the outset.
    As in other universal systems, there is no end to the complexity
    produced by these two simple constructs. It is infinite. And all of
    this is just S and K combinators taking combinators as input and
    returning combinators as output.

    I think it is wild and fun to see someone play
    with a subject like Wolfram does in this book. Each page is saying,
    “Look at what is possible!”

    Combinators is also Wolfram’s ode to the discoverer of
    combinatory logic,
    Moses Schönfinkel (wikipedia.org)
    who, like so many of the giants in the field of computer science,
    did his work on paper decades before the first digital electronic
    computers beeped their first boops.

    Figuring out the output of the S combinator once
    was enough to keep me occupied for a while. It boggles my mind to
    imagine feeding it another S as input on paper,
    let alone discovering these particular combinators in the first place.

    Okay, we get it, combinators are a crazy way to compute.

    But are they worth using in “real” programs? In limited
    doses, absolutely!

    Combinators let us factor out explicit loops. This:

    foo.map(bar)
        

    is the same as this much longer statement:

    let temp = [];
    for (let i = 0; i < foo.length; i++) {
        temp[i] = bar(foo[i]);
    }
        

    Both of those pieces of JavaScript give us the result of applying
    the function bar() to an array foo.

    I think map() is a great example of the power of
    combinators to clean up a program with abstraction. Once you start
    using simple combinators like this to abstract away the boilerplate
    logic of yet another loop over a list of items, it’s hard
    to go back.

    My personal history with exploring higher-order functions in
    a production setting is through the
    Ramda (ramdajs.com) JavaScript
    library, which I discovered from the talk
    Hey Underscore, You’re Doing It Wrong!
    (youtube.com)
    by Brian Lonsdorf, which is fantastic.

    Once I started discovering how combinators and curried functions
    could eliminate big old chunks of code, I was hooked!
    The old, dreary procedural code became a new fun puzzle!

    Mind you, it’s very easy to go overboard with this stuff and
    write something far less readable
    than some simple procedural code. (Gee, ask me how I know this.)

    But in limited doses, it’s super powerful and compact.

    Joy uses combinators to “factor out” all sorts of logic.

    Even different forms of recursion can be completely handled
    for you by combinators in Joy thanks to the uniformly unary functions.

    Here’s a factorial definition:

    factorial == [null] [succ] [dup pred] [*] linrec
        

    Let’s try it:

    5 factorial
    120
        

    Computing the factorial of a number is often used as an example of
    recursion. The final answer is the input number multiplied by the
    previous number multiplied by the previous number multiplied by…
    the rest of the numbers all the way down to 1.

    Computing a factorial requires a cumulative result. Without
    recursion, you need an explicit variable to hold the intermediate
    result as you loop through the numbers.

    As shown in the Joy factorial definition above,
    linrec is a “linear recursion” combinator. It takes
    4 parameters, each of which is a quoted program. null is a
    predicate which tests for zero. dup is the same as in
    Forth. pred is an operator which yields a number’s
    predecessor (given 4, yields 3). “*” multiplies two
    numbers, just like you’d expect. Given these pieces, perhaps you can
    take a guess at how linrec works?

    For comparison, here is a recursive JavaScript solution:

    function factorial(n) {
        if (n <= 1) {
            return 1;
        }
    
        return n * factorial(n - 1);
     }
            

    Note that the Joy example is not just shorter and free of
    variable names; it has abstracted away the mechanics
    of recursion. All we're left with is the
    logic specific to the factorial problem itself.

    It's debatable which of these two is more readable
    because readability is in the eye of the beholder.
    But I think you can imagine getting good at reading the Joy
    example.
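
    Incidentally, here's the same thing as a plain recursive word in
    Forth itself (a sketch of mine; RECURSE is the standard way for a
    Forth definition to call itself):

        : factorial ( n -- n! )
          dup 2 < if  drop 1  else  dup 1- recurse *  then ;

        5 factorial .    \ prints 120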

    Okay, so we've gone pretty deep into this concatenative
    programming and combinator thing. How does this actually
    relate to Forth?

    First of all, Forth does have facilities for
    dealing with combinators:

    Forth supports higher-order functions with "execution tokens"
    (function pointers) and the EXECUTE word.

    This will run the word whose execution token is returned by FOO:

    FOO EXECUTE
        

    With this, you can very compactly define combinatorial words such as
    MAP, FOLD, and
    REDUCE.

    First, let's see how EXECUTE works. The syntax will be
    alien to non-Forth programmers, but the concept will be no problem for
    anyone used to first-class functions.

    First, let's make a new word:

    : hello ." Hello" ;
            

    This is Forth for, "Compile a word called hello
    that prints the string Hello."

    (We'll learn how compiling words actually works later.
    For now, please just gracefully accept what you're seeing.)

    Next:

     
    VARIABLE hello-token
            

    This creates a new variable called hello-token which
    will store the "execution token" for the hello word.

    This part will look super cryptic if you're new to Forth:

     
    ' hello hello-token !
            

    Let's examine this one piece at a time:

    • "'" gets the address of the word
      "hello" and puts it on the stack.
    • "hello-token" is a variable, which
      just leaves its address on the stack when called.
    • "!" stores a value from the stack
      (the address of hello) at
      an address from the stack (the address of
      variable hello-token).

    So the code above simply reads, "Store the address of
    hello in the variable hello-token."

    Now let's use EXECUTE to call this "execution token":

     
    hello-token @ EXECUTE
    Hello
            

    Behold, it printed the "Hello" string!

    Remember, the variable hello-token leaves its
    address on the stack when it is called.

    "@" is a standard Forth word that loads the value
    from the given address and puts that value on the stack.

    EXECUTE gets an address from the stack and runs
    whatever word is found at that address.

    Perhaps it would be helpful to see that this silly statement:

    ' hello EXECUTE
            

    is equivalent to just calling hello directly:

    hello
            

    Anyway, now we're armed with Forth's combinatorial ability:
    Treating functions ("words") as values so other functions can
    take them as input. This allows us to define combinators in Forth.
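
    To make this concrete, here's a rough sketch of a MAP-like word of
    my own devising (it is not a standard word), which uses EXECUTE to
    apply an execution token to every cell of an array, in place:

        : map ( addr u xt -- )    \ apply xt to each of u cells at addr
          rot rot                 \ xt addr u
          cells over + swap       \ xt limit start
          ?do
            i @ over execute i !  \ fetch cell, run xt on it, store result
          cell +loop
          drop ;                  \ discard the xt

        : inc ( n -- n+1 ) 1 + ;
        create nums 1 , 2 , 3 , 4 ,
        nums 4 ' inc map
        nums @ .    \ prints 2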

    For some compact higher-order function definitions
    in Forth, check out this Gist by Adolfo Perez Alvarez (github.com).

    So yes, Forth is concatenative. It implicitly passes values
    from one function invocation to the next. And it supports higher-order
    functions.

    Nevertheless, I do not believe studying "concatenative
    programming" in general or Joy specifically is a good way to understand
    the history and genesis of Forth!

    For example, this simple statement:

    2 3 +
        

    can be read two different ways:

    Forth: "Push 2 and then 3 on the stack; add them; push result
    5
    on the stack."

    Joy: "The composition of the functions 2, 3, and +
    is identical to the function 5."

    While both languages share a cosmetically similar syntax,
    and both produce the same result for this
    expression, there is a fundamental difference between how the two
    languages "think" about the expression because they arrived at
    this place in completely different ways.

    Forth's only concern (as a language) is to process these three
    tokens and act upon them according to some simple rules.
    (If the token is in the dictionary, execute it. If it's a number, put
    it on the stack.)

    To Joy, it may be the same mechanical process under the hood, but
    the language itself sees these tokens more like a mathematical
    expression. It's a much more abstract outlook.

    The point I'm making is that Forth may accommodate the
    abstract point of view, if the developer chooses to take it. But
    Forth is not based on abstract concatenative computing
    principles or combinatory logic.

    Let's look at this from a historical perspective.
    First, the notions of postfix syntax (RPN) and a data stack as
    the basis of the language:

    Postfix notation was definitely in the air when Chuck Moore
    created Forth.

    Stacks were known and used in the time of Forth's origins,
    though they were generally limited to 2-4 items in registers.

    So I think it's reasonable to assume that RPN syntax and use of
    stacks are a historically accurate way to examine Forth's "origin story."

    Hold that thought, here's a fun aside:

    The drawing of the computer labeled "Z3" on the right is of
    the
    Z3 computer
    (wikipedia.org)
    designed by engineer and computer scientist Konrad Zuse. This is widely
    considered to be the first programmable digital computer!
    It used electro-mechanical relays like the telegraph networks of the day.

    (By the way, a certain amount of electro-mechanical logic is
    still used in modern nuclear reactor safety systems because
    the big mechanical components are not as vulnerable to nuclear
    radiation as semiconductors!)

    The Z3 could do addition in less than a second and multiplication
    in three seconds. It had 64 words of 22 bits each and worked with
    the equivalent of modern floating-point numbers.

    As mentioned above, it can be said to use RPN, though there are only
    two registers and nine instructions. Opcodes were encoded in eight
    bits. The computer is programmable via punched paper tape (you can see
    the tape device to the right of the control console, though it's a bit
    of a scribble in my drawing).

    It is also a stack machine. Again, this is with a mere
    two registers, which get juggled in a particular sequence as you
    load and store values.

    Fun fact: The control unit used special control
    wheels to encode microsequences. If the microsequence wasn't
    programmed correctly, it could short-circuit the machine and destroy
    the hardware!

    I got most of this information from this excellent paper by
    Raul Rojas:
    Konrad Zuse's Legacy: The Architecture of the Z1 and Z3 (PDF) (ed-thelen.org).

    Anyway, so the simple mechanics of RPN and stack-based
    operation are very natural for digital computing machines
    and their use goes back to the very beginning.

    But Joy and the term "concatenative programming" come from the
    1980s.

    Uh oh.

    While the ideas of combinators and other types of
    universal computation were well known in certain mathematical
    and computational circles, I would argue they were not very amenable
    to existing computer hardware until much later when computers became
    fast enough to support "functional programming" styles and
    abstractions.

    Until then, programming was "close to the metal."
    Even the idea of "structured programming" with programming language
    concepts like if/else or while/for loops was
    once considered novel! Before that, everything was done with address
    jumps or GOTO.

    It's important to remember that "coding", the
    actual act of turning an abstract program into machine code,
    was long ago considered to be a mere secretarial skill, not far
    removed from typing and other forms of data entry.
    This is why some people (including myself) refer to themselves as
    "programmers" rather than "coders".

    Concatenative programming, with its emphasis on combinators
    (and immutable data structures, which we haven't talked about),
    doesn't have the same historic grounding for Forth the way that RPN
    syntax and stack-based programming do.

    So I must conclude that understanding concatenative programming
    is super cool, but it doesn't actually help us understand the
    true nature of Forth because it doesn't describe how Forth came to be.
    It is not part of Forth's "origin story."

    As we'll soon see, Forth really is about the "nuts and
    bolts". You bring your own theories with you.

    So while all these descriptions of the Forth language are true
    (RPN, stack-based, concatenative), they all describe
    the language Forth from the vantage of hindsight.

    There's nothing wrong with thinking about Forth in these terms,
    but it doesn't answer the "why" questions:

    "Why does Forth have this syntax?"

    "Why does Forth work this way?"

    I think the "why" questions are best answered by
    looking at the "when."

    What is Forth's history, anyway?

    We need to go back to the 1950s.

    If this image doesn't make any sense to you, citizen of
    the future, it's from the iconic movie poster by Drew Struzan for
    Back to the Future (1985) (wikipedia.org).

    Smithsonian Astrophysical Observatory and MIT 1958

    Chuck Moore is programming an IBM 704 with Fortran on punch cards.

    "Compiling took 30 minutes...you got one shot per day"

    -- Chuck Moore, Forth - The Early Years

    In Forth - The Early Years (PDF) (worrydream.com), Chuck
    Moore recites a fairly terse history of Forth, from the earliest
    pre-Forths to the creation of the language standard.

    (Note: Chuck mentions the Smithsonian Astrophysical Observatory
    (SAO) and the Massachusetts Institute of Technology (MIT) in
    roughly the same time period, and it's a bit difficult to be
    entirely sure which part is talking about which organization. But
    if you look at a map, SAO is at Harvard University. Harvard and MIT
    are about a mile apart in Cambridge, Massachusetts. It's basically a
    singular point if you zoom out a bit. So that helps explain the
    overlap.)

    The computer in question is the
    IBM 704
    (wikipedia.org).
    It was one of those room-filling vacuum-tube computers with
    tape drives the size of refrigerators.

    The 704 was a fully programmable "modern" computer with
    magnetic-core memory, multiple registers, a 36-bit instruction set, and
    36-bit words ("word" as in native memory size for the processor, not
    "word" as in Forth functions).

    There were switches for each register on the control console, but
    programs could be written to and read from paper punch cards.

    It was very modern for the time, but...

    "In its day, the 704 was an exceptionally reliable machine.
    Being a vacuum-tube machine, however, the IBM 704 had very poor
    reliability by today's standards. On average, the machine failed around
    every 8 hours, which limited the program size that the first Fortran
    compilers could successfully translate because the machine would fail
    before a successful compilation of a large program."

    It's difficult to imagine now, but changing parameters for a program,
    re-compiling it, and running it again could take a day (assuming you
    didn't make any mistakes).

    So Chuck solved that irritation with an extremely clever solution:

    Moore made an interactive interpreter
    on a computer with nothing we would recognize today as an interactive
    terminal.

    He accomplished this by making his program programmable.

    Here's a quote from The Evolution of Forth (forth.com):

    "Moore's programming career began in the late 1950s at the
    Smithsonian Astrophysical Observatory with programs to compute
    ephemerides, orbital elements, satellite station positions, etc.
    His source code filled two card trays. To minimize recompiling this
    large program, he developed a simple interpreter to read cards
    controlling the program. This enabled him to compose different
    equations for several satellites without recompiling..."

    His free-form input format turned out, ironically, to be more
    reliable for human use than Fortran, which required formatted
    columns. (At the time, any mis-aligned columns in Fortran punchcard
    input would require a re-run of the program!)

    It was also faster and more compact.

    These "programming the program" statements in Moore's simple
    interpreter did not use keywords.
    They were statement numbers encoded on a punchcard.

    This is the origin of the system that would eventually be named
    Forth.

    According to Moore, the interpreter's statement numbers would have been
    roughly equivalent to these Forth words:

    WORD NUMBER INTERPRET ABORT
        

    Free-form input was unusual at the time. It's obviously a super nice
    alternative to recompiling your calculation program every time you want
    to change some numbers!

    So, at last, we have discovered the true origin of the Forth
    language: Moore wrote a simple interpreter to reduce waste
    and tedium.

    Already, Moore has exhibited the defining combination of traits
    shared by great programmers around the world: Inventive and allergic to
    tedium.

    If it had stopped there, it would have been a clever trick and
    perhaps worthy of a footnote in history.

    But Chuck Moore did not stop there.

    Stanford 1961

    Now we head from Massachusetts to California, where Moore found
    himself at Stanford University. There he received his BA in Physics
    and started graduate school. He worked with Stanford's
    Burroughs B5500.

    Let's talk about the computer first:

    The B5500 (or "B 5500" - the official manual puts a space between
    the B and the number) was a solid-state computer. It was part of the
    "second-generation" of computers
    (wikipedia.org).
    These computers had discrete transistors on circuit boards. By
    contrast, the first generation before them used vacuum tubes
    (like the aforementioned IBM 704) and the third generation
    after them used integrated circuits.

    In fact, the
    Burroughs Large Systems
    engineers were pioneers of transistorized computing,
    and the B5000 series broke new ground.

    Here are some more resources:

    • Burroughs B5000 / B5500 / B5700 gallery
      (retrocomputingtasmania.com)
      - an awesome illustrated guide including a picture of the
      actual Stanford B5500.
    • Burroughs B5500 Reference Manual (PDF)
      (bitsavers.org)
      - The entire 224-page manual that came with the computer.
    • Early Computers at Stanford
      (stanford.edu)
      - a description of the computer itself and a brief summary
      of its use at Stanford.

    And what exactly did Chuck Moore do with that B5500 machine?

    Moore's CURVE was another mathematical application, written in
    Stanford's own Algol implementation.

    It contained a much more sophisticated interpreter this time
    with a data stack and control flow operators.

    Equivalent Forth words:

    IF ELSE DUP DROP SWAP + - * 
        

    (As we'll see, symbols like "+" and "-" are words in Forth.)

    Moore worked on the Stanford Linear Accelerator
    as a programmer. His focus was on steering the beam of
    the electron accelerator.

    The CURVE program was even more "programmable" than
    his Fortran program at SAO. He took those ideas and
    expanded them to include the idea of a parameter stack
    and the ability to define new procedures.

    This made the interpreter much more flexible and capable.

    Aside: At this point, I also think it's interesting to
    compare Moore's budding interpreter language with another interpreter
    created specifically to be embedded in larger programs for controlling
    them:
    The Tcl programming language
    (wikipedia.org).
    John Ousterhout created Tcl at Berkeley in 1988, 27 years after
    Moore started his work, out of frustration with ad-hoc, half-baked
    solutions. The name comes from "Tool Command Language". But the
    comparison goes deeper than just the shared motivation.
    Tcl and Forth
    have similar levels of syntactical purity and flexibility. Everything
    in Tcl is a string! Both languages give the user the power to define
    fundamental parts of the system, such as new control structures, in the
    language itself. If this sounds interesting, you owe it to yourself to
    play with Tcl for a while. It is extremely clever and extremely
    capable. The main implementation has been well cared-for and can be
    found on most Unix-like systems, often installed by default.

    As Moore demonstrated with CURVE, a powerful, extensible interpreter
    is a huge time-saver (certainly when compared to re-compiling the
    program!) and allows the user of the program to add to the program's
    functionality on the fly. It's difficult to overstate how powerful this
    can be.

    Truly, now we have the beginnings of a fully-fledged
    programming language. It's not named Forth yet, but
    we're getting closer.

    Freelancing 1965

    "With the TTY came paper-tape and some of the
    most un-friendly software imaginable - hours of editing and punching
    and loading and assembling and printing and loading and testing
    and repeating."

    -- Chuck Moore, Forth - The Early Years

    First, let's talk about what "TTY" means in 1965.
    Teleprinters
    (wikipedia.org), also known as "teletypewriters" or just "teletypes,"
    were printing devices. They printed to continuous sheets of paper
    fan-folded to fit into boxes.

    The Latin "tele-" prefix means "far" or "at a distance". These
    machines trace a direct lineage from telegraphs and Morse code.

    In the late 1800s, the concept of a typewriter which operated over
    telegraph lines had been explored and existed in a variety of forms.
    But the transmission code, paper tape, and typewriter system devised by
    Donald Murray (oztypewriter.blogspot.com)
    is the one that won out. And it was arguably Murray's
    choice of QWERTY keyboard that cemented it as the standard around
    the world.

    The existing Baudot code (from which we also get the term "baud")
    was modified by Murray into something that very much resembles what we
    still use today. Murray also introduced the concept of control
    characters, which still clearly retain their typewriter origins in the
    names:
    CR (carriage return) and LF (line feed).

    Teletype machines started as point-to-point text communication
    tools (like the telegraph), but they were later used over switched
    networks like the world-wide Telex system which used pulse dialing
    to automatically route a connection through the network.

    The Teletype Model 33
    (wikipedia.org)
    I drew above was one of the most popular teletypes used with computers.
    It was created by The Teletype Corporation in 1963, which means it
    shares a birth year with the ASCII standard! It remained popular until
    the mid-1970s when video terminals finally came down in price enough to
    push printer teletypes aside. In fact, Teletype Co. made the Model 33
    until 1981, which is much later than I would have guessed!

    As for
    paper-tape
    (wikipedia.org), I'll just quote Wikipedia directly:

    "Punched tape was used as a way of storing messages for
    teletypewriters. Operators typed in the message to the paper tape,
    and then sent the message at the maximum line speed from the tape.
    This permitted the operator to prepare the message "off-line" at
    the operator's best typing speed, and permitted the operator to
    correct any error prior to transmission. An experienced operator
    could prepare a message at 135 words per minute (WPM) or more for
    short periods."

    Donald Murray didn't invent the concept of perforated paper
    tape for data storage, but his system used it for the encoding of
    transmitted messages from the keyboard. It doesn't seem like a stretch
    to trace the origins of this storage method to Murray's system.

    The computers of this era and earlier were paper manipulators.
    They were kind of like really complicated typewriters. They displayed
    their output on paper, they were programmed with paper, and they kept
    long-term storage on paper!

    But as time went on, computer interactivity increased. They became
    less like typewriters and more like the machines we use today.

    As each new ability emerged, Forth became increasingly interactive.

    Forth gains direct terminal input and output!

    KEY EMIT CR SPACE DIGIT
        

    These new words turned Moore's system into a program editor.

    Now you can edit the program within the program.
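
    Here's a sketch of a couple of these words in action, using their
    conventional meanings in later Forths (Moore's 1965 versions may
    well have differed in the details):

    65 EMIT     \ print the character with code 65: "A"
    CR          \ start a new line (carriage return)
    KEY EMIT    \ wait for one keypress, then echo it back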

    Moore's complete system is now kind of like an integrated development
    environment and kind of like an operating system.

    In the mid-1960s, "mini-computers" came out. They were
    still huge by today's standards, but no longer required a
    large room of their own.

    In addition to the reduction in size, the other emerging change was
    direct interactive use of a computer via teletype.

    Specifically, the invention of
    timesharing (stanford.edu)
    was a huge shift away from the "batch processing" style of
    computing that had come before (like with input via punchcard).

    (Fun fact: A "second generation" time-sharing operating system
    called Multics
    (multicians.org)
    was the spiritual ancestor of Unix and the source of its joke
    name, coined by Brian Kernighan: "One of whatever Multics was
    many of".)

    Moore's evolving pre-Forth language also gained
    completely interactive editing and executing of programs.

    This would have been right around the time
    that the original
    LISP REPL (Read-eval-print loop)
    (wikipedia.org)
    was created in 1964 on a PDP-1.

    If not pre-saging, Moore was certainly on the bleeding edge
    of interactive computer usage!

    Aside: If you want to see an awesome demonstration of
    interactive computer usage on paper, check out this demonstration
    by Bob Spence:
    APL demonstration 1975
    (youtube.com).
    Bob Spence
    (wikipedia.org)
    is best known for his own contributions, including a number of early
    clever computer interaction ideas that are worth re-examining today.
    Bob's demo is extremely pleasant to watch and brilliantly presented
    in split screen. Notice how paper output lets you mark up stuff with
    a pen - pretty nice feature!
    And
    APL
    (wikipedia.org)
    is a whole other rabbit hole which has interesting intersections with
    the point-free and higher-order function programming we've encountered
    earlier.

    Then this happens...

    1968

    IBM 1130 minicomputer at Mohasco, a textiles manufacturer in New York.

    16 bit, 8 KB RAM.

    Backup was via punch/reader.

    With disks, now we can have file names!

    File names limited to 5 characters...

    Moore names his "fourth generation" system "FORTH".

    Yup, this really is the origin of the name, "Forth". Funny how
    temporary things tend to stick and last forever, isn't it?

    The
    IBM 1130
    (wikipedia.org)
    is one of those new-fangled "minicomputers" we've talked about.
    Gosh, it was so small, the CPU weighed less than a car!

    And it was affordable! The base model was as low as $32,000.
    Compare that to $20,000, the median price for a house in the U.S.
    in 1965.
    Just think of that: If you could afford a house, you were well
    on your way to being able to afford a computer!

    As noted, the unit Chuck Moore worked on had a disk drive,
    which would have bumped up the price an additional $9,000.
    That would be the equivalent of buying an above-average house
    and adding a couple brand-new 1965 cars in the driveway.

    But, wow, imagine having disk drive cartridges with 512 KB of
    storage at your disposal. What would you do with all that space?

    As mentioned, at this time, we're still interacting with the
    computer (mostly) via paper, but these minis brought the idea of
    interactive computing to "the masses" because they were so much
    smaller, cheaper, and more reliable than the sorts of computers that
    had come before.

    Quoting
    The Evolution of Forth (forth.com):

    "Newly married and seeking a small town environment, Moore joined
    Mohasco Industries in Amsterdam, NY, in 1968. Here he developed
    computer graphics programs for an IBM 1130 minicomputer with a 2250
    graphic display. This computer had a 16-bit CPU, 8k RAM, his first
    disk, keyboard, printer, card reader/punch (used as disk backup!),
    and Fortran compiler. He added a cross-assembler to his program to
    generate code for the 2250, as well as a primitive editor and
    source-management tools. This system could draw animated 3-D
    images, at a time when IBM's software for that configuration
    drew only static 2-D images. For fun, he also wrote a version of
    Spacewar, an early video game, and converted his Algol Chess
    program into the new language, now (for the first time) called
    FORTH. He was impressed by how much simpler it became."

    As you may have gathered by now, Chuck Moore is a pretty
    extraordinary computer programmer.

    It turns out the IBM 1130 was hugely influential to a bunch of early
    big-name programmers in addition to Moore. Something was in
    the air.

    In addition to its funny new name, Forth had also gained new
    abilities:

    Moore adds a return stack, allowing nested word definitions:

    : DOUBLE DUP + ;
    : QUAD DOUBLE DOUBLE ;
        

    And a dictionary of words.

    It's not just the name that makes this the first real Forth:
    A dictionary of named words which can be called interactively or
    recursively in the definitions of other words is one of the
    defining features of Forth. The ability to use words as building
    blocks is the Forth language's primary abstraction.

    In the example above, we've defined a word called DOUBLE
    which duplicates the number on the top of the stack and adds the
    two numbers together.

    A second word called QUAD uses the previous definition
    by calling DOUBLE twice, quadrupling the number in a
    rather amusing way.
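
    Here's a sketch of both words in use, with the standard word "."
    (dot) popping and printing the top of the stack:

    5 DOUBLE .      \ 5 duplicated and added: prints 10
    5 QUAD .        \ doubled, then doubled again: prints 20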

    A return stack makes this possible. Without a return stack, we have
    no way of telling the computer how to "get back" to the place in
    QUAD where we left off after DOUBLE is done.

    (We'll get to the specifics of the syntax soon. That's another
    vital part of understanding Forth.)

    1970

    Still at Mohasco. Programming a Univac 1108.

    A new port of Forth was written in assembler and could call COBOL
    modules, because that's what the corporate suits wanted in 1970.

    Moore hates complexity.

    First of all, the UNIVAC 1108
    (wikipedia.org)
    is a great example of the awesome "retro-futuristic" design of
    these old machines. Just look at the sweeping angles in my drawing
    of the console. That's a cool computer console!

    When these computers cost more than a house, it makes perfect
    sense that they were constructed into beautiful custom furniture
    that made them look like space ships.

    You have to wonder: Did the sci-fi art of the time drive
    the design of these computers or did the computers and industrial
    design of the time inform the art? Or, more likely, did they both
    feed off of each other in the classic cycle of, "life imitates art
    imitates life?"

    That's a teletypewriter built into the desk of the console.
    I presume the tractor-feed paper would have spooled to and from
    containers behind the sleek facade.

    Anyway, the UNIVAC 1108 is an even more modern computer than the IBM
    1130. Now we're moving into using integrated circuits for everything,
    including the register storage. (Speaking of registers, the 1108 had
    128 of them and must have been interesting to program!)

    As was also the trend at the time, the CPU
    was constructed of discrete cards connected together by a wire-wrapped
    backplane.

    If you're not familiar with the technique, you should know that
    wire-wrapped
    (wikipedia.org)
    connections are extremely high quality. Wire is wrapped with
    great force around a post, making a gas-tight connection that will not
    corrode (corrosion can occur outside the connection, of course). A
    little bit of the insulation gets wrapped in the last turns, which
    provides flexibility and strain relief. There are NASA guidelines for
    making a perfect wire-wrap connection.

    Anyway, the Univac was even more powerful and modern
    than Moore's previous computer and he took advantage of it.

    You don't have to read between the lines to see Moore's obvious
    distaste of
    COBOL
    (wikipedia.org),
    the COmmon Business-Oriented Language.
    What's impressive is that he managed to still use Forth while
    also using the required COBOL modules.

    When this project was abandoned by the employer, Moore was
    upset by the whole situation, particularly the way business software
    was increasing in complexity. This won't be the last time we
    see this theme crop up.

    He also wrote a book (unpublished) at this time called
    Programming a Problem-Oriented Language.
    It's written in typical Moore fashion, without superfluous words or
    exposition. Feel free to contrast this with the article you're reading
    now.

    (This book will be mentioned again later.)

    NRAO - Early 1970s

    National Radio Astronomy Observatory
    - Computer control software for radio telescopes.

    Radio telescopes are like visual telescopes, but they collect lower
    frequency waves. Thanks to the magic of computers, we can process these
    signals to see what the radio telescopes see.

    Radio telescopes can work with everything from 1 kHz, which is just
    below the uses of "radio" as we think of it for navigation,
    communication, and entertainment, to 30 GHz, which is still well under
    the visible portion of the electromagnetic spectrum. Consumer microwave
    ovens operate at about 2.45 GHz.

    (Speaking of gigahertz, apparently Intel Core i9 processors can run
    at clock speeds up to 6 GHz, but most CPU designs top out at around
    4 GHz. This may be important for Forth for reasons I explain later.)

    The visible part of the spectrum is very small by comparison. It
    starts at 420 THz (terahertz) and ends at 720 THz. The familiar
    rainbow of colors captured in the mnemonics "Roy G. Biv" or "Richard of
    York Gave Battle in Vain" (ROYGBIV) lists colors in order of lowest
    frequency (Red) to highest (Violet).

    Here is the official website of the
    National Radio Astronomy Observatory
    (nrao.edu).
    But for a better summary,
    the Wikipedia entry (wikipedia.org)
    is the way to go. Be sure to scroll down to the incredible image and
    description from 1988 of the collapsed 300ft radio telescope:

    "The telescope stood at 240ft in height, wieghed 600-tons, had a
    2-min arc accuracy, and had a surface accuracy of ~1 inch. The
    collapse in 1988 was found to be due to unanticipated stresses
    which cracked a hidden, yet weight and stress-supporting steel
    connector plate, in the support structure of the massive telescope.
    A cascade failure of the structure occurred at 9:43pm causing the
    entire telescope to implode."

    The 300ft dish had been the world's largest radio telescope when it
    went active in 1962 at the NRAO site in West Virginia.

    My drawing above is of the
    Very Large Array
    (wikipedia.org)
    in New Mexico.
    NRAO is also a partner in a huge international array in Chile.

    By using radio interferometry, arrays of telescopes can be treated
    as essentially one huge telescope with the diameter of the array
    (missing the sensitivity a dish of that size would have).

    But the scope for which Moore wrote software was a single 36ft (11
    meter) dish at Kitt Peak in Arizona called The 36-Foot Telescope.
    It was constructed in 1967 and continued
    working until it was replaced with a slightly larger and more
    accurate dish in 2013.

    The 36ft scope was used for millimeter-wavelength molecular astronomy.
    This is the range above "microwaves" and these telescopes pretty
    much have to be constructed at dry, high altitude sites because
    water vapor in the air can interfere with the radio waves.

    (Note that Moore stayed at the NRAO headquarters in Virginia and
    was not on-site at Kitt Peak.)

    NRAO had a policy of using Fortran on its minicomputers, but based
    on the success of his previous work, Moore was begrudgingly given
    permission to use Forth instead.
    I couldn't possibly do justice to summarizing it, so here are Chuck's
    own words describing the software he wrote for the NRAO (also from
    Forth - The Early Years):

    "There were two modes of observing, continuum and spectral-line.
    Spectral-line was the most fun, for I could display spectra as they
    were collected and fit line-shapes with least-squares.

    It did advance the state-of-the-art in on-line data reduction.
    Astronomers used it to discover and map inter-stellar molecules
    just as that became hot research."

    Here is a photo (nrao.edu) of the 36-foot telescope.
    And
    here is a photo of the control room in 1974
    (nrao.edu)
    with what appears to be a PDP-11 in the background.

    As you can see, the work itself was extremely interesting and
    cutting-edge. But how Moore went about it was also very interesting,
    which a series of computer drawings will demonstrate in a moment.

    But on the Forth language front, there was another development...

    At this time, there are talks of patenting Forth.

    Moore believes ideas shouldn't be patented.

    We take it for granted now that "free" or "open" software
    unencumbered by patents and restrictive corporate licenses is a good
    thing. But this was absolutely not a mainstream position in
    the early 1970s.

    To put things in context, in the summer of 1970,
    Richard Stallman
    (wikipedia.org) was just out of high school and was writing
    his first programs in Fortran (which he hated) and then APL.

    It wasn't until 1980 that Stallman finally got fed up enough with
    the state of proprietary and legally encumbered software to start the
    "free-as-in-freedom" software revolution. Companies were increasingly
    using copyright to prevent modification, improvement, or duplication by
    the end user. Stallman, being a pretty incredible programmer, wrote free
    clones of such programs. He announced the
    GNU project
    (wikipedia.org)
    in 1983.

    Aside: I believe Stallman was right. There's absolutely
    nothing wrong with writing programs for money or selling software. But
    using the law to prevent people from truly owning that software
    by limiting how or where to run it, or even preventing people from
    writing their own similar software if they are capable, is an
    abominable practice and should be countered at every step.

    Moore also rejects the standardization of Forth.

    "All of my fears of the standard and none of the advantages of the standard have come to pass. Any spirit of innovation has been thoroughly quelched.

    Underground Forths are still needed.

    I said I thought the standard should be a publication standard but they wanted an execution standard."

    -- Chuck Moore, 1997

    Quote from the ANSI Forth section in
    this cool collection of Forth quotes
    (ultratechnology.com) by Jeff Fox.

    I think that when you get to the heart of what Forth is all
    about, Moore's displeasure with the ANSI standardization suddenly makes
    tons of sense. In short, the whole point of Forth is to create
    your own toolkit. Having an all-inclusive language standard is great
    for making sure Forths are interchangeable. Unfortunately, it's
    also antithetical to adapting the language to your specific hardware
    and software needs.

    Alright, enough philosophizing. Let's get back to the computer
    stuff!

    While Moore was at NRAO, he also wrote software to point the telescope.
    Elizabeth Rather (Moore credits her as Bess Rather in his paper) was
    hired for support and they worked together on at least one port.
    The Forth system migrated across multiple machines at NRAO which,
    as we'll see, highlights one of the technological strengths of the
    standard Forth implementation.

    By the way, after her initial reaction of shock and horror,
    Elizabeth Rather embraced Forth. From
    The Evolution of Forth
    (forth.com):

    "After about two months, Rather began to realize that something
    extraordinary was happening: despite the incredibly primitive
    nature of the on-line computers, despite the weirdness of the
    language, despite the lack of any local experts or resources, she
    could accomplish more in the few hours she spent on the Forth
    computers once a week than the entire rest of the week when she had
    virtually unlimited access to several large mainframes."

    Rather went on to write the first Forth manual in 1972 and
    write papers about it for the NRAO and other astronomical organizations.

    Later, Elizabeth "Bess" Rather
    (wikipedia.org)
    became the co-founder of FORTH, Inc with Chuck and
    remained one of the leading experts and promoters of the Forth language
    until her retirement in 2006.

    There's a great overview paper of the whole NRAO system by
    Moore and Rather in a 1973 Proceedings of the IEEE:
    The FORTH Program for Spectral Line Observing (PDF)
    (iae.nl).

    It includes a high-level description of the system with examples of
    interactive Forth usage and a neat diagram on the first page, which you
    can see in the screenshot.

    As mentioned, Forth was ported to a bunch of different computers
    at NRAO.

    Let's take a look:

    Forth on the IBM 360/50

    Moore mentions first having ported his Forth system to the
    IBM 360/50
    (wikipedia.org).

    The System/360 (or S/360) computers were extremely successful,
    largely because of availability, longevity, and compatibility.
    IBM claims to be the first company to use
    microcode
    (wikipedia.org)
    to provide a compatible instruction set across all S/360 computers
    despite the hardware differences between models.

    The cheaper 360 computers used microcode while the more expensive
    and powerful machines had hard-wired logic. NASA even had some one-off
    models of IBM 360 made just for them.

    Until microcode came along, if you bought a "cheap" computer to get
    started and then upgraded to a more powerful computer, you would have
    to re-write your programs in a new instruction set. (If you happen to
    have written your programs in a high-level language like Fortran, you
    would still have to re-compile your programs from punchcards, and you
    would need the Fortran compilers on both computers to be perfectly
    compatible!) It's easy to see why being able to upgrade without
    changing your software would have been appealing.

    System/360 computers were
    a "big bet" (5 billion dollars according to IBM themselves:
    System 360: From Computers to Computer Systems
    (ibm.com)) that nearly destroyed the company.
    The bet clearly paid off because they made these machines
    from 1964 to 1978.

    Oh, and it wasn't just the instruction set that was compatible. The
    360 computers also had standardized peripheral interfaces, which were
    compatible between machines.
    There was a huge market for peripheral devices. IBM
    themselves made 54 different devices such as memory, printers, card
    readers, etc. The 360 also spawned a whole third-party peripheral
    industry, much like the IBM PC-compatible era that started in 1981 and
    continues to the desktop computer I'm typing on right now in 2023.

    Moore wrote Forth from scratch in S/360 assembly.

    Then...

    Forth ported to the Honeywell 316

    I drew Chuck behind the system in this one because I couldn't
    bring myself to obscure an inch of that glorious pedestal console.

    You can see the
    Honeywell 316
    (wikipedia.org)
    and the brochure
    (wikimedia.org)
    image from which I made my drawing.

    Just look at the space-age lines on that thing! It looks straight
    out of a Star Trek set. Sadly, there's basically no chance the one
    Moore actually worked on had this console. Less than 20 of them were
    sold. But thanks to my drawing, we can pretend.

    Beyond just its appearance, this particular console has a really
    wild history. The extravagant gift company, Neiman Marcus, actually
    offered the Honeywell H316 with this pedestal as a "kitchen computer".
    It cost $10,000 and would have come with a two-week course to learn
    how to input recipes and balance a checkbook using toggle switches and
    lights to indicate binary data! (As far as anyone knows, none of these
    were actually sold.)

    The ad for the Honeywell Kitchen Computer was in full "Mad Men"
    mode and was extremely patronizing, as was unfortunately typical for
    the time. But if you can look past that, the whole thing is quite
    funny:

    "Her souffles are supreme, her meal planning a challenge? She's
    what the Honeywell people had in mind when they devised our Kitchen
    Computer. She'll learn to program it with a cross-reference to her
    favorite recipes by N-M's own Helen Corbitt. Then by simply pushing
    a few buttons obtain a complete menu organized around the entree.
    And if she pales at reckoning her lunch tabs, she can program it to
    balance the family checkbook..."

    You can see a tiny scan of the original ad with a woman admiring
    her new Honeywell Kitchen Computer that barely fits in her kitchen
    here
    (wikipedia.org).

    But moving on from the pedestal...

    The implementation of Forth on the H316 is considered to be the
    first complete, stand-alone implementation because it was actually
    programmed on the computer itself and was used to create other
    Forths. It is at this point that Moore has achieved a fully
    ascendant system.

    But wait, there's moore...er,
    sorry, more!

    As is typical for a Chuck Moore endeavor, this
    telescope application pushed other new boundaries:
    The system actually ran across two computers (we're about to see
    the second one) and gave real-time access to multiple astronomers.
    Because it spread the load the way it did, there were no issues with
    concurrency, which is something we programmers struggle with to this day.

    This real-time control and analysis was basically a
    luxury available on no other system at the time.
    Even Honeywell, the creator of these computers, had only been able to
    achieve the most primitive concurrency for them and it was
    nothing like this.

    As usual, Moore was right on the very crest of
    computing with his ultra-flexible Forth system.

    ...And ported to the Honeywell DDP-116

    As mentioned above, the Forth system was also ported to the
    DDP-116
    (t-larchive.org)
    and used with its "parent" system on the H316 featured above.

    (The DDP-116 was originally manufactured by
    Computer Control Company in 1965, but CCC was sold to Honeywell in 1966 and
    became its Computer Controls division.)

    The DDP-116 was a 16-bit computer (the first available for
    purchase), but still part of that "second generation" of computers
    we've mentioned before, with individual
    transistors and components wire-wrapped together on huge circuit
    boards. (Check out the pictures on the DDP-116 link above for all
    sorts of excellent views of the insides and outsides of an example
    machine and its peripheral devices!)
    It happens to have also been a pretty rare computer. It didn't sell
    in vast quantities like the IBM systems.

    As you can see in the drawing, Chuck Moore began to grow in power as
    his system evolved and this manifested in additional arms!
    Or maybe I started to get a little loopy while
    drawing old computers for these slides in the final evenings before I
    was due to give my talk? I'll let you decide what is real.

    But wait, there's one more!

    Forth on the DEC PDP-11

    (Yes, that PDP-11.)

    The
    PDP-11
    (wikipedia.org) was by some measures the most popular minicomputer ever.

    It was a 16-bit machine and had an orthogonal instruction set
    (meaning the same instruction could be used in multiple ways
    depending on the operand, which makes the instruction set's
    mnemonics fewer, more logical, and much easier to memorize).
    This was even more powerful because I/O was memory-mapped, so the
    same instructions used to move values around in memory and
    registers could also be used to transfer data to
    and from devices.

    All told, these conveniences made the PDP-11 fun to program!
    Assembly language programmers rejoiced. The ideas in the PDP-11 spread
    rapidly and are to be found in the most popular architectures in use
    today. Compared to what came before it, PDP-11 assembly language will
    look surprisingly familiar to modern assembly programmers.

    The original machines were made starting in 1970 with
    wire-wrapped backplanes and discrete logic gates.
    Later models introduced "large-scale integration," which is a term
    we'll see later, so hold that question!
    These later versions of the PDP-11 were still being
    made twenty years later in 1990! There are apparently still PDP-11s
    performing crucial tasks today, with nuclear power plants being one of
    the most prominent examples.

    It's hard to see in my drawing, but the PDP-11 front panel is one
    of the most iconic computer interfaces ever made. Hobbyists make
    working models, including ridiculously cute and awesome miniature
    versions. Here are two model versions - click on them to go to the
    original wikipedia.org files, where you can admire their full beauty:

    It would be difficult to overstate the impact of this machine.
    Probably the most famous piece of software released on the PDP-11
    was the first version of
    Unix
    (wikipedia.org)
    that actually bore the name "Unix".

    It was also the birthplace of the
    C
    (wikipedia.org)
    programming language.
    Dennis Ritchie ported Ken Thompson's B language to the PDP-11 to
    take advantage of its abilities. Unix was then re-written in C
    starting with Version 4.
    So the Unix we know today and a large portion of the command line
    utilities that are standard with a Unix-like system were programmed
    on the PDP-11. (And you can thank Richard Stallman's GNU project for
    freeing those for the masses. GNU stands for "GNU's Not Unix!")

    You'll also note that Chuck Moore has gained his
    fourth and final arm in my drawing above
    ("fourth," ha ha).
    This may or may not reflect actual events.
    Also, I'm not sure if Moore would have been using a video terminal at
    that time. It's possible. DEC's first video terminal was the
    VT05
    (columbia.edu),
    which came out in 1970.

    So much porting!

    All of this porting of Forth to new machines is possible because of
    indirect threaded code.

    "Threaded code" in this usage is not
    related to concurrency, i.e. "multi-threaded programming".

    It's code that is composed of subroutine addresses.

    Threaded code can be machine code or interpreted.

    Wait, aren't most programs composed of calls to subroutines?

    That's true. The big difference is that
    threaded code
    (wikipedia.org) in this sense
    doesn't actually contain the instructions to call the
    subroutines. It stores just the addresses.
    Therefore another routine is responsible for advancing
    a pointer over the address
    list and executing the subroutines.

    Huh?

    Yeah, there's no way around it, threaded code is complicated.

    And indirect threaded code is even more complicated (and
    harder to explain).

    "Hey, wait!" I hear you saying. "If Chuck hates complexity so
    much, why did he use such a complex method for Forth?"

    That's completely fair.

    But before we address that, I'll try to briefly explain how
    threaded code is stored and executed.

    First, here's how normal machine code might be written:

    Direct calls (not threaded):

    jmp 0x0804000
    jmp eax
        

    This is the simplest type of "call" to store in a program.
    We simply have the jmp (jump) instruction followed
    by the address to jump to.
    Here I show both a hard-coded address
    (0x0804000) and a register
    (eax).
    Both of these are "direct" for our purposes.

    Alternatively, many processors have a more advanced call
    instruction. A call is more complicated because it has to do additional
    work behind the scenes. It must store a return address on "the stack"
    before jumping to the specified address. Then a ret
    (return) instruction at the end of the called routine can use the
    stored address to resume the execution just after the "call site" where
    the call was first made. Why are return addresses stored on a stack?
    That's because you can nest calls. Pushing addresses as you jump and
    popping them in reverse order as you return keeps things nice and neat.
    This "the stack" is not what Forth refers to as "the stack". Forth's
    main stack is better known as "the parameter stack". Many Forth
    implementations also have a return stack!
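
    Here's a sketch of the idea in NASM-style i386 assembly
    (double_it is a made-up routine):

    mov eax, 21
    call double_it   ; push the return address, jump to double_it
    ; ...execution resumes here with eax = 42

    double_it:
        add eax, eax ; double the value in eax
        ret          ; pop the return address and jump back to it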

    Anyway, this is direct and it's not threaded. Just jump to an address.

    The first step of complication is adding indirection.

    Indirect calls (not threaded):

    jmp [eax]
        

    For this example to make sense, you need to know that the
    square brackets around the register ([eax])
    are a common assembly language convention that means
    "the value at the memory address that is stored in register eax".

    So jmp [eax] means "jump to the address
    stored at the address stored in register eax."

    That's indirect.

    So now we have the "indirect" part of "indirect threaded
    code." But what's the "threaded" part?

    Storing threaded code:

    [boxes-and-arrows diagram: a list of cells, each holding the
    address of a subroutine]

    Instead of containing the actual instructions to jump or
    call subroutines:

    jmp 0x0804000
    jmp 0x080A816
    jmp 0x08C8800
    jmp 0x08C8DD0
            

    Threaded code stores just the list of
    addresses:

    0x0804000
    0x080A816
    0x08C8800
    0x08C8DD0
            

    There are two consequences of storing code like this:

    • The address list takes up less memory than the full code to
      make the jump. (In fact, it takes a lot less on some
      historic machines.) This is good.
    • Some sort of "code interpreter" will need to be written to
      execute this list. You can't just send a list of addresses
      to a processor and expect it to work. This could be good or bad.

    Another way to look at the list of addresses above is that,
    conceptually, threaded code is basically a list of subroutines.
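
    That "code interpreter" can be remarkably small. Here's a sketch
    of one for plain (direct) threaded code in NASM-style i386
    assembly, assuming the esi register points into the address list
    (the label name "advance" is made up):

    advance:
        lodsd        ; load the address esi points at into eax, advance esi
        jmp eax      ; jump straight to that subroutine

    ; ...and each subroutine ends with "jmp advance" instead of "ret",
    ; which keeps the pointer marching down the list.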

    To complete our definition of "indirect threaded" code, we just
    need to put both concepts together:

    Storing indirect threaded code:

    [boxes-and-arrows diagram: a list of cells holding addresses which
    point to other addresses, which in turn point to subroutines]

    This is where it gets pretty crazy. So now we've got a second
    level of indirection. Why on Earth would we do this?

    Well, this allows us to store a separate "code interpreter"
    (or "inner interpreter") for different kinds of subroutines!

    Instead of pointing directly at subroutines, these addresses point
    at interpreters.
    Talk about ultimate flexibility - every subroutine in an indirect
    threaded program can have its own custom interpreter for the rest
    of its instructions...each of which can also be threaded...or
    indirectly threaded!

    But what calls all of these inner interpreters?
    An outer interpreter, of course! The outer interpreter is the
    part we actually interact with when we sit down to type
    at a Forth terminal.

    In Forth, indirect threaded code is a list of
    addresses pointing to the "inner interpreter" portions of
    words, which execute the rest of the word.
    What types of inner interpreters could we have, anyway?
    Well, for example, we might have one kind of word that stores a string
    in memory and another that executes machine code. But the only
    limit is your imagination.

    Make sense?

    I personally would not have understood
    that explanation at all until much later in my journey (I know this
    because similar - probably better - explanations flew right over
    my head). No doubt you're faster than me at apprehending this stuff
    and are already halfway through implementing your own Forth based on
    these descriptions.

    None of the rest of the material requires understanding any
    of the above, so please don't feel you need to fully
    grok
    (wikipedia.org)
    it before continuing. Indirect threading is an important part of
    Forth's history, but there are plenty of Forths that do not use it.

    Threaded code was much more common in the days of yore.

    It is very dense: compact on disk and in memory.

    In addition to its compact storage, threaded code
    would have been even more efficient on the contemporary
    machines during Forth's gestation because
    calling subroutines often wasn't as simple as the
    call instruction found on "modern" architectures.

    Subroutine and procedure call support
    (clemson.edu) by Mark Smotherman explains:

    "1963 - Burroughs B5000 - A stack-based computer with support for
    block-structured languages like Algol. Parameters and return address
    are stored on the stack, but subroutine entry is a fairly complex
    operation."

    So the memory and performance improvements of this style of
    subroutine call were potentially very great indeed. This is one of
    the reasons for Forth's legendary reputation for high performance.

    We'll revisit this topic from another angle soon. But if you're
    interested in these mechanics
    (and want to see the origin of the boxes and arrows
    drawings at the top of this section), check out this multi-part
    article series for The Computer Journal,
    MOVING FORTH Part 1: Design Decisions in the Forth Kernel
    (bradrodriguez.com),
    by Brad Rodriguez.

    The important thing is that we've now fully traced the origins
    of Forth from a simple command interpreter to the full-blown
    interactive language, editor, operating system, and
    method of code storage and execution
    it became.

    That's Forth's origin story.

    • Postfix notation (RPN)
    • Stack-oriented
    • Concatenative programming style
    • Interpreted
    • Highly adaptable to machine architectures
    • Extremely compact

    This gives us the why.

    At last! Now we can put it all together:

    Forth is postfix because that's a natural
    order for the computer and lends itself to an incredibly minimalistic
    interpreter implementation: get the values, operate on them;

    Forth is stack oriented because that's a
    compact and convenient way to store
    values without needing to add variables or name things;

    Forth is concatenative because building a
    language that can operate as a string of words is incredibly
    flexible and can adapt to just about any programming style without
    any help from the language itself. (And it turns out this is
    especially true when you throw in higher-order functions);

    Forth is interpreted because that is
    interactive and allows the programmer to make fast changes on
    the fly or simply "play" with the system. This is part of
    Forth's adaptability and flexibility;

    Forth is self-hosting because you can
    bootstrap a Forth implementation from a handful of words
    implemented in assembly and then write the rest in Forth;

    Forth is extremely compact because machines at
    the time had limited memory and this gave Forth an edge on
    other interpreters (and even compiled languages!) on
    mainframes and mini-computers.
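
    A quick sketch shows several of these properties at once (the
    standard word "." pops and prints the top of the stack):

    3 4 + .          \ postfix: push 3, push 4; + leaves 7; . prints it
    : ADD5 5 + ;     \ concatenative: a new word is just a string of words
    10 ADD5 .        \ prints 15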

    Now that we have everything in historical context, I think it's
    much clearer why Forth exists and why it takes the peculiar
    form that it does.

    None of this was planned. Chuck didn't sit down at a terminal
    in 1958 and conjure up Forth. Instead, he grew a system to
    serve his needs and to make use of new hardware as it was made
    available.

    Reading about Forth's history is a wonderful way to understand
    what makes Forth special and what it's about.

    But even knowing all of this, I was still a long way off from a true
    understanding of how this all comes together in an
    actual working system. I didn't really understand how it worked.
    And I didn't understand what Forth was actually like to use.
    In other words, I still didn't understand Forth as a
    programming language.

    Actually Understanding How Forth Works

    Somewhere along the way, I came across these quotes...

    "To understand Forth, you have to implement a Forth."

    -- Somebody on the Internet

    And

    "Take a look at JonesForth."

    -- Everybody on the Internet

    I've mentioned it before, but I'll point it out again. Notice the
    phrasing "implement a Forth."

    As we've established, Chuck Moore believes a Forth system is best
    when it is custom-tailored to the system and task at hand. So it
    should come as little surprise that writing your own Forth or Forth-like is
    entirely "par for the course" in any would-be-Forther's quest to
    discover the True Meaning of the language and enter the mystical realm
    where All is Revealed.

    Well, what else could I do?

    Having no other clear course of study, I decided to heed the
    wisdom of the crowd.

    Presenting...

    JonesForth and "Assembly Nights"

    To really get to know it, I took Forth to bed with me.

    I wrote
    Assembly Nights
    when I realized how much I was enjoying myself:

    "Over the last three months, I've developed an unusual
    little nighttime routine..."

    I prepared myself for dealing with the JonesForth source
    (i386 assembly language in the GNU GAS assembler)
    by learning some assembly and Linux ABI basics.
    JonesForth is 32-bit only and uses the Linux system call ("syscall")
    ABI directly.

    Then I spent roughly a year porting JonesForth into a complete
    working copy in NASM assembler. (Yes, that's a "port" from one flavor
    of i386 asm to another.)

    I did a tiny bit almost every night. A lot of it was debugging in
    GDB.

    My NASM port of JonesForth: nasmjf

    Opening the third eye by (re)implementing Forth.

    Here's the
    nasmjf web page.

    In the process of writing the port, I learned how a traditional
    indirect threaded Forth works.

    And I learned that it takes time to absorb such a
    twisty-turny method of code execution.

    Especially if the x86 assembly language tricks are new to you like
    they were for me.

    JonesForth ascii art:

    Among the first things you encounter when you open up
    jonesforth.S (a single file which contains the assembly
    language portion of JonesForth) are many ASCII art diagrams.

    Richard W.M. Jones does an excellent job of walking you through
    the workings of the interpreter and explaining the i386 instruction
    set features he uses.

    If the diagram above seems bewildering, I agree.

    So, of course, I thought maybe I could do better...

    Here's my attempt (from the nasmjf source):

    After I was done with my port, I tried to make an ASCII art diagram
    of my own to capture my new understanding.
    In fact, this is one of several.

    With the benefit of the distance of time, it is clear to me that
    these things only make sense once you already understand them to
    some degree. But the act of making them is extremely useful
    for solidifying your understanding.

    But wait, there's more!

    Both ASCII art diagrams above are just part of the complete
    indirect threaded execution system. They're just showing how the "inner
    interpreter" works to execute Forth words.

    Perhaps you recall from the section about indirect threaded code
    above that the second level of indirection allows different
    "interpreter" routines to execute different types of threaded
    subroutines? Well, that's all those two ASCII diagrams are trying to
    show.

    But when we say that Forth is an interpreted language,
    this is not what we're talking about. There's also the "outer interpreter"
    that the programmer interacts with.

    The indirect threaded code is just the tip of the iceberg!

    nasmjf inner/outer interpreter diagram:

    In the vector image I made above for nasmjf, I attempted to map out
    the whole thing in my own words.

    If you take anything from this image, it's that
    INTERPRET looks up words (functions) by name and calls
    them by executing the interpreter routine whose address is stored in
    the word (again, this is the indirect threading part). In turn, there
    may be any number of interpreters, but the three main types used in
    JonesForth are:

    • Pure assembly language routines are their own interpreters.
    • "Regular" Forth words use the DOCOL interpreter.
      DOCOL executes the rest of the threaded code in the word,
      most of which is just a list of addresses, but some of
      which will be data. This is the "normal" kind of threaded
      subroutine. (See the sketch of DOCOL just after this list.)
    • Numeric literals have a tiny interpreter routine inline with
      the data that just pushes their value to the stack. Numeric
      literals don't have to be words, though; in JonesForth,
      they're just a bit of inlined machine code.
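
    Here's a sketch of DOCOL itself, paraphrased from the
    JonesForth/nasmjf source (PUSHRSP is a helper macro that pushes a
    register onto the return stack; by convention, eax arrives holding
    the address of the word's codeword):

    DOCOL:
        PUSHRSP esi   ; save the old instruction pointer on the return stack
        add eax, 4    ; eax points at this word's codeword...
        mov esi, eax  ; ...step past it: esi now points at the word's body
    NEXT              ; start executing the body's list of addresses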

    But even knowing this only helps to explain how code starts
    executing. How does this type of Forth know what to run after a word is
    complete?

    Ah, for that we have this:

    To get from one code word to another requires a bit of
    assembly pasted at the end of each one. This is
    the NEXT macro. Here it is from nasmjf:

    %macro NEXT 0
        lodsd     ; NEXT: Load from memory into eax, inc esi to point to next word.
        jmp [eax] ; Jump to whatever code we're now pointing at.
    %endmacro
        

    Notice the term "code word". That's the Forth term for words
    written in pure assembly language.

    Every code word has this macro at the end. (Some Forths actually
    call a subroutine for this. JonesForth uses this two-line macro
    because the action is so efficient in i386 machine code.)

    Remember the list of addresses in the explanation of
    "indirect threaded" code? This is how we execute them sequentially.

    This implementation uses the i386 lodsd instruction
    to take care of two operations in one: move a "double word"
    from memory into a register, and then update another register
    so that it points to the next "double" spot in memory.
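
    In other words, lodsd behaves like these two instructions rolled
    into one:

    mov eax, [esi]   ; load the 32-bit value at the address in esi
    add esi, 4       ; advance esi to point at the next 32-bit cell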

    (Rant: And a "double" is 32 bits on Intel chips for the really
    annoying reason that they kept the definition of "word" at 16 bits
    even as the platform moved to 32 and then 64-bit architecture. So
    "word" on Intel architectures is a completely meaningless thing
    that you just have to memorize as "16 bits" even though
    "word" is supposed to be the native data size of the architecture.
    And what's worse is that the tools for working with programs on
    Intel chips like GDB then refer to everything with the
    corresponding C names for everything, which naturally assumed that
    the architecture names would be based on reality. But they aren't.
    So terms like "double" and "long" are basically just absolutely
    worthless legacy garbage to memorize and useful only to C and Intel
    architecture veterans.)

    Okay, so now the eax register points to the next
    threaded subroutine address in memory. The jmp starts
    executing whatever that points to, which will be the "inner interpreter"
    for that subroutine.

    Got that?

    A lot of moving parts, right?

    There's more:

    To get from one colon word to another requires a bit of
    assembly pasted at the end of each, in a chunk called
    the EXIT macro. Here it is from nasmjf:

    DEFCODE "EXIT",EXIT,0
        POPRSP esi            ; pop return stack into esi
    NEXT
        

    Remember, there's two fundamental types of words in a
    traditional Forth like JonesForth:
    "Code" words and "colon" words.
    Code words are primitives written in machine code. Colon words are
    the "regular" words actually written in the Forth language.

    These "colon" words (so-named because they are assembled
    via the "COLON" compiler, which we'll talk about in a moment),
    all end in the so-called EXIT macro.

    The EXIT macro handles the return stack. Then
    there will be a NEXT after that to conclude whatever code
    word primitive we were in (we're always in at least one because the
    "outer-most" interpreter is a code word primitive!), so the
    process we described above will automatically start where we left off
    at the "call site" of the word we
    just finished executing.

    If you weren't lost before, surely this will do the trick?

    I do have another attempt to explain how this all nests in
    a sort of indented pseudocode:

    My comment in nasmjf attempting to explain the
    execution of indirect threaded
    code as a nested
    sequence of NEXT and EXIT and QUIT:

    ; QUIT (INTERPRET)
    ;     * regular word
    ;         DOCOL
    ;         NEXT
    ;         * regular word
    ;             DOCOL (codeword)
    ;             NEXT
    ;             * code word
    ;                 (machine code runs)
    ;             NEXT
    ;             * code word
    ;                 (machine code runs)
    ;             NEXT
    ;         EXIT
    ;         NEXT
    ;    EXIT
    ;    NEXT
    ; QUIT (BRANCH -8 back to INTERPRET for more)
        

    This nested view of the process is as close as I've ever been to
    explaining (to myself) what the entire execution flow
    looks like at a high level.

    I'm sure every Forth implementer has their own mental model.

    You'll notice we didn't even talk about QUIT.
    Other than the name, that one's not nearly as bad - it's really
    just the end of the outer interpreter loop.

    (So, yeah, we have EXIT and
    QUIT, neither of which leave Forth... Hey, it was the
    1960s. Things were different then.)

    Absolutely nothing else drives the flow of an indirect
    threaded Forth application: It's addresses stored in
    registers, a return stack, and a handful of assembly instructions
    at the end of each machine code word jumping to the next instruction.

    It's like a delicate clockwork machine.

    Don't you see how simple it is?

    Historical note: The above "Crazy Chuck" drawing is a parody of
    a popular meme with actor Charlie Day's character in the episode
    "Sweet Dee Has a Heart Attack" from the show It's Always Sunny
    in Philadelphia
    :

    "Every day Pepe's mail's getting sent back to me. Pepe Silvia, Pepe
    Silvia, I look in the mail, this whole box is Pepe Silvia!"

    You, citizen of the distant future, will not have recognized this
    parody, but at least now you can look it up.

    Forth is complex when taken as a whole. But it is made of tiny
    pieces, each of which is very simple. The concept was created
    over a period of years on very constrained systems.
    Each part was created only as needed.

    I'll repeat your question from before so you don't have to:


    "Hey, wait! But if Chuck hates complexity so
    much, why did he use such a complex method for Forth?"

    This is where the historical context is, once again, very revealing:

    As we've seen, Charles H. Moore did not create Forth all at once in a
    single lightning bolt of inspiration.
    It began as a simple command interpreter and executor and grew
    from there.
    It has always consisted of tiny little parts, working together.

    Each of these tiny parts is extremely simple on its own.

    And each was added over a period of time as the need arose.

    I think that's the genius of Forth: That all of these little
    pieces can work together to make a running system and yet still
    remain independent
    .
    You can learn each of these in isolation. You can replace them
    in isolation.

    Ultimate flexibility and simplicity at the lowest level of
    the implementation come at the cost of easy understanding at
    higher levels.

    When growing a system like this, most of us would have thought
    bigger. Moore thought smaller.

    Let's do the same.
    I've thrown the terms "code word" and "colon word" around a lot.
    I've explained them a bit, but we've never given a proper introduction.

    Let's go small:

    Code words

    Again, code words are primitives written in machine language
    supplied by the Forth implementation.

    Let's see some real code words so we can de-mystify them
    once and for all. These are extremely simple
    and extremely concrete examples of actual NASM assembly language source
    from my nasmjf port of JonesForth:

    Small and simple:

    DEFCODE "SWAP",SWAP,0
        pop eax
        pop ebx
        push eax
        push ebx
    NEXT
        

    Is that really SWAP? Yes, it really is! We're just telling the
    CPU to pop the two most recent values from the stack and then push them
    back in the opposite order.

    (JonesForth uses the i386 call/return stack as a Forth parameter
    stack so we can use the native "pop" and "push" to make these
    operations easy. In exchange, we lose the ability to use "call"
    and "ret" for subroutines.)

    The DEFCODE macro is housekeeping - it creates the
    entry's header in the Forth word dictionary.

    Notice the NEXT macro we talked about previously?
    Remember, that's just another two lines of assembly pasted at the
    end of this routine.

    Even Smaller:

    DEFCODE "DUP",DUP,0
        mov eax, [esp]
        push eax
    NEXT
        

    We're down to just two instructions now! We move the value pointed
    at by the esp register into eax and then push it onto the
    stack.

    To understand why this duplicates the top item on
    the stack, you need to know how the esp register is used.
    Here's the relevant comment from the JonesForth source:

    "In this FORTH, we are using the normal stack pointer (%esp) for the
    parameter stack. We will use the i386's "other" stack pointer (%ebp,
    usually called the "frame pointer") for our return stack."

    Which means that esp points to the current top of
    the parameter stack. So pushing that value on the stack duplicates
    the top value. (This could also have been written more clearly with
    three instructions: one "pop" and two "push"es.)
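
    For the curious, that hypothetical three-instruction version would
    look something like this:

    DEFCODE "DUP",DUP,0
        pop eax      ; take the top value off the stack...
        push eax     ; ...put it back...
        push eax     ; ...and push a second copy of it
    NEXT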

    The Smallest:

    DEFCODE "DROP",DROP,0
        pop eax
    NEXT
        

    Now we have an entire Forth word defined as a single
    instruction! DROP just "removes" the top value from the stack. In this
    case, we pop it into the eax register and then don't do
    anything with it, essentially throwing it away. (Alternatively, we
    could have simply added 4 to the esp register, since the i386
    stack grows downward, but in this case, the "pop" is both
    shorter and clearer.)

    Now let's see these three words in action in a real
    Forth program that moves some real numbers around
    in memory...

    Code words in action

    8 7      8 7
    SWAP     7 8
    DROP     7
    DUP      7 7
        

    The code word primitives we've just defined are used by the
    rest of the Forth implementation to define colon words in the
    language itself. If you write Forth applications, your own
    colon words will probably use these heavily.

    You can also call them interactively in the interpreter.

    The above example shows what it might be like to use these
    three primitives right at the keyboard. The column on the right
    shows the state of the parameter stack after each line of input.

    Apart from pushing the two numbers on the stack (8 7),
    we've now seen the assembly language code for the entire
    program shown above. That makes this pretty "bare metal" stuff, right?

    Here's the walk-through:

    • We start with 8 and then 7 on the top of the stack.
    • SWAP reverses the order of the stack so 8 is now on the top.
    • DROP pops the 8 and throws it away. Now only 7 remains.
    • DUP pushes a second copy of 7 onto the top of the stack.

    Again, these instructions could exist in the definition of a word or
    you could type them interactively in the running Forth interpreter.
    The result is the same.

    I think there's something pretty magical about realizing that
    typing these instructions is running specific machine code
    sequences exactly as they were entered. In this implementation,
    there's no optimizing compiler or virtual machine acting as middle-man.
    You really are communicating directly with the processor.

    nasmjf has 130 code words. Mostly for efficiency.

    If you weren't already wondering, perhaps you are now:
    How many Forth words need to be defined in machine code
    to have a "bootstrappable" Forth system?

    There are some theoretical minimums. But as you get down to an
    absurdly small number of instructions, the Forth code written with
    those primitives (to implement the rest of the language) becomes
    absurdly convoluted, testing the limits of both programmer
    ergonomics and computational efficiency.

    Check out this amazing article by Frank Sergeant:
    A 3-INSTRUCTION FORTH FOR EMBEDDED SYSTEMS WORK
    (utoh.org).

    "How many instructions does it take to make a Forth for
    target development work? Does memory grow on trees? Does the cost
    of the development system come out of your own pocket? A
    3-instruction Forth makes Forth affordable for target systems with
    very limited memory. It can be brought up quickly on strange new
    hardware. You don't have to do without Forth because of memory or
    time limitations. It only takes 66 bytes for the Motorola MC68HC11.
    Full source is provided."

    You read that right: 66 bytes.

    And later:

    "The absolute minimum the target must do, it seems to me,
    is fetch a byte, store a byte, and call a subroutine. Everything
    else can be done in high-level Forth on the host."

    Which reminds me, did you know there is such a thing as a
    one-instruction set computer
    (wikipedia.org)?
    And of course you can run Forth on them:
    16-bit SUBLEQ eForth
    (github.com).

    But that's nuts.

    How about something a little more realistic?

    sectorforth has 10 code words.

    Cesar Blum's
    sectorforth
    (github.com)
    is:

    "...a 16-bit x86 Forth that fits in a 512-byte
    boot sector. Inspiration to write sectorforth came from a
    1996 Usenet thread."

    See? There's Usenet again. It wasn't just me reading all that lore.

    The author's
    posting of the project to the Forth sub-reddit
    (reddit.com)
    has additional insight:

    "I've always been fascinated by the idea of having a
    minimal kernel of primitives from which "everything" can be built.
    Before Forth, I had only seen that in the form of Lisp's "Maxwell
    equations of software", which is cool, but always left me a little
    disappointed because it is too abstract to build something that you
    can actually interact with - you can't break out of its esoteric
    nature...

    With Forth, however, you can start from almost nothing, and start
    adding things like ifs, loops, strings, etc., things that look more
    like your day-to-day programming. I find that there's a lot of
    beauty in that."

    Note: The statement about Maxwell's equations surely refers to
    Alan Kay's famous quote about LISP from
    A Conversation with Alan Kay
    (acm.org):

    "Yes, that was the big revelation to me when I was in graduate
    school - when I finally understood that the half page of code on
    the bottom of page 13 of the Lisp 1.5 manual was Lisp in itself.
    These were "Maxwell's Equations of Software!" This is the whole
    world of programming in a few lines that I can put my hand over."

    Okay, so we've talked about code words
    that are just chunks of machine code that can be called upon
    at any time.

    Now let's see what colon words are all about...

    Colon words are made of Forth!

    Let's make one:

    : SDD SWAP DROP DUP ;
        

    A colon word is so-named because its definition begins with the
    ":" character.

    The example colon word definition above creates a new word called
    SDD that is a composition of the three code words we
    defined earlier: SWAP, DROP, and
    DUP.
    Perhaps the word "composition" brings to mind the concatenative
    terminology we explored earlier in this quest?

    As this example demonstrates, colon words are defined entirely
    by other words, which may be code words or other colon words.
    You can also have numeric values, e.g. 8 and 7, which
    are handled by the interpreter.

    (You can also have strings, which look like data...but those are
    just input that happens to follow one of the special words, e.g.
    ." (dot quote), that knows how to handle the input!)

    Let's see it in action:

    8 7      8 7
    SDD      7 7
        

    The effect of calling our new SDD word is, of course,
    identical to calling the three separate words SWAP,
    DROP, and DUP in sequence.

    In indirect threaded code terms,
    this colon word has been "compiled" into the addresses of
    the "inner interpreters" for each of the three code words.
    But feel free to ignore this detail!

    Let's demystify this further because the Forth "compiler" is
    probably much, much simpler than you'd think:

    How ":" works

    Here's what really happens when we enter this:

    : SDD SWAP DROP DUP ;
        

    Colon (:) fetches the word name (SDD) and sets "compile mode".

    Semicolon (;) completes the word's entry in the dictionary and unsets "compile mode".

    It might still be surprising that ":" is a Forth word.

    It looks like the sort of thing we would call "syntax" in other
    programming languages, but it really isn't. It's a word.

    You can even replace ":" with your own definition
    to extend or alter Forth to do your bidding!

    It may be hard to fully grasp for a while, but
    Forth's only
    syntax is the whitespace between tokens of input.

    Tokens are tokenized by a word called "WORD", which is an
    incredibly confusing overload of the term. Sorry.

    (You'll also notice I've mentioned the term "dictionary" a couple
    times now. It's kind of obvious that a dictionary can hold words, but
    I haven't properly explained the Forth dictionary yet. Don't worry,
    we're almost there.)

    Okay, so ":" switches the "outer interpreter" into
    compile mode and ; switches it back. But what does
    that mean?

    "Compiling" in Forth means putting one of two things into memory:

    • The address of a word, or
    • A value literal and a bit of code that pushes it on the stack

    At its simplest, compiling is just like executing, but we're storing
    addresses instead of jumping to them.

    Actually, that's understating the elegance and simplicity of how this
    works, which is one of the most mind-blowing things in Forth.

    Forth uses the same interpreter to both compile
    and execute code!

    In a traditional Forth, the interpreter executes words as you
    enter them. Unless you're in "compile mode", then it is
    compiling those words as addresses into memory on the fly
    as you enter them.

    It's straight from the keyboard to memory.

    To make this concrete, let's step through the example.

    Here's our definition again:

    : SDD SWAP DROP DUP ;
            

    In "normal mode", the interpreter is executing everything as we enter it.

    When the interpreter encounters the ":" word, we're
    still in "normal mode", so it looks ":" up in the
    dictionary, finds it, and executes the word. The definition of
    ":" will collect the name "SDD" and turn on the "compile
    mode" switch.

    Now when the interpreter hits the "SWAP" word, it will
    look up its address in the dictionary as usual, find it, and
    store the address in the next available memory slot where we
    compile new words (a very important built-in variable called
    "HERE" keeps track of this memory position).

    The same thing happens for "DROP" and "DUP".
    We're compiling as fast as we can type!

    Then a bunch of really interesting things happen when the interpreter
    gets to ";" (SEMICOLON).

    First, ";" is looked up and found in the dictionary and
    then...Hey, wait!
    Why isn't the address of the ";" word
    also compiled into our new definition? That's a
    great question!

    Time for another trick. One of the flags stored in a word's
    dictionary entry is the "immediate" flag. When this flag is turned on,
    the word is always executed immediately
    even in compile mode.
    The ";" word is an immediate word, so it executes instead
    of being compiled.

    (Ready to have your head turned inside-out? There are also
    tricks for compiling immediate words into word definitions!
    It's simple enough, but still pretty mind-bending stuff when you first
    encounter it.)

    The definition of ";" turns off compile mode. Then it
    does some housekeeping to complete the entry of the new
    SDD word in the dictionary.

    As soon as ";" returns control to the outer
    interpreter, we're now sitting in normal mode again and our new
    SDD word is available to be called directly or compiled
    into other words.

    See what I mean? It's all made of these tiny little parts.

    Each part is incredibly simple, but trying to explain how the
    parts fit together takes paragraphs of text.

    Speaking of simple...

    Almost no syntax = simple interpreter and extreme extensibility

    The tiny set of rules that govern the interpreter:

    • WORD gets a token.
    • Is it in the dictionary? (And are we compiling?)
    • Is it a numeric literal? (And are we compiling?)
    • Otherwise, error!
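
    To make those rules concrete, here is a minimal sketch of the outer
    interpreter loop in C (written for this article, not taken from any
    real Forth; all of the names are illustrative). It handles only
    "normal mode" - compile mode would add a flag check before executing,
    as described above - but it preserves the crucial rule order:
    dictionary first, then number.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static long stack[64]; static int sp = 0;      /* parameter stack */
    static void push(long v) { stack[sp++] = v; }
    static long pop(void)    { return stack[--sp]; }

    static void w_dup(void)  { long a = pop(); push(a); push(a); }
    static void w_plus(void) { long b = pop(); push(pop() + b); }
    static void w_dot(void)  { printf("%ld", pop()); }

    struct entry { const char *name; void (*code)(void); };
    static struct entry dict[] = {                 /* a 3-word dictionary */
        { "DUP", w_dup }, { "+", w_plus }, { ".", w_dot },
    };

    int main(void) {
        char input[] = "8 7 DUP + .";              /* prints: 14 */
        /* Rule 1: the only syntax is whitespace between tokens. */
        for (char *tok = strtok(input, " "); tok; tok = strtok(NULL, " ")) {
            size_t i;
            /* Rule 2: dictionary lookup comes first... */
            for (i = 0; i < sizeof dict / sizeof *dict; i++)
                if (strcmp(tok, dict[i].name) == 0) { dict[i].code(); break; }
            if (i < sizeof dict / sizeof *dict) continue;
            /* Rule 3: ...numeric literals come second. */
            char *end; long n = strtol(tok, &end, 10);
            if (*end == '\0') { push(n); continue; }
            /* Rule 4: otherwise, error! */
            fprintf(stderr, "%s ?\n", tok); return 1;
        }
        putchar('\n');
        return 0;
    }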

    Let's look at our example code again. The first line
    runs, the second line compiles:

    8 7 SWAP DUP +
    
    : SDP SWAP DUP + ; 8 7 SDP
        

    It would be annoyingly redundant to walk through the two lines of
    Forth above step-by-step because they are nearly identical. The only
    difference is that the first line simply executes each word as it is
    encountered (SWAP, DUP, +). The second line compiles those three words
    into a new word called SDP (for "Swap Dup Plus"). The result of both
    lines is the same: 7 and 16 on the stack.

    Only the numbers (8 and 7) and the spaces separating words have
    any special meaning to Forth's "outer" interpreter.
    Everything else is looked up in the dictionary.

    Ah, but did you notice the order of the bullet points above?
    We check to see if a token is in the dictionary before
    we check to see if it is a numeric literal. Yes, even numbers are
    looked up in the dictionary first! Does that perhaps give you any ideas
    about that magic trick I promised at the start of this article?
    Don't worry, the trick is forthcoming.

    Furthermore, input is not returned to the main Forth "outer"
    interpreter until a dictionary word completes executing. So there is
    absolutely no limit to the types of
    domain-specific language
    (wikipedia.org)
    you can create.

    And if that weren't enough, you can also replace every single piece
    of the Forth interpreter itself. Remember, they're all independent little
    cogs in the machine. Forth is the ultimate freedom.

    I've alluded to this in several different ways above, but I'll make
    a bold claim:
    Forth has the simplest syntax and therefore the simplest
    parser, interpreter, and compiler ever used in a "mainstream"
    general-purpose programming language.

    Two other languages previously mentioned, Lisp and Tcl, are also
    famously syntactically minimalistic languages. People have
    written incredibly tiny implementations of each:

    • Lisp: sectorlisp, a 512-byte implementation of LISP (github.com/jart)
    • Tcl: picol, a Tcl interpreter in 550 lines of C code (antirez.com)

    Mind you, both of these people (Justine "jart" Tunney and Salvatore
    "antirez" Sanfilippo) are incredible programmers, but these examples
    hint at what is possible.

    But Forth surely takes the cake. Even a certified non-genius
    like myself can write an entire Forth interpreter in a
    couple hundred assembly instructions. (See "Meow5" below.)

    Because of its extreme simplicity, tokenizing Forth can be done in
    a mere handful of assembly instructions on many processors.

    And as mentioned, once you've written a Forth interpreter, you're
    well on your way to a working Forth compiler.

    I've alluded to Forth's flexibility and extensibility on several
    different occasions now. But this is no mere party trick. Forth
    relies on the fact that you can do anything in Forth.

    In the next example, we'll see how Forth implements control structures.

    The definition of IF...THEN from jonesforth.f:

    : IF IMMEDIATE ' 0BRANCH , HERE @ 0 , ;
    
    : THEN IMMEDIATE DUP HERE @ SWAP - SWAP ! ;
        

    This right here is one of the most mind-blowing things about Forth,
    and a solid reason to title this, "The programming language that writes
    itself."

    Even something as fundamental as IF is defined in
    the language! Forth is not the only language that can do this, but
    few languages invite the programmer to participate so thoroughly
    in the inner workings as often or as joyfully as Forth.

    Figuring out how the IF and THEN definitions above actually
    work is left as an exercise for the reader, but here's a brief
    explanation of the new words they use:

    '       - gets the address of the word that follows, puts it on the stack
    0BRANCH - branch by the offset in the next cell if the top of the stack is 0
    ,       - 'compile' the value on top of the stack to the memory at HERE
    @       - fetch value from address on stack, put value on stack
    !       - store to memory (stack contains address, then value)
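
    If you'd like a head start on that exercise, here is the heart of the
    trick as a runnable C toy (the cell names and layout are illustrative,
    not jonesforth's actual memory model): IF compiles a conditional
    branch with a placeholder offset and remembers where the placeholder
    is; THEN backpatches it once the destination is known.

    #include <stdio.h>

    enum { ZBRANCH = 1, LIT, DOT, EXIT };   /* stand-ins for codewords */

    int mem[32], here = 0;                  /* compile area and HERE   */
    void comma(int v) { mem[here++] = v; }  /* the , word              */

    int main(void) {
        /* compiling  : TEST IF 42 . THEN ;  */
        comma(ZBRANCH);                     /* IF:   ' 0BRANCH ,              */
        int slot = here; comma(0);          /* IF:   HERE @  0 ,              */
        comma(LIT); comma(42); comma(DOT);  /* the "true" branch              */
        mem[slot] = here - slot;            /* THEN: DUP HERE @ SWAP - SWAP ! */
        comma(EXIT);
        printf("placeholder at cell %d patched to offset %d\n",
               slot, mem[slot]);
        return 0;
    }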
            

    (By the way, I'll go on the record to say this: The
    early parts of bootstrapping Forth in Forth (at least
    the top 25% of jonesforth.f) is significantly more
    mind-bending than implementing the low-level code word definitions
    written in assembly language. In fact, any time I needed to return to
    the assembly, it was like a comforting blanket of simplicity compared
    to the logic puzzle of those Forth-in-Forth primitives!)

    But, even seeing control structures like IF..THEN
    implemented in the language may not have prepared you for seeing this
    next trick.

    This should drive home the fact that Forth has almost no native
    syntax:

    The definition of ( ) nested comments from jonesforth.f:

    : ( IMMEDIATE
        1
        BEGIN
            KEY DUP '(' = IF DROP 1+
            ELSE ')' = IF 1- THEN
            THEN
        DUP 0= UNTIL
        DROP
    ;
    
    (
        From now on we can use ( ... ) for comments.
    ...
        

    Yeah, you read that right. Even comments are implemented
    in the language! And you can re-define them or add your own kind of
    comments!

    Some of you are soiling yourselves in excitement right now.
    Some of you are soiling yourselves in fear.
    We're all just sitting here in our own filth now.

    And now, at last, we are ready to discuss the power of the Forth
    dictionary.

    The Dictionary

    A Forth dictionary traditionally uses a linked list.

    Word matching is done starting from the end
    (most recent entries) first, so:

    • You can redefine any word, even the ones originally
      defined in assembly!
    • Words depending on previous definitions of redefined words
      won't break because the compiled addresses still point to
      the original word, not the new definition!
    • You are in complete control!
    • Again, Forth = freedom!

    It's not just minimalistic syntax. Arguably, the
    real reason Forth is so extensible is because of
    the dictionary.

    As mentioned in the points above, more recent word definitions
    override older ones with the same name - the interpreter stops at the
    first match.
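
    Here is the shape of that dictionary in C (a sketch for this article;
    the field names are illustrative). Because the search starts at the
    newest entry, a redefinition shadows - but never overwrites - the
    old one:

    #include <stdio.h>
    #include <string.h>

    struct entry {
        struct entry *prev;          /* link to previously defined word */
        const char   *name;
    };

    struct entry *latest = NULL;     /* newest definition, searched first */

    struct entry *find(const char *name) {
        for (struct entry *e = latest; e; e = e->prev)
            if (strcmp(e->name, name) == 0)
                return e;            /* first match = newest definition */
        return NULL;
    }

    int main(void) {
        struct entry old = { NULL, "GREET" };
        struct entry new = { &old, "GREET" };   /* a redefinition */
        latest = &new;
        printf("found the %s definition\n",
               find("GREET") == &new ? "new" : "old");
        return 0;
    }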

    But as mentioned above, existing compiled words that use the
    old definitions are not affected, because they haven't stored the
    name of the old word - they've stored its address. And that
    address still points to the old definition.

    You don't have to strictly replace. You can extend
    words by calling the original word from a new one with the same name!

    You are perhaps wondering what happens if
    you attempt to make a recursive word. By
    default, ':' (COLON) marks the word currently being compiled into the
    dictionary as hidden or disabled so that previous definitions can be
    called, as mentioned.
    This is why we have a word called RECURSE which inserts a
    call to the current word within itself. Because all information
    in Forth is global (including the address of the current word being
    compiled), defining RECURSE is incredibly simple (just four words in the
    JonesForth definition).

    Besides making new control structures or other types of extensions
    to the language, what else can we do with these abilities?

    It's not just the language itself that is unusually malleable.
    Your program written in Forth can be flexible too.

    Here is an example lifted and paraphrased from Thinking Forth
    by Leo Brodie.

    Say we create a variable to hold a number of apples:

    VARIABLE APPLES
    20 APPLES !
    APPLES ? 20
    	

    Forth variables put addresses on the stack.

    Note: I have a physical copy of Thinking Forth because
    I think it's great. But the publishers have kindly made it available
    for free online:
    Thinking Forth (PDF)
    (forth.com)

    Let's walk through the three lines above. Here's the first line:

    VARIABLE APPLES
            

    The VARIABLE word creates a new spot in free memory. Then
    it creates a new word in the dictionary called APPLES that pushes that
    particular memory address on the stack when it is called.

    (Note that like ":", "VARIABLE" is grabbing the next token of input
    for use as a new dictionary name. This is possible because "the little
    cogs in the Forth machine" are available for any use you can think of.
    And one of those cogs is the word WORD, which gets the next token from
    the input stream. Both ":" and "VARIABLE" use WORD to do this, just like
    Forth's own outer interpreter!)

    Okay, so we have a variable named APPLES now. The next line is:

    20 APPLES !
            

    This puts the value 20 on the stack, then the address for APPLES.
    The "!" (STORE) word stores the value 20 at the APPLES address.
    (In other words, "!" takes two values as input: an address and
    a value. It stores the value at that address.)
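
    In C terms, "!" is nothing more than a pointer store (a one-line
    illustration, not any particular Forth's implementation; STORE is a
    stand-in name):

    /* "!" (STORE): given an address and a value, store the value there. */
    void STORE(int *addr, int value) {
        *addr = value;           /* 20 APPLES !  ~  the APPLES cell = 20 */
    }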

    Conceptually, you can think of the above as APPLES = 20
    in "normal" programming syntax.

    And now the third line:

    APPLES ?
            

    This line prints the value stored at APPLES. The word "?" fetches a
    numeric value from an address and prints it (which pops the value off
    the stack again). Again, APPLES puts its address on the stack. So "?"
    simply takes an address from the stack as input for printing.

    By the way, here's the entire definition of "?" in JonesForth:

    : ? @ . ;

    Look at how small that is! The only thing you need to know to
    understand this definition is that "@" (FETCH) pops an address from the
    stack and fetches the value stored at that address and puts the value
    on the stack. "." (DOT) pops a value from the stack and prints it as a
    number.
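
    In the same C terms (QUESTION is a stand-in name, since "?" isn't a
    legal C identifier):

    #include <stdio.h>

    /* : ? @ . ;  --  fetch, then print */
    void QUESTION(int *addr) {
        int value = *addr;       /* @ (FETCH): address in, value out */
        printf("%d ", value);    /* . (DOT): pop a value, print it   */
    }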

    Okay, on with our example.

    We're about to be dealt a terrible blow...

    We pepper our program with this APPLES variable.

    The application works perfectly for a couple years.

    Then we are told that we must now keep track of two different
    kinds of apples: red and green. What to do?

    Unfortunately, this is exactly the sort of conundrum we see in real
    life software all the time.

    You knowingly prepared for all sorts of different quantities
    of apples, but it never occurred to anyone that we would need to
    track different types of apples.

    This problem seems very bad. Do we have to completely re-write our
    application?

    (Well, outside of this example, the correct answer might be
    "yes". Maybe this changes the whole "theory" of the program, in the
    Programming as Theory Building
    (ratfactor.com)
    sense. In which case, a re-write or big refactor of our apple counting
    program is likely the right answer. But for this example, we're
    assuming that we have thousands of lines of
    apple-handling functionality that will not need to
    change. We'll say that grouping the apples by color here is just an
    essential surface detail.)

    All right, obviously we can't store two values in one
    variable and expect all of the existing code to still work. So what
    could we possibly do?

    Here's a very clever and very Forth solution:

    A new variable will store the current type of apples.

    VARIABLE COLOR
    	

    As with "APPLES" above, VARIABLE creates a memory space and a new
    word called "COLOR" that puts the address of the memory space on the
    stack when it is called.

    Next, we'll create a second new variable and a new colon word.

    "REDS" will count red apples.
    Colon word "RED" sets the current type of apple to red
    (COLOR = REDS):

    VARIABLE REDS
    : RED REDS COLOR ! ;
    	

    Remember, variables are also words in the dictionary, so we've
    created three additional words so far: COLOR, REDS, and RED.

    (Only one of these, RED, is recognizably a function.
    But really all three of them are.)

    As you may recall from earlier, "!" (STORE) takes two parameters,
    a value and an address, and stores the value at that address.

    • COLOR is the address of memory holding the address of the current apple count variable
    • REDS is the address of memory holding the red apple count
    • RED sets COLOR to the address of REDS

    It might be helpful to see the C equivalent of the RED word:

    void RED(){
        COLOR = &REDS;
    }
            

    And then...

    Same for green.

    VARIABLE GREENS
    : GREEN GREENS COLOR ! ;
    	

    We've added a total of five new words. The two new green ones
    are identical to the red ones above:

    • GREENS is the address of memory holding the green apple count
    • GREEN sets COLOR to the address of GREENS

    Here's the C equivalent of GREEN:

    void GREEN(){
        COLOR = &GREENS;
    }
            

    One more change...

    Lastly, we change "APPLES" from a variable to a word that gets
    the current count by color:

    : APPLES COLOR @ ;
    	

    As you may recall from earlier, "@" (FETCH) fetches the value
    stored in a variable and puts it on the stack.

    So "APPLES" gets the value stored in COLOR and puts that
    on the stack.

    The value stored in COLOR happens to be an address.
    That address happens to be the memory pointed to by either REDS or
    GREENS.

    It would look like this C code:

    int *APPLES(){
        return COLOR;
    }
            

    This "get the address of the address" stuff may sound super
    confusing. But working with memory addresses (aka "pointers") is
    how variables work in Forth, so to the adept Forth programmer,
    the idea of passing addresses around will be deeply ingrained and
    no big deal.

    Okay, so we've got red and green apple counts. That much
    is clear. But surely there is still a lot of work ahead of us...

    Now we have to re-write any use of APPLES, right?

    Wrong! The use of APPLES is identical. The syntax hasn't
    changed one bit for any existing code. We just need to make sure we've
    set the right color.

    Check it out:

    20 RED APPLES !
    30 GREEN APPLES !
    
    GREEN APPLES ? 30
    APPLES ? 30
    
    RED
    APPLES ? 20
    	

    All of the existing code that uses APPLES will still work
    exactly the same way with absolutely no modifications.

    Furthermore, look at how English-like it reads to store
    "20 RED APPLES !" or query "GREEN APPLES ?".

    The key to understanding why this works is to remember that
    "APPLES" was already a word that put an address on the stack
    because that's how variables work in Forth.
    So when we changed it to a colon word that puts an address on the
    stack, it's no change at all. It's still doing the exact same thing.
    It just happens that the address will change depending on the active
    apple color.
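
    If it helps, here are the piecemeal C equivalents from above stitched
    into one runnable program (an illustration for this article, not
    Brodie's code):

    #include <stdio.h>

    int REDS, GREENS;                /* VARIABLE REDS   VARIABLE GREENS */
    int *COLOR;                      /* VARIABLE COLOR                  */

    void RED(void)    { COLOR = &REDS;   }  /* : RED   REDS   COLOR ! ; */
    void GREEN(void)  { COLOR = &GREENS; }  /* : GREEN GREENS COLOR ! ; */
    int *APPLES(void) { return COLOR;    }  /* : APPLES COLOR @ ;       */

    int main(void) {
        RED();   *APPLES() = 20;     /* 20 RED APPLES !     */
        GREEN(); *APPLES() = 30;     /* 30 GREEN APPLES !   */
        printf("%d\n", *APPLES());   /* APPLES ?  prints 30 */
        RED();
        printf("%d\n", *APPLES());   /* APPLES ?  prints 20 */
        return 0;
    }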

    At every single opportunity, Forth has taken the simplest
    (you might even say, laziest) and most flexible method
    for implementing a feature.

    Wait, I hear a distant screaming:

    "How could this possibly be okay?! You call this 'freedom', but
    I call it unchecked chaos material! This is not okay!"

    Well, maybe.

    But I think one reason this actually is okay, on a
    conceptual level, is that APPLES did not really change
    what it originally did.

    Coming from the normal programming language world, we have clearly
    broken the abstraction:
    "APPLES" was a variable before, and now it's a function.

    But you're not in the normal programming world anymore.
    Here, in Forth-land, a variable is a word that puts an
    address on the stack. And a function is also just a word.

    "APPLES" is still a word that puts an
    address on the stack. There is no conceptual change at the
    language level. We did not break an abstraction because there
    was no abstraction to break.

    Forth provides what you might call "atomic units of computing"
    at the language level. It is a language where you make the
    abstractions.

    To Forth, it's all just words in a dictionary.
    "VARIABLE" is just another word
    you could have written yourself.

    Do you see now why Chuck Moore rejects the standardization
    of Forth? It ossifies concepts like VARIABLE so they lose their
    flexibility.

    The example above is also another demonstration of the way
    the language Forth "writes itself": a tiny handful of primitives can be
    used to bootstrap the rest of the language in the language itself. The
    enormous flexibility of the primitives allows nearly unbounded freedom.

    Implement a Forth to understand how it works

    I highly recommend implementing Forth (or porting it like I did) to understand
    how it works "under the hood."

    By examining Forth from the ground floor at the assembly language level,
    I gained considerable confidence in my understanding of how all the moving
    parts fit together.

    To be honest, it's difficult for me to imagine being able to understand all the
    individual parts without going through this process. But everybody learns
    differently.

    But be aware of what this will not teach you

    Implementing an interpreter teaches you almost nothing about how
    to write programs with that interpreter.

    Knowing how a Forth system works is almost completely unrelated
    to knowing how to write programs in Forth.

    You can know the spec for a language by heart, but still be clueless
    about writing good software in that language. It's like expecting a
    mastery of English grammar to make you a good novelist. They're entirely
    different skills.

    Be also aware that most people on the Internet (including myself) are
    still complete newbies to actually creating software with Forth!

    Or invent Forth for yourself

    "I didn't create Forth, I discovered it."

    -- Chuck, apocryphally

    (I have been unable to find a source for the quote above.
    It probably comes from an interview.)

    If Forth truly is a fundamental way to express computation, then
    it's sort of like
    Gödel and Herbrand's general recursive functions, Church's lambda
    calculus, Turing's theoretical machines, Post's canonical systems, and
    Schönfinkel and Curry's combinators.
    (I can hear furious objections warming up from a thousand armchairs...)

    In fact, that's true of all programming languages, even the
    big, messy ones. Right? Any language that can express universal
    computation is...universally powerful; it can express anything
    that is computable.

    But I think Forth belongs to a more rarified group. Forth is a
    fundamental type of programming language design.
    And I'm not alone in thinking so. For example, check out
    The seven programming ur-languages
    (madhadron.com).

    I'll let philosophers angrily split hairs over what I just said above,
    but I think the principle is true. And it's true all the way down
    to the (lack of) syntax in the language.

    Why do I believe this? Well...

    Making nasmjf gave me so many ideas, I had to try some
    experiments.

    Forth is an amazing playground for ideas.

    I was still keenly aware that my nasmjf project to
    port JonesForth to NASM was still just a (very detailed) examination of
    a final artifact. I was not re-tracing Moore's footsteps, but
    imitating his work. In fine art terms, I made a "master copy" (training myself by
    copying the work of a master artist). In other words, I brought
    my sketchbook to the museum.

    But what would happen if I tried making a painting of my very own?

    Meow5

    An exercise in extreme concatenative programming where
    all code is concatenated (always inlined).

    We explored what it means to be a "concatenative" programming language
    at the beginning of my journey above. In short, in a concatenative
    language, data implicitly flows from one function to another like a
    factory assembly line.

    Like Forth, Meow5 happens to be concatenative because it uses
    the same "parameter stack" concept.

    Unlike Forth or most other sane languages, Meow5 is a thought
    experiment taken too far. Specifically, the thought:
    "instead of threading function calls by storing their addresses, what
    if we just store a copy of the whole function?"

    In compiler parlance, this is "inlining", short for
    inline expansion
    (wikipedia.org).
    It is a common optimization technique
    for avoiding the overhead of a function call for small functions.

    Let's use the word DROP for example. Remember when we looked at the
    assembly language source of the DROP code word? It was just a single
    assembly instruction:

    pop eax
            

    It would be incredibly silly to have several jumps to and from
    a single-instruction word!

    (And, it comes as no surprise that
    "real" Forth implementations often inline small primitives such
    as DROP. Some even provide an INLINE word to allow the programmer
    to specify this explicitly.)

    My question was: What if we do that for everything?
    At what point is this no longer a good idea?
    Obviously at some point, a function is too large to inline.
    But every code word in JonesForth
    was quite tiny by modern standards. With today's CPUs and their
    relatively enormous caches it seemed to me that you could take
    this inlining concept pretty far before it got ridiculous.

    And wouldn't the CPU just love seeing all of those instructions
    executing in one straight and continuous sequence with no jumps?
    If I were a CPU, I would love it.

    Plus, it would make compiling a stand-alone executable almost
    trivial because every word in a 100% inlined language
    would contain all of the machine code needed for that
    word.

    Here is the canonical example:

    : meow "Meow." print ;
    meow
    Meow.
    
    : meow5 meow meow meow meow meow ;
    meow5
    Meow.Meow.Meow.Meow.Meow.
    	

    The idea is that meow5 compiles into five complete
    copies of meow!
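
    Here is a toy model of total inlining in C (strings standing in for
    machine code; written for this article, not Meow5's actual
    implementation). "Compiling" meow5 copies the complete body of meow
    five times instead of storing five addresses:

    #include <stdio.h>
    #include <string.h>

    char meow[]  = "Meow.";              /* stand-in for compiled code */
    char meow5[5 * sizeof meow];         /* room for five full copies  */

    int main(void) {
        char *p = meow5;
        for (int i = 0; i < 5; i++) {    /* meow meow meow meow meow   */
            memcpy(p, meow, strlen(meow));   /* inline a complete copy */
            p += strlen(meow);
        }
        *p = '\0';
        puts(meow5);                     /* Meow.Meow.Meow.Meow.Meow.  */
        return 0;
    }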

    This example seems to be obviously naughty and wasteful. But I'm
    not a superscalar, out-of-order executing modern processor and neither
    are you. So the question remains: At what point does having a
    child function which includes a complete copy of every parent and
    grandparent and every ancestor function all the way back to the
    beginning spiral out of all sane proportions? Well, you could spend an
    afternoon figuring it out on paper, or you could be like me and spend
    the better part of a year writing an assembly program.

    Spoiler alert: I consider Meow5 to be a
    delightful little failure.

    The problem isn't inlining machine code -
    that works great, and, indeed, the exported ELF executables from Meow5
    work exactly as I imagined. The problem is data, and most
    conspicuously, data in the form of strings. Let's take the
    meow word for example: you either have to copy the string
    "Meow." five times, once for each word that uses it, or go
    through some complicated hoops to track which words use the string. And
    you have to track that in two different ways: its location in memory in
    the live interpreter and its destination in the stand-alone ELF memory
    layout. Either way, the purity and simplicity is lost, which was the
    whole point of the experiment.

    Also, it will come as no surprise that I
    later discovered that Forth implementations often have an INLINE word
    (as I mentioned above), which is a much better way to selectively
    instruct the compiler about which words you wish to copy entirely.

    As a program, Meow5 is a failure. But as a project, it is a success
    because I learned a lot.

    Think of it as an art project.

    Anyway, the point is...

    Despite attempting to go my own way,
    it's remarkable how many times Forth's solution was the
    path of least resistance.

    Again and again I would say, "Aha! That's why."

    First of all, you'll notice I ended up using ":" and ";" to
    define new functions.
    Forth makes liberal use of symbols and abbreviations, which
    can make it pretty hard to read. But I have to admit, ": ... ;"
    has grown on me, so I adopted it in Meow5. That's probably
    the most visible thing. But that's just on the surface.

    Secondly, using a postfix notation is absolutely the path
    of least resistance for a stack-based language - everything comes in
    the order expected by the language. So your interpreter can be
    shockingly simple because it can execute statements in the exact order
    it gets them.

    (Side note: This is also how the
    PostScript
    (wikipedia.org)
    printer and display language works. The printer can begin printing as
    soon as it recieves the document because everything is defined in the
    order it is needed and never depends on later information. This can
    also be a disadvantage of PostScript for viewing documents on
    screens: You can't just render a page mid-document because
    styling and formatting controls must be read in their entirety from the
    start of the document to the current page in order to ensure you've
    got everything!)

    I was determined to make things easy for myself,
    so I can say with some certainty that Forth is one of the
    most "minimum effort" languages you can imagine.
    If I could have thought of an easier (or lazier) way to do something,
    I would have done it!

    There was just one place I decided to deviate
    from Forth even though I knew it would make implementation harder.

    To make a string in Forth, you use the word ", which
    needs a space after it to be seen as a word, which looks awkward:

    " Hello World."
        

    This has always bothered me. Chuck Moore even admits this in
    his unpublished book,
    Programming A Problem-Oriented Language (PDF)
    (forth.org)
    in the section titled 6.3 Character strings:

    "What does a character string look like? Of all the ways you might
    choose, one is completely natural:

        "ABCDEF...XYZ"
                

    A character string is enclosed in quotes. It can contain any character
    except a quote, specifically including spaces."

    Right! So by golly, that's what I would do in Meow5, like
    every sensible language!

    Meow5 has this more natural quoting style:

    "Hello World."
        

    But the effects are cascading. And they limit flexibility.

    If we keep reading Chuck's words, he explains what will happen
    if you do this:

    "We get in trouble immediately! How do you recognize a character
    string? By the leading quote, of course. But do you modify your word
    subroutine to recognize that quote? If you do so you may never use a
    leading quote for any other purpose. Much better that the quote is a
    word by itself, treated like any other dictionary entry, for it can then
    be re-defined. But words are terminated by spaces, and I still resist
    making quote an exception. So let's type character strings:

        " ABCDEF . . . XYZ"
                

    And he was right, of course.

    I ended up having to put exceptions for the " character in
    multiple places in the Meow5 interpreter, including my
    get_token function, which serves the same purpose as
    the "WORD subroutine" Moore mentioned above.

    And now all additional interpreter features have to work
    around or duplicate the special " character handling!

    It seems one can either follow Moore's advice or re-discover
    it for oneself. As for me, I always enjoy re-discovering things for
    myself. The best part is that "aha!" moment when I realize why
    things are the way they are.

    Though, to flip this whole thing on its head, I actually think it
    was worth the extra effort, trouble, and loss of purity to do
    this! (I also included escape sequences, e.g. \n and
    \", while I was at it.)

    Another example of straying from Moore's advice
    and having to discover it for myself:

    I decided to have some of my functions leave the stack alone after using
    the top value.

    Some functions are mostly used to examine a value, but they pop
    that value off the stack. To keep working with the value, you have
    to do a DUP to duplicate it first.

    Since I was sure I would always want to keep the value after these
    particular functions, it seemed very wasteful to have to do a DUP each
    time. Why not just peek at it and leave it on the stack?

    Moore recommends just popping everything so you
    don't have to remember.

    But I thought that was silly. So I went ahead and made some functions
    that just peek at the value and leave it on the stack.

    But as you may have guessed, he was absolutely right.

    Having some words pop the stack and some words peek was a nightmare.
    I kept forgetting which words did or didn't alter the stack and it
    kept causing problems. I completely regretted it and ended up
    making them all pop like Moore advised.

    (Another option that occurred to me after I changed them all would
    have been to have a special naming scheme for non-popping words, which
    probably would have been fine, except then I would have had to remember
    the name... so hassle either way.)

    Now we have yet another reason for the title of this
    article.

    Once you start down the Forth path... the rest just sort of
    "writes itself".
    Chuck Moore already found the path of least resistance.

    To sum up the ways in which "Forth writes itself" so far, we have:

    • Forth is bootstrapping
    • Forth is metaprogramming
    • Forth can be your OS and your IDE/editor
    • Forth is the path of least resistance for writing a Forth

    If you set out to make the simplest possible interpreter
    for a brand new CPU architecture, you might end up writing
    a Forth whether you want to or not.

    Forth lets you define more Forth in Forth so you
    can Forth while you Forth. And the Forth editor is Forth
    and can be extended with Forth, so can Forth Forth in Forth Forth Forth
    Forth. (I'll let you figure out which of those are nouns, adjectives,
    or verbs and whether or not I have the right number of them.)

    And if that weren't enough, Forths often contain assemblers
    so you can define additional code words in Forth, too, so you never
    need to leave Forth once you're in it.

    JonesForth has the stub of an in-Forth assembler near the end so we
    can see how one might work. Here's the comment introducing it:

    (
        ASSEMBLER CODE --------------------------------------------
    
        This is just the outline of a simple assembler, allowing
        you to write FORTH primitives in assembly language.
    
        Assembly primitives begin ': NAME' in the normal way,
        but are ended with ;CODE.  ;CODE updates the header so that
        the codeword isn't DOCOL, but points instead to the
        assembled code (in the DFA part of the word).
    
        We provide a convenience macro NEXT (you guessed what it
        does).  However you don't need to use it because ;CODE will
        put a NEXT at the end of your word.
    
        The rest consists of some immediate words which expand
        into machine code appended to the definition of the word.
        Only a very tiny part of the i386 assembly space is covered,
        just enough to write a few assembler primitives below.
    )
            

    Just try not to go insane from the unlimited power.

    And then there's this:

    PlanckForth

    Hand-written 1Kb binary

    This image comes from the
    PlanckForth repo
    (github.com).
    It's one of the most
    beautiful pieces of code I've ever seen. It's a complete ELF binary
    with a working Forth implementation that fits in less than 1Kb.
    As you can see, there's enough room left over for a description and
    copyright at the end.

    The binary is stored as an ASCII hex representation that can be turned
    into a working binary using xxd -r -c 8.

    But the best part is bootstrap.fs, written in
    line-noise-like operators and gradually becoming readable Forth
    after a couple hundred lines.

    Thankfully, comments are one of the very first things implemented
    and it's almost like seeing bacteria spell out words in a petri dish:

    h@l@h@!h@C+h!k1k0-h@$k:k0-h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
    h@l@h@!h@C+h!k1k0-h@$k h@k1k0-+$h@C+h!ih@!h@C+h!kefh@!h@C+h!l!
    
    h@l@ h@!h@C+h! k1k0-h@$ kh@k1k0-+$ h@C+h!
        i       h@!h@C+h!
        kkf     h@!h@C+h!
        kLf     h@!h@C+h!
        k:k0-   h@!h@C+h!
        k=f     h@!h@C+h!
        kJf     h@!h@C+h!
        k0k5-C* h@!h@C+h!
        kef     h@!h@C+h!
    l!
    
     **Now we can use single-line comments!**
    
     planckforth -
     Copyright (C) 2021 nineties
    ...
            

    Incredible.

    Another hand-written machine code Forth (in 1,000 bytes and with
    a Forth system in 1,000 lines!) is
    SmithForth
    (neocities.org)
    by David Smith.
    You can see and hear Smith walk through SmithForth on YouTube:
    SmithForth workings
    (youtube.com).

    And as you may recall from earlier, Cesar Blum's
    sectorforth
    (github.com)
    is a mere 512 bytes!

    There are almost as many Forth implementations as there are
    stars in the night sky.

    Forth is an idea that has taken form in countless applications.

    Many Forths are custom and home-grown.

    But it has had great success in a huge variety of roles:

    • Power plants, robotics, missile tracking systems, industrial automation.
    • Embedded language in video games.
    • Databases, accounting, word processors, graphics, and computation
      systems. (You might say, "legacy software." But I say, "Elegant
      weapons for a more civilized age," to paraphrase a certain wise
      Jedi.)
    • In the modern Open Firmware boot loader.
    • Processors of all shapes and sizes.
    • Microcontrollers of all shapes and sizes.

    If it goes "beep" and "boop", someone has written a Forth for it!

    For some notable uses, here are some starting points:

    • Featured Forth Applications (forth.com)
    • Forth Success Stories (forth.org)
    • Forth (wikipedia.org)

    I think
    Open Firmware
    (wikipedia.org)
    is particularly interesting. It came, like many things, from
    the fine engineering minds at Sun Microsystems.

    "Being based upon an interactive programming language, Open
    Firmware can be used to efficiently test and bring up new hardware.
    It allows drivers to be written and tested interactively."

    Perhaps one of the most exciting uses of Open Firmware was the
    Space Shuttle
    ESN, which ran on a radiation-hardened
    UT69R000
    (cpushack.com)
    processor!
    A paper on the ESN,
    Developing plug-and-play spacecraft systems: NASA Goddard Space Flight Center's (GSFC) Essential Services Node (ESN) (PDF)
    (zenodo.org)
    notes that:

    "Open Firmware can debug hardware, software, plug-in drivers, and
    even the firmware itself. Open Firmware provides interactive tools
    for debugging systems."

    By the way, I hope this brief mention of space technology has whetted
    your appetite for more, because we're almost there!

    But first, I have a couple more drawings of cool computers you
    should see. Perhaps you are aware of the huge variety of 1980s home
    computers?

    Check these out:

    Jupiter Ace, 1982

    Operating system: Forth.

    OS and library of routines in 8 KB of ROM.

    The onboard Forth was "ten times faster than [interpreted] BASIC"
    with less than half the memory requirements.

    (The quote above is from Popular Computing Weekly, 1982.)

    The
    Jupiter Ace
    (wikipedia.org)
    was a British home computer of the early 1980s.

    It has a fan-made website, the Jupiter ACE Archive, which
    has the page,
    What is a Jupiter ACE?
    (jupiter-ace.co.uk):

    "The major difference from the 'introductory computer' that was the
    ZX81, however, was that the Jupiter ACE's designers, from the
    outset, intended the machine to be for programmers: the machine
    came with Forth as its default programming language."

    That website has tons of resources. And if you're into that sort of
    thing, you also owe it to yourself to visit the "What is..." page
    linked above and then hover your mouse over the image of the ACE's
    circuit board. Every single IC, capacitor, and resistor is identified
    and explained in little tooltips!

    It's not every day you see a programming language listed as
    an operating system
    for a computer. But you may recall
    that as early as the "IBM 1130 minicomputer at a big textiles
    manufacturer" era, Moore already had an editor and file management
    features. And you can certainly write hardware drivers in Forth if you
    have the right code word primitives. And as we'll see soon, there
    is absolutely no limit to how low-level Forth can go.

    (There's also no limit to how high-level Forth can go. The book
    Thinking Forth by Leo Brodie, the same book from which
    we got the apples example above, is full of examples of applications
    written in very "English like" high-level words.)

    The ACE never sold very many units, but it is prized by collectors
    today. I would take one.

    The
    What is Forth?
    (jupiter-ace.co.uk)
    page has an excellent explanation of Forth in general, but especially
    as an all-encompassing computing system:

    "Classic Forth systems use no operating system. Instead of storing
    code in files, they store it as source-code in disk blocks written
    to physical disk addresses. This is more convenient than it sounds,
    because the numbers come to be familiar. Also, Forth programmers
    come to be intimately familiar with their disks' data structures,
    just by editing the disk. Forth systems use a single word "BLOCK"
    to translate the number of a 1K block of disk space into the
    address of a buffer containing the data. The Forth system
    automatically manages the buffers."

    Many of us fondly remember the boot-to-BASIC computers of the 1980s,
    but can you imagine growing up with the Jupiter ACE in your home and
    actually understanding it?

    The ACE ran on the
    Zilog Z80
    (wikipedia.org)
    CPU, which was incredibly popular at the time for low-power computers
    and has had an amazingly long life. It was used in the higher-end TI
    graphing calculators such as the
    TI-85
    (wikipedia.org)
    I had in high school in 1996, which I spent many a happy afternoon
    programming in TI-BASIC.

    Canon Cat, 1987

    Operating system: Forth.

    OS, office suite, and programming environment in 256 KB of ROM.

    Innovative interface by Jef Raskin.

    Another computer with Forth as an operating system!

    The Canon Cat
    (wikipedia.org)
    is a particularly fascinating machine for a number of different
    reasons, the primary of which is the keyboard-driven interface
    by UI pioneer Jef Raskin.

    Raskin wrote a book titled
    The Humane Interface
    (wikipedia.org)
    with some provocative ideas that are probably
    very much worth re-visiting.
    For example, I like these two design rules:

    • Elimination of warning screens - modern software
      applications often ask the user "are you sure?" before some
      potentially harmful action; Raskin argues they are
      unhelpful because users tend to ignore them out of habit,
      and that having a universal undo
      eliminates the need for them.
    • Universal use of text - Raskin argues that graphic icons in
      software without any accompanying text are often cryptic to
      users.

    The Cat was the hardware and software incarnation of Raskin's
    design philosophies.

    Also, you have to check out the picture of Jef with a
    little model of the Cat on his Wikipedia page:
    Jef Raskin
    (wikipedia.org).
    Direct link to the image: here
    (wikipedia.org).

    The Cat ran on a
    Motorola 68000
    (wikipedia.org)
    CPU, which was also used in the Apple Macintosh and was one of the
    first 32-bit processors, featuring 32-bit instruction set, registers,
    and non-segmented memory addressing.

    Getting to the Forth interface doesn't seem to have been a top
    priority on the Cat.

    Quoting Dwight Elvey at the DigiBarn computer museum,
    Canon Cat: Enabling Forth
    (digibarn.com),
    the process sounds a bit awkward:

    "Highlight the string: Enable Forth Language.
    Then do: front, answer
    Then: shift, usefront, space
    You are now in Forth.
    You need to do: -1 wheel! savesetup re
    Front the editor, use the setup to set the keyboard to ascii
    so that you can type the characters and > with
    shift , and shift .
    Do a usefront disk.
    It will save to the disk so that it will be ready
    the next boot with just the: shift, usefront, space
    to restart Forth.
    To undo the Forth mode: Forth? off 0 wheel! re [sic everything]"

    (Note that "USE FRONT" is a dedicated key on the Canon Cat
    keyboard that lets you apply whatever function is printed on the
    front of another key on the keyboard. Clever, right? All of the
    Cat's interactions are performed through the keyboard like
    this.)

    And if that process weren't enough to put you off, this warning
    seems particularly dire and, if anything, hilariously understated:

    "Use care while in Forth mode as usefront shift : will
    format the disk (a good idea to make a backup or
    at least remove the disk while experimenting)."

    But all of that makes it sound worse than it is.
    Thanks to modern streaming video technology, you can
    see Dwight Elvey
    boot up a Cat and demonstrate it
    (youtube.com).
    As you can see, getting to the Forth interface is really not a
    lengthy process at all once you know what to do. Just a couple keystrokes.
    And the Cat is a more compact computer than I imagined from the pictures.

    If you like industrial design or interesting computer interfaces,
    you owe it to yourself to check out the amazing pictures of
    Jef Raskin's team designing the Canon Cat (1985)!
    (digibarn.com)

    If you want to see a bunch of pictures of a vintage Cat in
    amazing shape, check out Santo Nucifora's
    Canon Cat
    (vintagecomputer.ca).

    If nothing else, just let this fact marinate in your head for a
    little bit: The Canon Cat had an OS, office suite, and
    programming environment in 256 KB of ROM.
    This
    document (not including the images) is almost exactly that
    size!

    Okay, now we are ready for...

    Easily one of the most exciting uses of Forth is space
    exploration because space is intrinsically awesome.

    We've already seen how Chuck Moore was intimately
    involved in programming ground-based radio telescopes.
    But Forth has also found its way into tons (literally and idiomatically)
    of actual spacecraft in outer space!

    NASA is famous for having stringent rules about software
    that runs on spacecraft. Which makes sense, given the cost of these
    machines and the difficulty or even impossibility of getting
    to them to make fixes.

    NASA and the ESA

    The list of projects using Forth at NASA compiled by James Rash in 2003 is too long to easily list here.

    The image on the right is intentionally too small to read. As you
    can see, it's a big list.

    The original NASA link has died, but the page was archived by
    the Wayback Machine at archive.org. There's a nice copy
    hosted here as well:
    Forth in Space Applications
    (forth.com).

    I haven't found a list like this for the ESA, but the Philae
    lander featured below would be one very prominent example.

    (By the way, though Forth isn't featured here, there's a fun overview
    of some CPUs used in various space missions:
    The CPUs of Spacecraft: Computers in Space
    (cpushack.com).)

    (The image to the right is very tall. We need some more text for
    wider screens. So, while it's not about Forth, I won't miss this
    opportunity to mention one of my favorite computing-in-space books:
    Digital Apollo: Human and Machine in Spaceflight
    (mit.edu)
    by David Mindell. It will change how you look at the Apollo missions,
    computers in general, and the role of astronauts in space craft!)

    Space Shuttle Small Payload Accommodations Interface Module (SPAIM)

    "There is always great concern about software reliability, especially with flight software."

    From the paper
    Forth in Space: Interfacing SSBUV, a Scientific Instrument, to the Space Shuttle (PDF)
    (acm.org)
    by Robert T. Caffrey et al:

    "There is always a great concern about software reliability,
    especially with flight software. The effects of a software error in
    flight could be dramatic. We were able to produce reliable software
    by writing a Forth routine on the PC, downloading the software, and
    testing it interactively. We varied the inputs to a routine and
    checked the ability of the routine to operate correctly under all
    conditions. As a result, during the STS-45 Shuttle mission, the
    SPAIM flight software worked perfectly and without any problems."

    Forth systems can be multi-tasking and this allowed the system to
    monitor itself. Each task had its own stack and a watchdog task could,
    for example, check the health of another task by monitoring the
    other task's stack. (Stack depth was found to be a good indication of
    task health. In other words, malfunctions would often cause the stack
    depth to grow unchecked.)

    "The ability of the Forth development system to debug hardware and
    software interfaces, model missing hardware, simulate system
    malfunctions, and support system integration dramatically helped in
    the quick generation of error-free software. The interactive,
    integrated and multitasking features of the Forth system proved to
    be the key elements in the success of the SPAIM systems
    development. Several techniques such as stack depth monitoring,
    address monitoring, cycle time monitoring, and error flag
    monitoring provided system checks during both the system
    integration process and the actual Shuttle mission."

    The interactive nature of the Forth system is again found to be not
    just very convenient, but also a massive productivity boost for all
    phases of programming, debugging, and testing.

    The SPAIM system used a 16-bit Intel 87C196KC16 microcontroller,
    which is a MIL-SPEC member of the
    Intel MCS-96
    (wikipedia.org)
    family. These started out as controllers for Ford engines in the 1970s.
    They continued to be made in various incarnations until 2007 and were
    often used in common devices such as hard drives, modems, and printers.
    Unlike many chips headed to space long-term, this one wasn't "rad-hard"
    (hardened against the effects of radiation).

    NASA's Robot Arm Simulator

    Given the input of three-axis joystick commands, control a
    50-foot long, six-joint arm with six different coordinate systems.

    Entire system developed by one programmer in five weeks.

    The Space Shuttle Robot Arm Simulator
    (forth.com)
    was a complex machine with some challenging requirements.

    It turns out that you can't just use the same robot arm on
    the ground for simulations as the one that will go into space.
    For one thing, contending with gravity changes the requirements to
    such a degree that it's a completely different robot!

    "The GSFC arm, for example, is designed to carry up to a thousand
    pound payload at its tip. In order to do this it uses a high
    pressure (4000 psi) hydraulic system rather than electric motors as
    on the RMS.

    ...

    "Because of the completely different nature of the joint controls,
    the original RMS software was not usable except as a source of
    algorithms."

    So the simulator arm could not work the same way,
    but it had to pretend it did.

    You can see in my drawing that the arm lived in a full-scale
    simulated shuttle bay and was accompanied by an enormous model
    satellite. (That satellite looks like the Hubble Space
    Telescope to me, which seems plausible, given the dates on this
    project.)

    Just listen to these I/O requirements:

    "The RMSS contains fourteen separate processes: one for each joint,
    one for each joystick, one for the digital display panel, a
    simulation process, a trending process, and several supervisory
    processes."

    But, as seems to be a trend with Forth-based space software,
    the work was impeccable:

    "Simulation testing was so thorough that when the arm software was
    installed on site, not a single change was made to the executive
    control algorithms."

    Does Forth imply excellence, or does excellence imply Forth? Ha ha.

    Seriously, though, writing a system like that in five weeks
    is pretty astounding.

    Shuttle Mission Design and Operations System (SMDOS)

    JPL's ground-based control software for shuttle SIR-A and SIR-B
    radar imaging instruments.

    This section started off as an excuse to draw a Space Shuttle. But
    it's actually a great example of how a "live" interactive system
    can save a mission, even if the software itself hasn't been deployed into
    space.

    The paper:
    Forth as the Basis for an Integrated Operations Environment for a Space Shuttle Scientific Experiment (PDF)
    (forth.com)
    describes a number of hardware failures that had to be
    overcome.

    "It was in the first day of data taking that we noticed
    the first problem..."

    The SIR-B's transmitting antenna had shorted, resulting in the
    expected 1000 watts of power being reduced to a faint 100 watts.

    "Since the returned echo was negligible as received by the SIR-B
    antenna it was decided to increase the gain of the receiver.
    The problem was in not understanding what had happened to cause
    the failure. [It] was not immediately apparent what the
    appropriate gain should be..."

    Forth-based, highly adaptable SMDOS to the rescue!

    "No problem. With the advice of the radar engineers, the Forth
    module that was used to generate the display was quickly
    modified to produce a calibrated display. The gain of the
    receiver was increased until a perfect bell-shaped pattern
    again appeared on the display."

    Then a second hardware failure:

    "This was only the start of our problems. A satellite on board
    failed to deploy properly. The shuttle had to remain in high orbit
    until the problem was resolved before it could fire its engines to
    descend to the orbit that had been planned for the SIR-B data
    taking. "

    Now the shuttle would not be in the planned orbit for data-taking.
    A second SMDOS adaptation fixed that.

    Then a third hardware problem with another
    antenna:

    "A bolt had sheared in the antenna's pointing mechenism and the KU
    band antenna was trashing around, threatening to destroy itself. It
    was necessary for an astronaut to exit the shuttle (EVA) in a
    spacesuit to pin the antenna down."

    Now the shuttle had to rotate to point at a relay satellite to
    gather data (to tape!) and then rotate towards Earth to transmit
    the recorded data, and repeat.

    "Of course this meant an entirely new data-taking strategy. Again
    the SMDOS computers were put to work displaying new plans for the
    stringent new conditions."

    They lost tons of data, of course, but at least they were able to
    salvage 20% of it by rotating and capturing and rotating and
    transmitting. None of which would have been possible if they had not
    been able to modify the software on the spot.

    Conclusion:

    "When the antenna feed failed and we realized that the software had
    to adapt to that failure, it was relatively easy given the
    interactive Forth enviroment to change the required module to meet
    the new specifications. This is clearly beyond the capabilites of
    most languages."

    Other systems are interactive, but Forth may be unique in allowing
    complete freedom of modification in an interactive session.
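    To make that concrete, here is a minimal sketch of a live patch
    like the receiver-gain fix, typed at an ordinary Forth prompt. The
    word names are made up for illustration, not the actual SMDOS
    vocabulary:

    VARIABLE GAIN   10 GAIN !
    : SCALE ( raw -- scaled )  GAIN @ * ;  \ imagine the display uses this

    \ Later, live at the console, with no rebuild and no redeploy:
    40 GAIN !      \ adjust and watch the display change

    \ Or redefine SCALE outright; words compiled against the old
    \ SCALE keep it, while new definitions pick up the new one.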

    Of course, this kind of freedom is a double-edged sword if there ever
    was one. The implied danger of that powerful sword (plus the postfix
    notation) has been a hard sell in the corporate world.

    So far, we've just seen Forth software in space. But it
    is often accompanied by Forth hardware.

    Yup, Forth hardware. Introducing:

    Forth hardware in space

    The Harris RTX2010 processor. Used in a ton of space
    applications.

    Featuring:

    • Direct execution of Forth
    • Two hardware stacks, 256 words deep
    • 8MHz clock, extremely low latency
    • Radiation hardened

    The
    RTX2010
    (wikipedia.org)
    and its predecessor, the RTX2000,
    account for a good portion of the use of Forth in the space industry.
    They run Forth natively.

    The use of the RTX line in space may not be solely due to a particular
    love for Forth per se, but because of the specific attractive properties
    of these processors - very low latency and the ability to quickly
    process the floating point mathematical operations needed for neat space
    stuff like navigation and thruster control. Either way, the
    philosophy of Forth embedded in this hardware is suitable
    for the extreme environments in which they operate.

    Largely because of the stack-based design, the RTX 2000 and 2010
    have very compact machine code. Subroutine calls take only a
    single cycle and returns are free!
    All branches take
    exactly one cycle as well.
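    Here's a rough sketch of why the code comes out so dense, assuming
    the usual stack-machine encoding (an illustration, not a
    cycle-accurate account of the RTX):

    : DOUBLE ( n -- 2n ) DUP + ;

    \ On a stack CPU, DUP and + are each single opcodes operating
    \ directly on the hardware data stack, and the subroutine-return
    \ bit can be folded into the final instruction, so the entire
    \ body is roughly two instructions with a "free" return.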

    They are also brilliantly minimalistic designs. The entire RTX2000
    instruction set fits on a single page. See the first PDF link below:

    • The Harris RTX 2000 Microcontroller (PDF)
      (vfxforth.com)
      - The RTX2000 as described in The Journal of Forth Application
      and Research by Tom Hand.
    • HS-RTX2010RH Data Sheet (PDF)
      (mouser.com)
      - The RTX2010 is now sold by Intersil.
    • RTX 2000 Data Sheet (PDF)
      (widen.net) as originally sold by Harris.
    • DigiKey evidently has 800+ RTX2000s in stock
      (digikey.com)
      through Rochester Electronics for a reasonable $22, but you
      have to buy them in quantities of 14. (Maybe you can find
      14 friends to do a group buy?)

    So what kind of spacecraft use these Forth-native processors?

    Let's look at a specific pair of spacecraft:

    Rosetta and Philae

    First mission to send a spaceship to orbit a comet and then deliver a
    lander to the comet's surface!

    The Rosetta spacecraft's Ion and Electron Sensor instrument used a Harris RTX2010.

    The Philae lander used two Harris RTX2010s for complete system control (CDMS) and two more to control its landing system (ADS).

    The ESA's Rosetta mission
    (esa.int)
    was hugely ambitious: Send a spacecraft to
    rendezvous with and then follow a comet around the Sun, deploy
    the Philae lander to the surface by dropping it into the comet's
    gravity well, observe the lander as it shoots harpoons
    into the icy surface of the comet to keep from bouncing back out
    into space, then relay the lander's communication from the surface back to
    distant Earth, 28 minutes away at the speed of light.

    Rosetta traveled in the Solar System for a full decade (2004 to
    2014) before meeting up with comet 67P/Churyumov-Gerasimenko.
    (67P is 4km wide and orbits the sun every six and a half years.)

    Rosetta orbited the comet for three months and then deployed
    the Philae lander to the surface of the comet.

    Both craft contained a full laboratory of advanced scientific
    instruments (11 on Rosetta, 9 on Philae) including some that doubled
    as high-resolution cameras with images suitable for humans to view.
    The whole mission
    (wikipedia.org)
    is worth reading about. There are some fantastic images
    and animations to be seen on the mission page and on the
    comet's own page
    (wikipedia.org).

    Often described as being "the size of a washing machine," the
    Philae
    (wikipedia.org)
    lander pushed away from Rosetta's orbit to drop to the surface of 67P.

    The picture at the right was taken
    from Rosetta's OSIRIS imager as Philae fell slowly away from the
    orbiter.

    Because the comet's gravitational pull is so small (huge boulders
    have been observed moving around on its surface), a pair of harpoons
    were meant to fire into the surface of the comet and hold the lander
    down. These did not deploy (possibly a mechanical failure) and a
    landing thruster also failed, so Philae ended up having a long,
    low-gravity tumble on the surface.

    It's been speculated that the harpoon failure actually
    saved Philae from an even more exciting trip because studies
    of the surface found it to be harder than expected. It might have
    launched itself away rather than anchoring! As it was, the lander
    bounced with a force that was just shy of escaping the comet's
    gravitational pull entirely. It rose a full kilometer above the surface
    before slowly returning for another two bounces to its final resting
    spot.

    A pair of Harris RTX2010s controlled Philae's Active Descent System.
    Check out Here comes Philae! Powered by an RTX2010
    (cpushack.com):

    "Why was the RTX2010 chosen? Simply put the RTX2010 is the lowest
    power budget processor available that is radiation hardened, and
    powerful enough to handle the complex landing procedure. Philae
    runs on batteries for the first phase of its mission (later it will
    switch to solar/back up batteries) so the power budget is critical.
    The RTX2010 is a Forth based stack processor which allows for very
    efficient coding, again useful for a low power budget."

    Here is more information (with pictures!) about the physical design
    and components in the Philae control system:
    Command and Data Management Subsystem (CDMS) of the Rosetta Lander (Philae)
    (sgf.hu).

    "Harris RTX2010 processor has been selected for the DPU boards
    because it is the lowest power consuming, space qualified,
    radiation hardened, 16-bit processor with features to provide so
    complicated functions as the CDMS has to perform. It is a stack
    based, Forth language oriented processor with an exotic and
    challenging instruction set. CDMS is a real-time control and data
    acquisition system, and it has to process tasks in parallel.
    Therefore, a real-time, pre-emptive multitasking operating system
    has been developed to run application tasks executing the required
    functions in parallel."

    And here is the lander's
    Active Descent System (ADS) QM User Manual
    (spyr.ch)
    which has way more detail about this computer system, including a
    number of details about the Forth software:

    "After resetting the subsystem (power-on reset), the bootstrap
    sets up the Forth environment, copies the firmware from PROM to
    RAM and disables the PROM for further access.

    After this, the main word Do-App is called from the Forth system
    immediately after setup. You can find the main word Do-App in the
    file app.fth (see part II). Do-App calls Init-App, which itself
    calls other initialisation words like Init-ADS. Then the
    application enters the main loop. In the main loop the following
    actions are performed:

    • reset the watchdog (watchdog is enabled for the QM)
    • put the data into the HK registers
    • get the data from the ADC handler
    • process CDMS requests"
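    If you've never seen a Forth application entry point, that
    boot-and-loop shape translates almost directly into code. Here is
    a minimal sketch in portable Forth; only Do-App, Init-App, and
    Init-ADS are named in the manual, and the loop-body words are
    hypothetical stubs standing in for the real hardware I/O:

    : RESET-WATCHDOG        ( -- ) ;  \ stub: poke the watchdog register
    : UPDATE-HK-REGISTERS   ( -- ) ;  \ stub: publish housekeeping data
    : POLL-ADC-HANDLER      ( -- ) ;  \ stub: fetch ADC samples
    : PROCESS-CDMS-REQUESTS ( -- ) ;  \ stub: handle CDMS commands

    : INIT-ADS ( -- ) ;               \ stub: subsystem initialisation
    : INIT-APP ( -- ) INIT-ADS ;

    : DO-APP ( -- )
      INIT-APP
      BEGIN
        RESET-WATCHDOG
        UPDATE-HK-REGISTERS
        POLL-ADC-HANDLER
        PROCESS-CDMS-REQUESTS
      AGAIN ;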

    Despite the unfortunate landing, which put Philae in too much
    shadow to get as much solar energy as hoped and at an angle that
    made communication with Rosetta difficult, Philae was still
    robust enough to perform "80%" of its scientific mission, which
    is pretty amazing.

    A picture taken by the Philae lander as it lay on its side, enjoying some sunlight on
    one of its feet:

    This is one of the final images taken by the Rosetta orbiter as it made the
    "hard descent" (controlled crash landing) to the surface of comet 67p:

    The image and a description are here: Final Descent Images from Rosetta Spacecraft
    (nasa.gov).

    "The decision to end the mission on the surface is a result of
    Rosetta and the comet heading out beyond the orbit of Jupiter
    again. Farther from the sun than Rosetta had ever journeyed before,
    there would be little power to operate the craft. Mission operators
    were also faced with an imminent month-long period when the sun is
    close to the line-of-sight between Earth and Rosetta, meaning
    communications with the craft would have become increasingly more
    difficult."

    By the way, the ESA has a nice summary of the computer hardware
    used by the OSIRIS camera on Rosetta which was used to take the surface
    image above and also the little picture of the descending lander further
    above.
    Optical, Spectroscopic, and Infrared Remote Imaging System
    (esa.int).

    After finishing the first draft of this article, I was so excited about
    the Rosetta mission that I ended up ordering and reading
    Rosetta: The Remarkable Story of Europe's Comet Explorer by Peter Bond.
    It's a bit of a dry read, but the subject matter is thrilling
    nonetheless and the coverage is thorough. I recommend it if you want to
    know a lot more about this awesome engineering and scientific milestone.
    (It does not, sadly, mention Forth.)


    Rabbit Hole Alert: This takes us away from
    Forth for a moment, but learning about the
    Virtuoso RTOS (real-time operating system) eventually leads to a deep,
    deep Wikipedia rabbit hole that takes you on a journey to the Inmos
    processors, Hoare's CSP, the occam programming language, the HeliOS
    parallel computing operating system, and the concept of the
    "transputer" microprocessors.

    Transputers use whole processors as
    building blocks for a parallel computer in the same way transistors are
    used as the building blocks for processors. (Thus, transputer =
    "transistor computer," you see?) They were mostly featured in
    supercomputers, but they also saw some industrial controller use
    and there was even an Atari Transputer Workstation,
    ATW-800.

    (I've intentionally not linked to any of these things here
    because you'll disappear into that hole and never see the end of this
    document, which would be very sad. Also, I mention "transputers" again
    one more time below and you wouldn't want to miss that.)


    The Rosetta orbiter and Philae lander now rest silently on the
    surface of 67P, where they will no doubt stay for billions of
    years or until future comet tourists pick them up and put them
    in a trophy room, whichever comes first.

    Stop Writing Dead Programs

    "...Space probes written in Lisp and Forth have been
    debugged while off world... If they had proven their programs
    correct by construction, shipped them into space, and then found out
    their spec was wrong, they would have just had some dead junk on
    Mars
    . But what these guys had was the ability to fix things
    while they are running on space probes... In addition, the spec is
    always wrong!"

    -- Jack Rusher, Stop Writing Dead Programs (talk given at Strange Loop 2022)

    Here's the talk:
    "Stop Writing Dead Programs" by Jack Rusher (Strange Loop 2022)
    (youtube.com).

    You've got 43 minutes to watch it. I'm timing you. Don't get
    distracted by other YouTube suggestions. Come back here. I'm waiting.

    Or better yet, check out Jack's awesome transcript,
    which was super helpful when I wanted to re-find the above quote:
    Stop Writing Dead Programs
    (jackrusher.com).

    In his transcript, he notes:

    "Had I had more time, I would have done an entire series of slides
    on FORTH. It's a tiny language that combines interactive
    development, expressive metaprogramming, and tremendous machine
    sympathy. I've shipped embedded systems, bootloaders, and other
    close-to-the-metal software in FORTH."

    I was extremely interested in hearing about Forth systems being
    updated in space, but had a heck of a time finding any.
    I finally found one on a page that is otherwise largely
    dedicated to Lisp's use at the Jet Propulsion Labs:
    1992-1993 - Miscellaneous stories
    (sourceforge.io)
    on the amazing, sprawling site for the Mecrisp-Stellaris Forth
    (which runs on various non-x86 CPUs):

    "Also in 1993 I used MCL to help generate a code patch for the
    Gallileo magnetometer. The magnetometer had an RCA1802 processor, 2k
    each of RAM and ROM, and was programmed in Forth using a development
    system that ran on a long-since-decommissioned Apple II. The
    instrument had developed a bad memory byte right in the middle of
    the code. The code needed to be patched to not use this bad byte.
    The magnetometer team had originally estimated that resurrecting the
    development environment and generating the code patch would take so
    long that they were not even going to attempt it. Using Lisp I wrote
    from scratch a Forth development environment for the instrument
    (including a simulator for the hardware) and used it to generate the
    patch. The whole project took just under 3 months of part-time
    work."

    (If anyone has any leads to other notable Forth uses in space, I'd love to
    hear about them.)

    When we defeat the alien kill-bots and reprogram them, it will
    surely be with a Forth of some sort.

    In the background, one of the Invader machines lies crumpled and
    smoking amidst ruins. This was one of Earth's great cities.

    Stomping towards us with its mechanical arms raised in victory, is
    another Invader. But this one is different. The tell-tale giveaway is
    the opening in its protective head dome. And is that a flag? Why yes, it is!

    At great cost, humans managed to trap one of the Invaders long
    enough to penetrate its outer defenses, while otherwise leaving the
    machine unharmed and operable.

    Working feverishly against a doomsday clock, they burrowed deep into
    the electrical heart of the machine, identifying and classifying its
    alien functions until they understood it well enough to attempt
    an interface.

    A bus protocol was decoded. Programming work began.

    It went poorly. The aliens had unthinkably bizarre notions of
    generalized computing that defied all known patterns of software.

    Everything had to be done with agonizing labor, stringing
    sequences of raw bus messages together in hopes of getting a
    correct response.

    But then someone had the bright idea to bootstrap a Forth
    from the known instruction sequences. With this, they could write
    a bare-bones interpreter. And, at last, they could experiment
    quickly and safely.

    Days later, an arm moved. Then they crushed a barrel with a
    gripper claw:

    BARREL OBJECT-ID VISION TARGET
    133 L-ARM-FWD 14 L-CLAW-OPEN
    25 L-ARM-FWD 14 L-CLAW-CLOSE
            

    Then a first four-legged step. Then 20 steps:

    PREP-QUAD-LEGS
    20 STRIDE-LOOP
            

    As ravaged fighters looked on in amazement, "Defender-1" burst
    from the old brick warehouse and, in a terrific crash, it toppled
    another Invader as it was passing by on patrol.

    The machines grappled for a moment and it
    looked as if Defender-1's clumsy movements would be no match
    for the alien, even from a superior position.

    But humans had decoded all of the weapon systems by then and a
    special word had been prepared for this moment:

    : KILL
        100 BEAM-LEVEL
        BOT OBJECT-ID VISION TARGET
        L-BEAM FIRE-FULL
        R-BEAM FIRE-FULL
    ;
            

    Twin blinding beams of energy struck the enemy full in the torso
    and instantly turned its mechanical guts into sizzling plasma.
    After a moment of silence, a single cheer rose up from a doorway
    nearby and was soon joined by a hundred different voices from
    places of concealment in the ruined buildings.

    Now the humans had the upper hand at last! Other Invader
    machines were disabled or captured. Defender-1 was joined
    by Defender-2, and then 3, 4, 5, and more!

    Software was passed by sneaker-net and by shortwave packet radio.
    City by city, Earth took back control. And along with victory,
    word of the One True Language spread across the land. Flags
    were raised in honor of its original discoverer, Chuck Moore.

    Where other abstractions had failed, the universal machine
    truth of Forth had succeeded.

    Forth is an idea

    Here's a "family tree" of some notable Forths:

    Obviously the graphic is unreadably tiny. For the full-size
    original and the gForth program used to create it, check out:

    Forth Family Tree and Timeline
    (complang.tuwien.ac.at).

    One of the hardest things about trying to learn "Forth" is realizing
    that there is no single implementation that can lay sole claim to that name.
    As we've seen, some of Chuck's first Forths pre-date the name entirely.

    There are Forth standards dating back to Forth-79 and Forth-83,
    through the ANS Forth document, and continuing with the
    Forth 2012 Standard and Forth200x committee
    (forth-standard.org).

    Forths have shared concepts. There are many common words, certainly, but purpose-built
    Forths will have their own special vocabularies.

    Also, it is true that making Forths is at least as fun
    as using them.

    The forest of computing is peppered with hobby Forths. They grow where nothing
    else can survive. They flourish in the sun and in the shade.
    Each one is a little glittering jewel.

    What about Chuck?

    Charles H. Moore co-founded FORTH, Inc. in 1973. He's continued to port
    Forth to various systems ever since. But he's never stopped inventing.

    I drew this image of Chuck from a photo in this amazing quote
    collection,
    Moore Forth: Chuck Moore's Comments on Forth
    (ultratechnology.com)
    compiled by Jeff Fox.

    You'll notice I added some color to my drawing for this one, and
    that's because I'm pretty sure that what we're seeing on Chuck's monitor
    is...

    colorForth

    The above screenshot is actually from
    a page about etherForth
    (etherforth.org),
    which is a
    colorForth
    written for GA144 chips. (Don't look up those chips yet unless you
    want a spoiler for what's coming in a moment below!)

    What the heck are we looking at here?

    So,
    colorForth
    (wikipedia.org)
    is:

    "An idiosyncratic programming environment, the colors simplify
    Forth's semantics, speed compiling, and are said to aid Moore's own
    poor eyesight: colorForth uses different colors in its source code
    (replacing some of the punctuation in standard Forth) to determine
    how different words are treated."

    And, of course:

    "The language comes with its own tiny (63K) operating system.
    Practically everything is stored as source code and compiled when
    needed. The current colorForth environment is limited to running on
    Pentium grade PCs with limited support for
    lowest-common-denominator motherboards, AGP video, disk, and
    network hardware."

    But the best description of
    colorForth
    and its strengths comes from Chuck Moore himself in an interview from
    2009,
    Chuck Moore: Geek of the Week
    (red-gate.com):

    "Forth has some ugly punctuation that colorForth replaces by
    coloring source code. Each word has a tag that indicates function;
    it also determines color. This seems a small point, but it
    encourages the use of functions, such as comments or compile-time
    execution, that would be inconvenient in Forth."

    It should be noted that the colors can be replaced with symbols or
    notation, so using the language without the ability to
    distinguish color is not a barrier. Color is just one way to
    show this information.
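    For a concrete (if necessarily colorless) illustration, compare
    how the two dialects mark the same roles. In standard Forth,
    punctuation does the work:

    : SQUARE ( n -- n*n ) DUP * ;   \ ':' begins a definition, '(' a comment

    In colorForth, each word carries a pre-parsed tag rendered as
    color, so the equivalent code needs no ':', '(', or ';'. Roughly:
    SQUARE would appear in red (define), DUP and * in green (compile),
    and the comment in white.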

    There are a ton of other enhancements beyond the obvious color aspect,
    such as:

    "By having words preparsed, the compiler is twice as fast. Another
    small point, since compiling is virtually instantaneous, but this
    encourages recompiling and overlaying the modules of an
    application. Smaller modules are easier to code, test and document
    than a large one."

    That interview contains another Chuck Moore quote about software
    construction in general:

    "Instead of being rewritten, software has features added. And
    becomes more complex. So complex that no one dares change it, or
    improve it, for fear of unintended consequences. But adding to it
    seems relatively safe. We need dedicated programmers who commit
    their careers to single applications. Rewriting them over and over
    until they're perfect."

    This is something I've seen repeated again and again by some of
    the most respected minds in software: You cannot just keep adding
    things to a program. You must continually re-work the program to match
    your needs as they change over time. Ideally, you re-write the program.
    Only time and deep consideration can yield the most elegant, correct,
    and simple program.

    Which brings us to...

    The pursuit of simplicity

    Chuck Moore has been fighting against software complexity since the 1950s.

    "I am utterly frustrated with the software I have to deal with. Windows is beyond comprehension! UNIX is no better. DOS is no better. There is no reason for an OS. It is a non-thing. Maybe it was needed at one time.

    -- Chuck Moore, 1997

    "If they are starting from the OS they have made the first mistake. The OS isn't going to fit on a floppy disk and boot in ten seconds."

    -- Chuck Moore, 1999

    These quotes also come from Jeff Fox's quotes collection,
    Moore Forth: Chuck Moore's Comments on Forth
    (ultratechnology.com).

    As you've no doubt gathered over the course of this page,
    Chuck is no fan of big, heavy, complicated software such as
    operating systems.

    He believes in compact, machine-sympathetic programming.

    "Mechanical Sympathy" is not Chuck's term, but I believe it
    accurately describes his philosophy. It comes from this
    (apocryphal?) quote by
    Formula One race car driver
    Jackie Stewart
    (wikipedia.org):

    "You don't have to be an engineer to be a racing driver, but you
    do have to have mechanical sympathy."

    The use of the term to describe software comes from Martin Thompson's
    blog of the same name.
    In Why Mechanical Sympathy?
    (blogspot.com),
    he writes:

    "Why does the software we use today not feel any faster than the
    DOS based applications we used 20 years ago??? It does not have to
    be this way. As a software developer I want to try and produce
    software which does justice to the wonderful achievements of our
    hardware friends."

    Again and again, you'll see this sentiment echoed by Chuck Moore
    and fans of Forth.

    I think it's very interesting and telling that Forth tends to be
    popular with "hardware people" such as electrical engineers and embedded
    systems designers. By contrast, it seems that "software people"
    tend to idolize a more abstract, high-level beauty as found
    in languages such as Lisp or Scheme.
    Of course, this is a gross generalization and may have no basis in fact,
    but I know I'm not the only person to notice this trend.

    Maybe another way to describe this aspect of Forth is that it has a
    "mechanical purity" in the same way that Joy, with its combinators,
    has a "mathematical purity."

    And speaking of hardware...

    Processor Design

    Chuck's real love seems to be processor design.
    Those Harris RTX2000 and RTX2010 chips used in so many space missions?
    That's basically his chip!

    No kidding.

    Chuck, that brilliant rascal, has been designing hardware since 1983
    starting with the Novix NC4000 gate array. An improved design was
    sold to Harris to become the RTX chips.

    Chuck designs processors with his own VLSI software, "OKAD", written in
    500 lines of Forth, of course.

    Take a moment to pause on that last sentence.

    Processor design software written in 500 lines?

    You read that right.

    OKAD is one of the Great Legends of Chuck Moore.
    But what, exactly, is it?

    First off, VLSI stands for
    Very Large Scale Integration
    (wikipedia.org):

    "Very large-scale integration (VLSI) is the process of
    creating an integrated circuit (IC) by combining millions or
    billions of MOS transistors onto a single chip. VLSI began in the
    1970s when MOS integrated circuit (Metal Oxide Semiconductor) chips
    were developed and then widely adopted, enabling complex
    semiconductor and telecommunication technologies. The
    microprocessor and memory chips are VLSI devices."

    The product of VLSI is what we think of when we imagine
    the modern image of "computer chip" in our minds.

    "Integration" is simply the shrinking of computers from whole rooms to
    microscopic thinking dust:

    • Computers began with processors the size of rooms with
      discrete logic gates you can touch (relays to vacuum tubes to
      transistors).
    • Then, processors were shrunk down to the size of refrigerators
      with logic boards of integrated circuits (ICs).
    • Finally, entire processors shrunk down to fit on a single chip via
      Very Large Scale Integration.

    (Also, in a parallel path from mainstream desktop computing,
    VLSI has also produced entire computers and, increasingly,
    multiple computers on a single chip, also
    known as
    "system(s) on a chip" (SoC)
    (wikipedia.org).
    The lines around the various types are extremely blurry, but
    some familiar forms are microcontrollers, embedded systems,
    various "mobile" devices, etc.)

    Anyway, Moore's
    VLSI Design Tools (OKAD)
    (colorforth.github.io)
    system is a complete processor workshop:

    "In 500 lines of
    colorForth,
    these tools provide everything required to design a chip."

    OKAD is really more of a collection of tools that work together to:

    • Describe the basic logic gates (constructed of transistors),
    • Design the layout of the entire circuit (the three-dimensional multi-layered network of connections between gates),
    • Simulate the circuit electrically (voltage, temperature, capacitance, etc.),
    • And export the finished design to the industry-standard
      GDSII
      (wikipedia.org)
      file format that is given to IC foundries (or "chip fabs").

    For more about OKAD, I highly recommend reading the
    excellent answers to
    Did Forth's inventor Charles Moore really write a CAD program in only 5 lines of code?
    (retrocomputing.stackexchange.com).

    Moving on from the software to the chips themselves, Moore wrote
    a nice little summary of his designs in his typically concise style,
    giving just a few key details about each chip:
    Forth Chips
    (colorforth.github.io).

    First, there was the Novix NC4000, which was designed
    for a CMOS gate array.

    Here's a whole book about the NC4000 chip: Footsteps in an Empty Valley: NC4000 Single Chip Forth Engine (8Mb PDF) by Dr. Chen-Hanson Ting.

    To quote Dr. Ting from Chapter 2:

    "The Novix NC4000 is a super high-speed processing engine which is
    designed to directly execute high level Forth instructions. The
    single chip microprocessor, NC4000, gains its remarkable
    performance by eliminating both the ordinary assembly language and
    internal microcode which, in most conventional processors,
    intervene between the high level application and the hardware. The
    dual stack architecture greatly reduces the overhead of subroutine
    implementation and makes NC4000 especially suited to support high
    level languages other than Forth."

    As you can see, this reads just like a description of the Harris RTX
    chips used in the spacecraft we explored above.

    Sure enough, if we read the History section on the
    RTX2010 page
    (wikipedia.org),
    the lineage is made very clear:

    "In 1983, Chuck Moore implemented a processor for his programming
    language Forth as a gate array. As Forth can be considered a dual
    stack virtual machine, he made the processor, Novix N4000 (later
    renamed NC4016), as a dual-stack machine. In 1988, an improved
    processor was sold to Harris Semiconductor, who marketed it for
    space applications as the RTX2000."

    For another great article about Moore's early processor design work
    (and some more spacecraft mentions!), check out
    Charles Moore: From FORTH to Stack Processors and Beyond
    (cpushack.com),
    which is part one of a two-part series.

    After the Novix, came a variety of chip projects:

    • Sh-Boom (32-bit, 20 Mips),
    • MuP20/MuP21 (21-bit, 100 Mips),
    • F21 (500 Mips - and be sure to check out
      F21 in a Mouse
      (ultratechnology.com), which is a complete F21 computer running a
      graphical environment that has been packed
      into a PC mouse...in the Pentium era!)
    • i21 (21-bit, 55 Mips)
    • X18 (18-bit, 2400 Mips)

    These are all real systems that really worked. The hard part has always
    been finding customers.

    Over the years, other people have also created Forth chips and FPGA
    implementations of hardware Forth-likes. Check out the links on
    Forth CPU Cores
    (forth.org)
    and
    Forth Chips
    (ultratechnology.com).

    In addition to
    colorForth,
    Moore also developed "Machine Forth" as an even more
    machine-sympathetic language than traditional Forth. It's based on
    the machine code of the MuP21 microprocessor listed above.

    I won't go into a lot of detail about Machine Forth, but
    here are some interesting links:

    • MuP21 Machine Forth Tutorial #1
      (ultratechnology.com)
    • P21Forth 1.02 User's Manual
      (ultratechnology.com)
    • Machine Forth for the ARM processor (PDF)
      (tuwien.ac.at)
    • MachineForth - Inspired by Chuck Moore's "Machine Forth" and the MuP21 processor.
      (github.com)
    • Machine Forth (links and bibliography)
      (jjn.one)

    As you can see, Moore has always been looking for new ways to work
    with computers, a partnership between the machine and the programmer.

    Which brings us to the current state of Chuck Moore's art...

    GreenArrays

    "Programming a 144-computer chip to minimize power" (2013)

    144 asynchronous computers on a chip. Idle cores use 100 nW. Active ones use 4 mW, run at 666 Mips, then return to idle. All computers running flat out: 550 mW (half a Watt).

    Check out Chuck's talk at StrangeLoop:
    Programming a 144-computer chip to minimize power - Chuck Moore (2013)
    (youtube.com)

    And here's the official website:
    GreenArrays, Inc.
    (greenarraychips.com)
    "Ultra-low-powered multi-computer chips with integrated
    peripherals."

    Probably the best summary comes from the architecture document,
    GreenArrays Architecture (PDF)
    (greenarraychips.com):

    "COMPLETE SYSTEMS: We refer to our chips as Multi-Computer Systems because they are, in fact, complete systems. Supply one of our chips with power and a reset signal, and it is up and running. All of our chips can load their software at high speed using a single wire that can be daisy chained for multiple chips; if desired, most can be bootstrapped by a simple SPI flash memory.

    "Contrast this with a Multi-Core CPU, which is not a computing system until other devices such as crystals, memory controllers, memories, and bus controllers have been added. All of these things consume energy, occupy space, cost money, add complexity, and create bottlenecks.

    "NO CLOCKS: Most computing devices have one or more clocks that synchronize all operations. When a conventional computer is powered up and waiting to respond quickly to stimuli, clock generation and distribution are consuming energy at a huge rate by our standards, yet accomplishing nothing."

    It goes on to explain the fine-grained power usage, how each computer
    communicates with its neighbors, and similar high-level
    descriptions.

    You can buy these chips right now for as little as $20 in quantities
    of 10. The only problem is that to make use of one easily, you either
    need to buy the $495 development board or make your own. I've found
    precious few examples of people who have done this online.

    One rare example is
    Hands on with a 144 core processor
    (archive.org of designspark.com).
    Article author Andrew Back even has screenshots of the
    arrayForth environment (which is basically
    colorForth).

    The question, of course, is what do you do with this thing?

    It may turn out that the answer can be found by looking back into
    computing history. You don't even have to go back very far.

    If you read the "Rabbit Hole Alert" under the picture of the surface
    of comet 67P above, then you saw the term "transputer".
    I think it would be very interesting to compare and contrast the
    GreenArrays GA144 chips to the Inmos transputer chips.
    It seems to me, at first glance, that anything those transputers would
    have been suited for ought to be a good fit for a GreenArrays multi-computer chip
    as well.


    Rabbit Hole Alert 2: Another fun diversion into massively parallel
    computers is one of my favorites: Danny Hillis's
    Connection Machine
    (wikipedia.org)
    computers featuring a "12-dimensional hypercube" routing design.

    Hillis himself is a "human rabbit hole" of inventions, ideas, and
    writings. He's the author of one of my favorite non-fiction books, "The
    Pattern on the Stone," and co-founder of The Long Now Foundation
    (along with some other "human rabbit holes" including the incredible
    writer and thinker, Stewart Brand).

    One of the projects of the Long Now
    Foundation is the design and creation of the 10,000-year giant
    mechanical Clock of the Long Now, which is intended to tick once
    per year and have a cuckoo that comes out once every 1,000 years.

    There is also a direct connection between the Long Now and the
    Rosetta spacecraft: Long Now created the "Rosetta disc", an extremely
    clever physical object containing the micro-etched text of over
    a thousand human languages. The Rosetta spacecraft carried a nickel
    prototype of the disc. So that's now sitting on a comet.

    As with the previous rabbit hole alert, I could link to all of these
    people and things, but each is part of an unfathomably deep fractal of
    fascinating stuff and I'm afraid you might never come back to finish
    this. But do look them up later!


    At any rate, the only problem with parallel computers is that we're
    still not that great at programming them.

    Heck, we're not even that great at serial programming yet.

    The future: sustainable low-energy computing and Forth?

    "If you talk about molecular computers that are circulating in your bloodstream, they aren't going to have very much power and they aren't going to have very much memory and they aren't going to be able to use much energy.

    -- Chuck Moore, Programming a 144-computer chip to minimize power, 2013

    The eventual complete domination of x86 PCs in practically all areas
    of computing, followed by the current rise of powerful ARM CPUs, is
    historical computing fact. Incredible feats of processor engineering
    have made it possible to run what can only be described as
    "supercomputers" on battery power and put them in our pockets.

    Trends in both software and hardware have been towards
    ever-increasing layers of complexity. The layers are very deep and
    very wide.

    As I write this, certain popular avenues of computing threaten to
    make every current piece of inefficient software seem absolutely
    frugal by comparison.

    (Incredibly, we're not even content with the
    supercomputers on our desks and in our hands. So we rely on
    services which work remotely over the Internet on powerful networks
    of computers in huge data centers. We think of this computing as
    cheap or even free because much of it is indirectly paid for with
    advertising dollars. Paid for, that is, with our attention and
    personal data. Those data centers with their screaming cooling fans
    and backup generators are somewhere else, not in our living rooms.
    It's easy to simply forget how all of this is made possible.)

    Increasingly, we rely on massively complex software that seems
    to have an unending appetite for computing power.

    But do these trends have to continue?

    There is absolutely no reason we have to use increasingly
    inefficient and poorly-constructed software with steeper and steeper
    hardware requirements in the decades to come.

    In fact, the reverse could be true.

    There are plenty of applications where low energy computing is a
    categorical requirement and I believe these applications will only
    increase.

    Forth-likes could have a strong future as we look towards:

    • Tiny, ubiquitous computers
    • Solar power
    • Heavily constrained VMs

    There are physical realities (such as the speed of light) which
    ultimately govern the speed at which we can perform a calculation or
    the maximum number of calculations which can be done with a Watt of
    electricity using computers made out of atoms. These are hard limits.
    But there will surely be other plateaus along the way to reaching these
    limits.

    Around the year 2006, we saw Dennard scaling
    slow to a crawl.
    Dennard scaling
    (wikipedia.org)
    describes the relationship between
    the shrinking size of transistors and the increase in computing speed.
    Simply put, smaller transistors can switch at higher speeds and take
    less voltage. (The first-order relation behind this: dynamic power
    scales roughly as capacitance times voltage squared times frequency,
    so shrinking capacitance and voltage historically paid for higher
    clock rates. Once voltage could no longer drop, frequency stalled.)
    This scaling law held for many years, but we reached a speed plateau
    at around 4 GHz because of current leakage and heat.

    In
    The Free Lunch Is Over
    (gotw.ca),
    published in Dr. Dobb's Journal in 2005, Herb Sutter writes,

    "The major processor manufacturers and architectures, from Intel
    and AMD to Sparc and PowerPC, have run out of room with most of
    their traditional approaches to boosting CPU performance. Instead
    of driving clock speeds and straight-line instruction throughput
    ever higher, they are instead turning en masse to hyperthreading
    and multicore architectures."

    Multicore processors and increasingly clever hardware architecture
    tricks have continued to provide increases in computing power...but it's
    not the same.

    Near the end of the article, Sutter advises:

    "There are two ways to deal with this sea change toward
    concurrency. One is to redesign your applications for concurrency,
    as above. The other is to be frugal, by writing code that is more
    efficient and less wasteful.
    This leads to the third interesting
    consequence:

    "3. Efficiency and performance optimization will get more, not less, important.
    Those languages that already lend themselves to heavy optimization will find new life; those that don't will need to find ways to compete and become more efficient and optimizable. Expect long-term increased demand for performance-oriented languages and systems."

    (Emphasis mine.)

    For now, we're still eating the remains of that free lunch.

    I'm probably fairly rare among programmers in wishing it would end.
    I'd like to see greater emphasis on the craft and art of software.
    I'd like to see us make full and intentional use of the incredible
    power available to us now.

    The
    retrocomputing
    (wikipedia.org)
    hobby has continually shown how much more we could have done with the
    home computers of the 1980s if we had continued to use them.
    In many cases, they've been shown to be able to run programs
    previously thought impossible.
    The things we could do with current hardware are surely
    even more amazing, but it will be perhaps decades before we find
    out.

    In 1958, Chuck Moore created a dirt-simple interpreter on an
    IBM 704. That computer filled a room and cost about 2 million dollars.

    I can buy a more powerful computer (minus the awesome control panel
    with lights and switches) today for literal pocket change
    in the form of a "microcontroller", a complete computer on a single
    silicon chip, and write a powerful Forth system for it. That computer
    can run on a coin cell battery or even a tiny solar panel, sipping power
    where the IBM 704 inhaled it.

    There has never been a more incredible time for small-scale computing.
    Like the explosion of personal computers in the 1980s, the time is ripe
    for fun, creative, interesting, useful, and very personal
    computers and software.

    These tools can do useful work and they can also teach and delight us.
    Ideas like Forth are ripe for rediscovery as we learn exciting new
    ways to compute with arrays of inexpensive, low-power computers.

    We can pursue this line of thinking for pragmatic reasons, or just
    because it is beautiful and fun and worth doing for its own sake.

    Chuck Moore is basically retired now, programming and toying with
    software with no deadlines or clients.

    It is now on us to take up the mantle of Forth, to champion the values
    of ingenuity, elegance, efficiency, and simplicity.

    Forth is...

    Simple

    To really understand the value of Forth (and especially Chuck Moore's
    later work on Machine Forth and the GreenArrays computers), we must consider the
    difference between "simple" and "easy".

    We were blessed with the ability to speak of this difference by
    Rich Hickey in his brilliant talk,
    "Simple Made Easy" (2011)
    (youtube.com)
    which every developer should see at some time in their life.
    (Or read the transcript of Simple Made Easy
    (github.com)
    provided by Mattias Nehlsen.)

    Forth is not easy. It may not always even be pleasant. But it is certainly simple.
    Forth is one of the simplest programming languages there has ever been.

    A crafted language

    If the best software is truly crafted for the problem at hand, then
    it makes sense that an ideal programming language would also be
    crafted for the problem at hand.

    An absolutely amazing talk about language design,
    Guy Steele's
    Growing a Language (1998)
    (youtube.com),
    demonstrates how languages are built up from primitives.
    The talk is performance art and deeply insightful.

    Steele helpfully also wrote up a transcript of the talk:
    Growing a Language (PDF)
    (virginia.edu).
    Imagine Steele is saying "Forth" here in place of "Lisp" because
    the point is the same:

    "Lisp was designed by one man, a smart man, and it works in a way
    that I think he did not plan for. In Lisp, new words defined by the
    user look like primitives and, what is more, all primitives look
    like words defined by the user! In other words, if a user has good
    taste in defining new words, what comes out is a larger language
    that has no seams."
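    Forth makes the same point with unusual clarity, because even the
    "primitives" are just dictionary entries. A quick sketch (the MY-
    prefix is only to avoid shadowing the standard words):

    \ Both NEGATE and 2DROP are standard words that many Forths
    \ implement as primitives, yet either can be written in plain
    \ Forth. Callers cannot tell the difference.
    : MY-NEGATE ( n -- -n ) 0 SWAP - ;
    : MY-2DROP  ( a b -- )  DROP DROP ;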

    Go forth and create the perfect
    programming language for you!

    The Legend Confirmed

    I promised I would show you a magic trick at the end of this article.

    Behold, a new definition for the integer 4:

    : 4 12 ;
        

    Which I shall now use in a sentence:

    ." The value of 4 is " 4 . CR
    
    The value of 4 is 12
        

    Tada!

    (The trick works because the Forth interpreter always searches the
    dictionary first; only when a token is not found as a word does it
    try to parse it as a number. Our new word 4 now shadows the numeric
    literal.)
