RISC-V Pushes 400 Million Forth Words Per Second

We’ll be honest. Measuring Forth words per second doesn’t seem like a great benchmark since a Forth word could be very simple or quite complex. But we think the real meaning is “up to 400 million words per second.” There was a time when that level of performance would take a huge computer. These days, a simple board that costs a few bucks can do the trick, according to [Peter Forth] in an online presentation.

The key is the use of the Milk V Duo and some similar boards. Some of these look similar to a Raspberry Pi Pico. However, the chip on board has two RISC-V cores, an ARM core, and an 8051. There’s also an accelerator coprocessor for vector operations like AI or video applications.

[Peter] has some popular Forth systems ported to the machine on GitHub. This might be the easiest way to get started because, as he mentions in the video, the documentation for these boards leaves something to be desired. However, these chips have a lot of capability for a small price.

We like Forth. If you want something that is less of a port, we’ve seen some native RISC-V implementations.

28 thoughts on “RISC-V Pushes 400 Million Forth Words Per Second”

  1. A lot of hard science (chemistry/biochemistry/physics/statistics) computer programs began life in the Forth universe. I suspect that hackers now have an easy path to making DIY scientific instrumentation more accessible.

  2. “we’ve seen some native RISC V implementations”

    Wait. So this is a Forth interpreter written in Python, running on a 1 GHz Risc-V? Suddenly, the 400 million Forth words per second claim sounds a little .. farfetched.

    1. The Forth that runs 400 MegaForthWords per second is written in native assembly and is very well known in the Forth community: Mecrisp. The other Forth is called PygmyForth and runs in Python (for portability and easy access to all the libraries and the OS of the system). There is a third Forth running in the demo, which is the only one I really show live (we were short of time in the long meeting). It runs on the second core, is compiled in C++, and is uploaded through the Arduino IDE. I know there is a lot of information in this video; we will continue the series at the next meetings. Join us, our meetings are free and open.

        1. It can, but you’d lose the ability to monkey about with internals. The C runtime as implemented by an optimizing compiler is a little opaque. Forth is more useful when full understanding of the runtime, and know-how to circumvent or subvert it, is possible.

      1. I am trying to get good at x86 assembly. So I am building a calculator in it. Can someone provide some suggestions or what else should I learn?
        William Payne
        Former MTS, Retired at Sandia National Laboratories (1980–1992)
        gcc C is a better choice than assembler for many reasons.

        Here is an example of a ones complement in gcc for 8, 16, 32, 64 and 128 bit platforms.

        The Intel BASIC-52 constructs ON GOSUB and ON GOTO have been implemented in gcc C.

        ll7 uses the ON GOTO construct.

        /* gcc -o ee onec3.c
        gcc -g -c onec3.c
        objdump -d -M intel -S onec3.o
        */
        #include <stdio.h>
        int main()
        {
        /* jump table of label addresses (GCC labels-as-values extension) */
        static void *array[]={&&ll0,&&ll1,&&ll2,&&ll128,&&ll4,&&ll5,&&ll6,&&ll7,
        &&ll8,&&ll16,&&ll32,&&ll64};
        unsigned char c[16] ;
        int ii;
        ii=5; goto ll0;
        ll5: ii=6; goto ll1;
        ll6: ii=7; goto ll128;
        ll7: ii=4; goto ll2;

        ll0: c[0]=0x00; c[1]=0xff ; c[2]=0x00 ; c[3]=0x00 ;
        c[4]=0x00 ; c[5]=0x00 ; c[6]=0x00 ; c[7]=0x00 ;
        c[8]=0x00 ; c[9]=0x00 ; c[10]=0x00 ; c[11]=0x00 ;
        c[12]=0x00 ; c[13]=0x00 ; c[14]=0x00 ; c[15]=0x00 ;
        goto *array[ii] ;

        ll1: printf("\nOnes complement c[] = \n") ;
        ll2: printf("%2.2x%2.2x%2.2x%2.2x", c[0],c[1],c[2],c[3]) ;
        printf(":%2.2x%2.2x%2.2x%2.2x", c[4],c[5],c[6],c[7]) ;
        printf(":%2.2x%2.2x%2.2x%2.2x", c[8],c[9],c[10],c[11]) ;
        printf(":%2.2x%2.2x%2.2x%2.2x", c[12],c[13],c[14],c[15]);
        printf("\n");
        goto *array[ii];

        ll128: c[15] = c[15]^0xff;
        c[14] = c[14]^0xff ;
        c[13] = c[13]^0xff ;
        c[12] = c[12]^0xff ;
        c[11] = c[11]^0xff ;
        c[10] = c[10]^0xff ;
        c[9] = c[9]^0xff ;
        c[8] = c[8]^0xff ;
        ll64: c[7] = c[7]^0xff ; /* 64-bit entry point: inverts c[7]..c[0] */
        c[6] = c[6]^0xff ;
        c[5] = c[5]^0xff ;
        c[4] = c[4]^0xff ;
        ll32: c[3] = c[3]^0xff ;
        c[2] = c[2]^0xff ;
        ll16: c[1] = c[1]^0xff ;
        ll8: c[0] = c[0]^0xff ;
        goto *array[ii];
        ll4: printf("\n");
        return 0;
        }
        william@william-Lenovo-IdeaPad-S145-15API:~$ cd Desktop
        william@william-Lenovo-IdeaPad-S145-15API:~/Desktop$ gcc -o ee onec3.c
        william@william-Lenovo-IdeaPad-S145-15API:~/Desktop$ ,/ee
        bash: ,/ee: No such file or directory
        william@william-Lenovo-IdeaPad-S145-15API:~/Desktop$ ./ee

        Ones complement c[] =
        00ff0000:00000000:00000000:00000000
        ff00ffff:ffffffff:ffffffff:ffffffff
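For comparison, the same byte-wise inversion can be sketched in portable C without computed gotos. This is only an illustration, not the commenter's program: the function name `ones_complement` and its parameters are hypothetical, and it assumes, as the listing above does, that index 0 holds the byte inverted by the 8-bit entry point.

```c
#include <stddef.h>

/*
 * Invert the lowest `bits` bits of a byte buffer, one byte at a time.
 * Hypothetical helper for illustration; buf[0] is treated as the first
 * byte to invert, mirroring c[0] in the listing above.
 * Returns 0 on success, -1 if `bits` is not a whole number of bytes
 * or does not fit in the buffer.
 */
int ones_complement(unsigned char *buf, size_t buf_len, unsigned bits)
{
    if (bits == 0 || bits % 8 != 0 || bits / 8 > buf_len)
        return -1;

    for (size_t i = 0; i < bits / 8; i++)
        buf[i] ^= 0xff;   /* XOR with 0xff flips every bit in the byte */

    return 0;
}
```

Calling it with `bits` set to 8, 16, 32, 64, or 128 corresponds to entering the listing at ll8, ll16, ll32, ll64, or ll128.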

  3. “we’ve seen some native RISC V implementations”

    Wait. So this is a Forth interpreter written in Python, running on a 1 GHz Risc-V? Suddenly, the 400 million Forth words per second claim sounds a little .. farfetched.

    1. Do you know how a Forth inner interpreter works? That part is always written in native assembly, either direct threaded (easy on CISC) or indirect threaded. The fact that Python is used in the compilation of the dictionary of words doesn’t affect run-time; it’s not a run-time activity.

      1. That part is always written in native assembly.

        That is wrong:

        Elrad 11/1982 page 36: Forth Simulator in Basic for Tandy and CBM
        Elrad 4/1984 page 42: Forth Simulator in ZX-Basic

        Oh..and I wrote a ST6 forth compiler many years ago when I was a student. (compiler, not interpreter!)

        :-p

        BTW: I am guessing there are Forth interpreters for thousands of different microcontrollers, and that is more than the number of useful Forth programs that everyone knows or uses. This language is a kind of intellectual game. It looks fun, but do not use it for real work. I mean, we live in a world where Rust people explain to us that C is a dangerous language that should not be used anymore. :-D

        Olaf

        1. Some “real” programs, major ones, have been written in FORTH, like EasyWriter for Apple II and the IBM PC.
          EasyWriter’s author, John Draper, once explained its shortcomings as “FORTH makes it too easy to write quickly”.

      2. it is often written in assembly but it isn’t always. there are many paths of forth. my intuition aligns with BrightBlueJim’s — anyone whose personal taste leads them to python isn’t going to properly implement a compiled language and isn’t going to honestly report about their project on hackaday. but i’ll straight up say, that’s an ad hominem — i’m judging a technical fact by an unrelated bit of trivia.

        looking at the source, it seems like it’s interpreted, though. i’m surprised there’s now a python fork of pygmy forth but that seems to be what it is. the 16-bit x86 pygmy forth i knew and loved 25 years ago seems to be 25 years obsolete :)

        1. Bright Jim has posted the same question twice, and I answered above. The CVITEK CV1800B processor has 4 CPUs, 2 of them RISC-V cores that can run different sessions. The main processor, call it #1, normally runs Linux, and on Linux you can run whatever Forth you like. One of them, mentioned in my talk, is the native binary Mecrisp, which runs at up to or even above 400 MFW/second (depending on whether you switch the I-cache, and of course on the length of your Forth word, but we are basically talking about Forth PRIMITIVES). You can also run PygmyForth or whatever you like; the advantage of PygmyForth is that Python with all its libraries is available on the IMG, so we can compile on board. The second processor can run an RTOS, which is a real-time OS, and there you can also load a high-speed Forth binary. There is a lot more I can explain, but we will show that in the next Zoom sessions. I hope the confusion is cleared up.

          1. My apologies if I did it wrong, but as you can see, both posts were at exactly the same time, so I don’t think it’s anything I did. Anyway, the confusion was that I saw the python code, and read Al Williams’ implication that this WASN’T a RISC-V-native Forth machine, which led to the question. Thank you for the clarification.

    1. There were Forth implementations of “backwards chaining inference engines” in the 1980’s. They called them “Expert Systems” and GE had one for trouble-shooting their diesel-electric railroad locomotives. I think connected to a video disk with photos of what to do next.
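To make the threading discussion in this thread concrete, here is a minimal sketch of a direct-threaded style dispatch loop in portable C. It is not how Mecrisp or PygmyForth is implemented; every name in it (`vm_t`, `cell_t`, `run_thread`, the `do_*` primitives) is hypothetical. Each cell of a compiled word holds either the address of a primitive or an inline literal, and the loop in `run_thread` plays the role of NEXT. For scale: if the core runs at the 1 GHz quoted above, 400 million words per second leaves about 2.5 clock cycles per word, a budget that a C loop with indirect calls like this one is unlikely to meet.

```c
/* Hypothetical miniature Forth-like VM; all names are illustrative. */
typedef struct vm vm_t;
typedef void (*prim_fn)(vm_t *);

/* One cell of a compiled word: a primitive's address or an inline literal. */
typedef union cell {
    prim_fn fn;
    long    lit;
} cell_t;

struct vm {
    const cell_t *ip;   /* instruction pointer into the thread */
    long stack[16];     /* data stack (no overflow checks in this sketch) */
    int  sp;            /* index of the next free stack slot */
    int  running;       /* cleared by do_exit to stop the loop */
};

static void push(vm_t *vm, long v) { vm->stack[vm->sp++] = v; }
static long pop(vm_t *vm)          { return vm->stack[--vm->sp]; }

/* Primitives. do_lit consumes the cell that follows it in the thread. */
static void do_lit(vm_t *vm)  { push(vm, vm->ip->lit); vm->ip++; }
static void do_add(vm_t *vm)  { long b = pop(vm); long a = pop(vm); push(vm, a + b); }
static void do_mul(vm_t *vm)  { long b = pop(vm); long a = pop(vm); push(vm, a * b); }
static void do_dup(vm_t *vm)  { long a = pop(vm); push(vm, a); push(vm, a); }
static void do_exit(vm_t *vm) { vm->running = 0; }

/* The inner interpreter: fetch the next cell, advance ip, execute it. */
long run_thread(const cell_t *thread)
{
    vm_t vm = { .ip = thread, .sp = 0, .running = 1 };
    while (vm.running) {
        prim_fn fn = vm.ip->fn;
        vm.ip++;
        fn(&vm);
    }
    return pop(&vm);    /* top of stack is the result */
}

/* Equivalent of the Forth phrase: 2 3 + DUP * */
long demo_square_of_sum(void)
{
    const cell_t thread[] = {
        { .fn = do_lit }, { .lit = 2 },
        { .fn = do_lit }, { .lit = 3 },
        { .fn = do_add },
        { .fn = do_dup },
        { .fn = do_mul },
        { .fn = do_exit },
    };
    return run_thread(thread);
}
```

Here `demo_square_of_sum()` leaves (2 + 3) squared, that is 25, on the stack. In an assembly implementation NEXT is typically a couple of instructions ending in an indirect jump rather than a call and return, which is part of why native Forths can be so much faster than a sketch like this.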

  4. Many thanks to Hackaday and especially to @Al Williams for writing this article. This is a video of my presentation at this week’s Forth2020 Zoom meeting; we run monthly meetings and bimonthly presentations. This was the 50th edition of our meetings, and we had Chuck Moore participating in the conference and chatting with the group. All who want to know about Forth programming are heartily invited to the Forth2020 group (on Facebook) and to our meetings as well. We talk about RISC-V, Arm Cortex, FPGAs, ESP32s, RPi Picos, and everywhere else we can install Forth; hybrid computers with analog and digital technology also came up at the meeting! Write down your questions and I will be happy to help. If you want to download the IMG with 2 Forths + ArduinoIDE for the MilkV-64-Duo, here is the repo: https://github.com/PeterForth/MILKV-FORTH

    1. Really not. This is 2 times faster than a Teensy, has 64 MB, 256 MB, or 512 MB of RAM, and 3 cores, so you can run on 2 of them in parallel. It is a very powerful board. Not to mention, there is also a trillion-ops processor (TPU) for video processing.

      1. One of the complaints I’ve heard about the GPUs on ARM-based computers is that the GPU code is provided only in binary blobs. Is this “TPU” documented well enough that bare-metal development for it can be done by anybody who feels like doing so?

        1. Hi Jim, see https://github.com/milkv-duo/duo-files/blob/main/duo/datasheet/CV1800B-CV1801B-Preliminary-Datasheet-full-en.pdf . This is just a preliminary manual; I guess we can find a more up-to-date manual somewhere. The TPU list of commands is in chapter 8.1. The board is actually supported by OpenCVmobile, Shuffle-Net, and Yolo; all of these have a separate section covering step-by-step installation. There is a large list of applications for testing the TPU’s AI capabilities over here: https://milkv.io/docs/duo/application-development/tpu/tpu-introduction . MilkV recommends using the TDL SDK for development, which covers all instructions of the TPU. I have not started experimenting with the TPU; I am fully dedicated to Forth porting/support for the board.

          1. gcc C compiles labels as values. Might that be a good idea for making Forth or Intel MCS BASIC-52 words/modules relocatable?

            gcc c requires that an array containing an indirect jump table be stored at the beginning of a program.

            Example:

            static void *array[]={
            &&byte7, &&byte6, &&byte5, &&byte4, /* 0 */
            &&byte3, &&byte2, &&byte1, &&byte0, /* 4 */
            &&bit80, &&bit40, &&bit20, &&bit10, /* 8 */
            &&bit80a, &&bit40a, &&bit20a, &&bit10a, /* 12 */
            &&bit08, &&bit04, &&bit02, &&bit01, /* 16 */
            &&bit08a, &&bit04a, &&bit02a, &&bit01a, /* 20 */
            &&init, &&add, &&printa, &&printc, /* 24 */
            &&zeros, &&l15, &&l14, &&l13, /* 28 */
            &&l12, &&l11, &&l10, &&l9, /* 32 */
            &&l8, &&l9, &&l6, &&l5, /* 36 */
            &&l4, &&l3, &&l2, &&l1, /* 40 */
            &&l0, &&done, &&mul, &&byte9, /* 44 */
            &&bit01b, &&b70}; /* 48 */

            64 bit portable fix and floating point multiply referenced these label pointers.

            Forward immediate jumps can lead to non-relocatable binaries.

            Program modules reordered with only backward jumps.

            Curiosity prompted to see if gcc c would compile
            jumps without the above table. NO!!!

            Indirect fig Forth requires the physical address of the first
            instruction of machine code to be stored in memory.

            Possibility of storing a jump table before a machine code field with indirect or relative jumps only allowed?

            In this way words/modules in fig forth or BASIC-52 could be relocatable?

            ps gcc c goto *array[48] failed with a “core
            dump” using Ubuntu on an x86 platform.

            Issue was that &&bit01b was assigned offset of 44!

            gcc c did not process the line /*44*/. :(
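On the relocation question raised above, the GCC manual's "Labels as Values" section describes a variant of the jump table that stores label offsets instead of absolute addresses, so the table itself needs no relocation. Below is a minimal sketch of that variant with a bounds check added, since an out-of-range or mismatched index sends the computed goto to a garbage address. It requires GCC or a compatible compiler such as Clang; the function `dispatch` and its three operations are hypothetical and are not part of the commenter's program.

```c
/*
 * Computed goto through a table of label OFFSETS (GNU C extension).
 * The table holds differences between label addresses, so it contains
 * no absolute addresses. Hypothetical example for illustration only.
 */
int dispatch(int op, int x)
{
    static const int offsets[] = {
        &&op_inc - &&op_inc,   /* 0: increment */
        &&op_dbl - &&op_inc,   /* 1: double    */
        &&op_neg - &&op_inc,   /* 2: negate    */
    };
    enum { N_OPS = sizeof offsets / sizeof offsets[0] };

    /* Guard the index: jumping through a bad entry is undefined behavior. */
    if (op < 0 || op >= N_OPS)
        return x;

    goto *(&&op_inc + offsets[op]);

op_inc: return x + 1;
op_dbl: return x * 2;
op_neg: return -x;
}
```

With this layout `dispatch(1, 5)` returns 10, and an unknown opcode returns `x` unchanged instead of crashing. Deriving the table size with `sizeof` rather than hand-numbered comments also avoids the table and its index annotations drifting apart.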

  5. can fig forth output formatting do everything the c printf can?

    AI Overview.

    No, FIG-FORTH’s output formatting capabilities are less extensive
    than those of the C printf function, lacking features
    like format string arguments and the wide range of format specifiers.
    [hahaha?]
