ARM Programming By Example

The ARM processor is popping up everywhere. From Raspberry Pis, to phones, to Blue Pill Arduino-like boards, you don’t have to go far to find an ARM processor these days. If you program in C, you probably don’t care much or even think about it. But do you know ARM assembly language? Well, if you look at it one way, it can’t be too hard. The CPU only has about 30 distinct operations — that’s why it is called RISC. Of course, sometimes fewer instructions actually make things more difficult. But you can get a great starting tutorial with the 21 programs on the ARM Assembly by Example website.

You need a 32-bit ARMv6 or better, so a Raspberry Pi will work here. The toolchain, of course, is gcc and its associated tools. If you have the right hardware, there are sections on using the floating point unit and the NEON co-processor, too.

The first few sections are what you might expect: program basics and a memory map tutorial. But after the obligatory stop at “Hello world”, you’ll find programs like “Find the Otter” and “Hex and Love” (see the video below) to challenge your burgeoning skill.

If you have interest in assembly language, it is an easy way to dip your toe into the ARM waters. If you are more interested in 64-bit Intel/AMD CPUs, we can help. If you want to go bare metal, we’ve been there, too.

40 thoughts on “ARM Programming By Example”

        1. Tons of reasons to write in assembly, not the least of which is that it’s fun. But there are more objective situations where it’s needed. Writing shellcode, for example; or writing compiler back-ends, JIT compilers, etc.

        1. Yes, both are true, depending on the point in history. While what you wrote is absolutely correct, it was common for people experienced with UNIX to refer to the C compiler ‘gcc’ as “GNU CC”, ‘cc’ being its closest UNIX equivalent. Over time, the collection of tools grew well beyond gcc, ld and even g++, ergo the rebranding in the late 90’s.

          The story actually has several interesting twists and turns. They’re easily googled if you’re curious.

  1. If you only ever learn one assembly language in this lifetime, I would recommend ARM. It’s really quite an elegant instruction set and architecture. I like that it’s very easy (it’s hard not to) to write position-independent code. Also, being able to use conditionals and bit-shifting for ‘free’ on a single instruction has been really nice (especially back in the day when resources were less available than today).

    (Programming in ARM since 1990)
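
To make the ‘free’ shift and the conditional concrete, here is a rough Python model of two classic 32-bit ARM instruction forms: `ADD Rd, Rn, Rm, LSL #n` and `ADDEQ Rd, Rn, Rm`. The function names and the example addresses are made up for illustration; this is a sketch of the semantics, not an emulator.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits like a register."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift):
    """Model of `ADD Rd, Rn, Rm, LSL #shift`: shift and add in one instruction."""
    return (rn + lsl(rm, shift)) & MASK32


def add_if_equal(rd, rn, rm, z_flag):
    """Model of `ADDEQ Rd, Rn, Rm`: Rd only changes when the Z flag is set."""
    return (rn + rm) & MASK32 if z_flag else rd


# Address of element 3 in an array of 4-byte words: base + (index << 2)
base, index = 0x8000, 3
print(hex(add_shifted(base, index, 2)))  # 0x800c
```

On most other architectures, the array-index calculation above takes a separate shift and add, and the conditional form needs a branch around the instruction.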

    1. Same. VLSI ARM7500 chip set. I wrote a Forth interpreter and was really pleased with the efficiency and speed. The register set and the auto-increment and decrement, barrel shifter, and conditional execution of every instruction meant one or two instructions for many of the Forth machine operations. The interpreter overhead was 2 instructions. (I never used Thumb because it drops the conditional execution.)

      I’m inspired by this to do some assembly on the various Samsung and Rockchip boards I have.

  2. I haven’t watched the video yet. I just might find it interesting. I learned assembly language on the C=64 because it would do things much quicker than BASIC. This article has prompted me to dust off those old brain cells and see what I can do with one of the ARM single boards I have lying around. Less than a month until SuperCon, so I’d better get cracking.

    1. The fact that the original ARM was designed by folks who were used to programming a 6502 (in a BBC Micro, but also used in your C-64) should make it a reasonably smooth transition, too.

      1. Completely agree. I was programming in 6502 on the Acorn/BBC Model B for a few years before switching to ARM on the Acorn Archimedes A310, which was quick, easy and a pleasure (16 registers on the ARM2 compared to ~3 on the 6502!).

        Back when ARM stood for Acorn RISC Machine.

  3. The early PDP-11’s from DEC had very much a RISC instruction set; they dated from the early 70’s (about!), before the term RISC had been invented. This range of computers from DEC was used very widely in those days and was incorporated into lots of OEM offerings. The main limitation was address space, only 64K bytes. As well, the single set of registers became a restriction as the concept and implementation of RISC processors developed in the succeeding years.

    1. The PDP-11 is what I tend to call “MISC” for “Middled Instruction Set Computer”.

      It definitely has parts of its operation that are very RISC-like, but at the same time it has discrete operations for *everything* you could possibly want, which would make it more CISC. They even had add-on cards to do some operations more efficiently if you were an installation that wanted that instead of slower multi-op functions.

      But it’s good, you get some of the advantages of both, reduced cycles per instruction, more registers, and small memory footprint for programs. (Remember, UNIX could run on them with only 64K!)

      A couple of corrections though… the address space was 4MB for the 11/70, it just needed the MMU turned on, and the later models had 3 independent sets of registers for user/super/kernel.

      I’ve had my head deep into the bowels of PDP-11 CPU operation from when I wrote my emulator for them… I don’t think I’ll forget that structure soon

      1. The remarks I made referred, as I said there, to the early PDP-11’s, the ones earlier than the 11/40. Those early models could address only 64K bytes and they didn’t have user/super/kernel mode, so just the single set of registers. DEC realised that the address space was a limitation on the design and implemented those features that you mentioned in the later models of the hardware.

        1. I misunderstood your “early”; I thought you meant that PDP-11s in general were early (as in old now), not that you specifically meant the 11/20.

          Though, iirc, that 64K (16-bit word not byte) limit and single mode was only before they switched to a microcode 11/20 instead of the original ones.

          The 11/20 microcode version came with up to 128KW, multiple operating modes, and MMU/MPU pretty much immediately after release.

          1. Nope, was 64KB, sorry…
            Having looked up some info, yeah the base spec 11/20 was grim. 12KW, 16-bit bus, only 8 lx1 registers…. But even at release they offered paging and 128KW, 18-bit bus if you could afford it.
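
The sizes traded back and forth in this thread follow directly from the address width. A quick sketch, assuming byte addressing and 16-bit (2-byte) words as on the PDP-11; the helper name is made up for illustration:

```python
def address_space(bits, bytes_per_word=2):
    """Return (bytes, words) reachable with an address of the given width."""
    size_bytes = 2 ** bits
    return size_bytes, size_bytes // bytes_per_word


for bits in (16, 18, 22):
    size_bytes, size_words = address_space(bits)
    print(f"{bits}-bit: {size_bytes // 1024} KB, {size_words // 1024} KW")

# 16-bit: 64 KB, 32 KW
# 18-bit: 256 KB, 128 KW
# 22-bit: 4096 KB, 2048 KW
```

So a 16-bit address gives 64 KB (not 64 KW), an 18-bit bus gives the 128 KW quoted above, and 22 bits gives the 4 MB mentioned for the 11/70.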

    1. It would be useful to someone trying to follow along with these tutorials on their phone using termux – why not send the project author a note with your examples?

      I tried to do some of the examples myself last night, before realising it would not work quite the same way as it was written when using a phone…

      Thanks for your efforts here :-)

    1. In general, if you want to get assembly out of a compiled program, you need a disassembler.

      If you Google around, there are online AVR disassemblers. You can just upload your compiled hex file and it will spit out the assembly. The hex file should be in your sketch folder.

  4. One of the great (but destructive) programs to write is infinite recursion. Any program that calls itself will keep pushing return addresses on the stack, until all memory is consumed.
    Of course, since the ARM has memory mapped IO, the fun is seeing assorted peripherals go berserk when those locations get written to.
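
A toy model of that runaway stack, in Python rather than on real hardware. It assumes a full-descending stack (decrement, then store, as is conventional on ARM) and a peripheral window sitting below the stack; the addresses and the function name are invented for illustration and do not describe any particular chip.

```python
WORD = 4  # each pushed return address occupies one 32-bit word


def pushes_until_collision(stack_top, io_base, io_limit):
    """Count how many pushes succeed before one lands inside [io_base, io_limit).

    Returns (safe_pushes, colliding_address), or None if the stack reaches
    address zero without ever touching the window.
    """
    sp = stack_top
    pushes = 0
    while sp >= WORD:
        sp -= WORD  # full-descending: move the stack pointer down, then store
        if io_base <= sp < io_limit:
            return pushes, sp
        pushes += 1
    return None


# Hypothetical layout: stack starts at 0x20001000, IO window at 0x20000000-0x200000FF
depth, hit = pushes_until_collision(0x20001000, 0x20000000, 0x20000100)
print(depth, hex(hit))  # 960 0x200000fc
```

With one word per call, 960 nested calls fit before the 961st push writes into the window. Real functions usually push more than just the return address, so the collision would come sooner.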

  5. Sigh. If only “ARM assembly” was a single thing, and not at least three vaguely related sets of instructions. The short example shown here won’t work on the popular Cortex-M0 microcontrollers, for example.

    1. Yeah, I’ve been writing some OS stuff for ARM boards and it’s annoying. ARM64 is different to V7, which is different to V6, and functionally different for Thumb 1 or 2….

      And then that the FP register can be different depending on hardware/compiler implementation.

      And of course some Cortex-M implementations don’t have to include all instructions or make them actually work.

      On Thumb 2 and regular V7 at least they use the same assembly language, so you only have to write it once and it then assembles into different OP codes depending on the processor required.

      But M0/M0+ is V6-thumb ARM so differs from all that and needs to be done specially.

    2. Looking at the actual article, the examples also seem to assume an operating system (Linux): “svc 0”, with appropriate register contents, to print a string to stdout, and the like. Still useful, but not for any of the microcontroller environments where (arguably) use of assembly language would yield the most benefit.
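
For readers who have not seen that convention: on 32-bit ARM Linux (EABI), a program puts the system call number in r7 (4 for `write`), the file descriptor in r0, the buffer address in r1 and the length in r2, then executes `svc 0`. The same kernel call can be reached from Python through `os.write`, which makes a handy mental model. The wrapper name below is made up for illustration.

```python
import os

STDOUT_FD = 1  # the value that would go in r0


def write_stdout(message: bytes) -> int:
    """Issue write(fd=1, buf, len), the call that `svc 0` makes with r7 = 4."""
    return os.write(STDOUT_FD, message)


message = b"Hello, world\n"
written = write_stdout(message)  # the kernel reports how many bytes it wrote
```

On a bare-metal microcontroller there is no kernel to service the `svc`, which is why those examples do not carry over as written.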

  6. I enjoy assembly language programming. Ever since college with the VAX, then 8086, Z80, 68XXX, Intel 32, 64 bit, and of course ARM. Mostly just drivers, graphic subroutines, startup code and such at the time. I do wish I could do more of it, but as we all know it is much quicker to get’er done with Python and C. That said, I have been dabbling with ARM64 over the past year for fun. Even wrote the simple game Mug Wumps in it. The thing about learning assembly is to have a ‘goal’ in mind (like a simple game or task). Writing a complete application is a lot different than reading about it in a book :) . Only problem is that your application isn’t ‘portable’ (like C or Python). So it goes. Still fun and interesting to some of us.
