gcc: Some Assembly Required

There was a time when you pretty much had to be an assembly language programmer to work with embedded systems. Yes, there have always been high-level languages available, but it took improvements in tools and processors for that to make sense for anything but the simplest projects. Today, though, high-quality compilers are readily available for a lot of languages and even an inexpensive CPU is likely to outperform even desktop computers that many of us have used.

So assembly language is dead, right? Not exactly. There are several reasons people still want to use assembly. First, sometimes you need to get every clock cycle of performance out of a chip. It can be the case that a smart compiler will often produce better code than a person will write off the cuff. However, a smart person who is looking at performance can usually find a way to beat a compiler’s generated code. Besides, people can make value trades of speed versus space, for example, or pick entirely different algorithms. All a compiler can do is convert your code over as cleverly as possible.

Besides that, some people just like to program in assembly. Morse code, bows and arrows, and steam engines are all archaic, but there are still people who enjoy mastering them anyway. If you fall into that category, you might just want to write everything in assembly (and that’s fine). Most people, though, would prefer to work with something at a higher level and then slip into assembly just for that critical pieces. For example, a program might spend 5% of its time reading data, 5% of its time writing data, and 90% of the time crunching data. You probably don’t need to recreate the reading and writing parts. They won’t go to zero, after all, and so even if you could cut them in half (and you probably can’t) you get a 2.5% boost for each one. That 90% section is the big target.

The Profiler

Sometimes it is obvious what’s taking time in your programs. When it isn’t, you can actually turn on profiling. If you are running GCC under Linux, for example, you can use the -pg option to have GCC add profiling instrumentation to your code automatically. When you compile the code with -pg, it doesn’t appear to do anything different. You run your program as usual. However, the program will now silently write a file named gmon.out during execution. This file contains execution statistics you can display using gprof (see partial output below). The function b_fact takes up 65.9% of CPU time.


If you don’t have a profiling option for your environment, you might have to resort to toggling I/O pins or writing to a serial port to get an idea of how long your code spends doing different functions. However you do it, though, it is important to figure it out so you don’t waste time optimizing code that doesn’t really affect overall performance (this is good advice, by the way, for any kind of optimization).


If you start with a C or C++ program, one thing you can do is ask the compiler to output assembly language for you. With GCC, use a file name like test.s with the -o option and then use -S to force assembly language output. The output isn’t great, but it is readable. You can also use the -ahl option to get assembly code mixed with source code in comments, which is useful.

You can use this trick with most, if not all, versions of GCC. Of course, the output will be a lot different, depending. A 32-bit Linux compiler, a 64-bit Linux compiler, a Raspberry Pi compiler, and an Arduino compiler are all going to have very different output. Also, you can’t always figure out how the compiler mangles your code, so that is another problem.

If you find a function or section of code you want to rewrite, you can still use GCC and just stick the assembly language inline. Exactly how that works depends on what platform you use, but in general, GCC will send a string inside asm() or __asm__() to the system assembler. There are rules about how to interact with the rest of the C program, too. Here’s a simple example from the a GCC HOWTO document (from a PC program):

__asm__ ("movl %eax, %ebx\n\t"
"movl $56, %esi\n\t"
"movl %ecx, $label(%edx,%ebx,$4)\n\t"
"movb %ah, (%ebx)");

You can also use extended assembly that lets you use placeholders for parts of the C code. You can read more about that in the HOWTO document. If you prefer Arduino, there’s a document for that, too. If you are on ARM (like a Raspberry Pi) you might prefer to start with this document.


You may never need to mix assembly language with C code. But if you do, it is good to know it is possible and maybe not even too difficult. You do need to find what parts of your program can benefit from the effort. Even if you aren’t using GCC, there is probably a way to mix assembly and your language, you just have to learn how. You also have to learn the particulars of your platform.

On the other hand, what if you want to write an entire program in assembly? That’s even more platform-specific, but we’ll look at that next time.

Embed With Elliot: March Makefile Madness

The make tool turns the big 4-0 next month, and we thought we’d start up the festivities early. In a two-part series, I’ll cover some of the make background that I think is particularly useful, and then focus on microcontroller-specific applications. If you’re still cut-and-pasting a general purpose makefile to run your toolchain, hopefully you’ll get enough insight here to start rolling your own. It can be a lot simpler than it appears!

Just as soon as the C programming language was invented, and projects started to get a little bit bigger than a “hello world”, it became obvious that some tool was needed to organize and automate compilation. After all, if you’ve got a program that’s spread over a number of files, modules, or libraries, it’s a hassle to have to re-compile them all any time you make a change to just a single section of code. If some parts haven’t changed, you’re just wasting time by re-compiling them. But who can keep track of all of this? Make can!

Continue reading “Embed With Elliot: March Makefile Madness”

Build Your Own CPU? That’s the Easy Part!

You want to build your own CPU? That’s great fun, but you might find it isn’t as hard as you think. I’ve done several CPUs over the years, and there’s no shortage of other custom CPUs out there ranging from pretty serious attempts to computers made out of discrete chips to computers made with relays. Not to trivialize the attempt, but the real problem isn’t the CPU. It is the infrastructure.

What Kind of Infrastructure?

I suppose the holy grail would be to bootstrap your custom CPU into a full-blown Linux system. That’s a big enough job that I haven’t done it. Although you might be more productive than I am, you probably need a certain amount of sleep, and so you may want to consider if you can really get it all done in a reasonable time. Many custom CPUs, for example, don’t run interactive operating systems (or any operating system, for that matter). In extreme cases, custom CPUs don’t have any infrastructure and you program them in straight machine code.

Machine code is error prone so, you really need an assembler. If you are working on a big machine, you might even want a linker. Assembly language coding gets tedious after a while, so maybe you want a C compiler (or some other language). A debugger? What about an operating system?

Each one of those things is a pretty serious project all by itself (on top of the project of making a fairly capable CPU). Unless you have a lot of free time on your hands or a big team, you are going to have to consider how to hack some shortcuts.

Continue reading “Build Your Own CPU? That’s the Easy Part!”

GCC for the ESP8266 WiFi Module

When we first heard about it a few weeks ago, we knew the ESP8266 UART to WiFi module was a special beast. It was cheap, gave every microcontroller the ability to connect to a WiFi network, and could – possibly – be programmed itself, turning this little module into a complete Internet of Things solution. The only thing preventing the last feature from being realized was the lack of compiler support. This has now changed. The officially unofficial ESP8266 community forums now has a working GCC for the ESP8266.

The ESP8266 most people are getting from China features a Tensilica Xtensa LX3 32-bit SOC clocked at 80 MHz. There’s an SPI flash on the board, containing a few dozen kilobytes of data. Most of this, of course, is the code to run the TCP/IP stack and manage the radio. There are a few k left over – and a few pins – for anyone to add some code and some extended functionality to this module. With the work on GCC for this module, it’ll be just a few days until someone manages to get the most basic project running on this module. By next week, someone will have a video of this module connected to a battery, with a web-enabled blinking LED.

Of course that’s not the only thing this module can do; at less than $5, it will only be a matter of time until sensors are wired in, code written, and a truly affordable IoT sensor platform is created.

If you have a few of these modules sitting around and you’d like to give the new compiler a go, the git is right here.

Trimming The Fat From AVR GCC


[Ralph] has been working on an extraordinarily tiny bootloader for the ATtiny85, and although coding in assembly does have some merits in this regard, writing in C and using AVR Libc is so much more convenient. Through his trials of slimming down pieces of code to the bare minimum, he’s found a few ways to easily trim a few bytes off code compiled with AVR-GCC.

To test his ideas out, [Ralph] first coded up a short program that reads the ATtiny85’s internal temperature sensor. Dissassembling the code, he found the a jump to a function called __ctors_end: before the jump to main. According to the ATtiny85 datasheet, this call sets the IO registers to their initial values. These initial values are 0, so that’s 16 bytes that can be saved. This function also sets the stack pointer to its initial value, so another 16 bytes can be optimized out.

If you’re not using interrupts on an ATtiny, you can get rid of 30 bytes of code by getting rid of the interrupt vector table. In the end, [Ralph] was able to take a 274 byte program and trim it down to 190 bytes. Compared to the 8k of Flash on the ‘tiny85, it’s a small amount saved, but if you’re banging your head against the limitations of this micro’s storage, this might be a good place to start.

Now if you want to hear some stories about optimizing code you’ve got to check out the Once Upon Atari documentary. They spent months hand optimizing code to make it fit on the cartridges.

Using newlib with Stellaris Launchpad


[Brandon] is taking us further down the rabbit hole by demonstrating how to use newlib with the TI Stellaris Launchpad. This is a nice continuation of the framework he built with his post about using GCC with ARM hardware. But it is most certainly one level of complexity deeper than that initial article.

Using newlib instead of glibc offers the option of compiling C code that includes system calls common when coding for computers but which are rare in embedded systems. Using something like printf is generally avoided because of the overhead associated with it. But these processors are getting so fast and have so much RAM that it may be useful in certain cases. We briefly thought about implementing malloc for creating a linked list when working on our STM32 snake game. [Brandon’s] work here makes the use of that command possible.

The process starts by adding labels for the beginning and end of the stack/heap. This makes it possible for functions to allocate memory. After taking care of the linker script changes you must implement a few system call functions like _sbrk.

A study of GCC and the TI Stellaris


There are several things that we really like about the TI Stellaris. We think the peripheral library — called Stellarisware — has a pretty intuitive API that makes it easy to get into. But we’re also quite impressed that the software comes with makefiles that build the libraries and examples using your own GCC cross compiling toolchain. We spent quite a bit of time pawing through those makefiles and the makedefs settings file to figure out how TI was doing things. Now if you don’t want to do that sleuthing yourself you can head on over to the GCC with TI Stellaris Launchpad guide which [Brandon] just published.

Shown above is the helpful chart of compiler flags which he pulled from the files with his added comments on what each does. He did the same for the linker flags, and then discusses the program calls made during compilation and linking. He then delves into how the driver library on the chip’s ROM can be accessed in code. This is just the first in a four-part series he plans to write. We can’t wait to see what he has to say about the hardware FPU as we haven’t had time to explore that for ourselves quite yet.