Learning X86_64 Assembly By Building A GUI From Scratch

Some professional coders are absolutely adamant that learning to program in assembly language in these modern times is simply a waste of time, and this post is not for them. This is for the rest of us, who still think there is value in knowing at a low level what is going on, a deeper appreciation can be developed. [Philippe Gaultier] was certainly in this latter camp and figured the best way to learn was to work on a substantial project.

Now, there are some valid reasons to write directly in assembler; for example hand-crafting unusual code sequences for creating software exploits would be hindered by an optimising compiler. Creating code optimised for speed and size is unlikely to be among those reasons, as doing a better job than a modern compiler would be a considerable challenge. Some folks would follow the tried and trusted route and work towards getting a “hello world!” output to the console or a serial port, but not [Philippe]. This project aimed to get a full-custom GUI application running as a client to the X11 server running atop Linux, but the theory should be good for any *nix OS.

Hello World! In X11!

The first part of the process was to create a valid ELF executable that Linux would work with. Using nasm to assemble and the standard linker, only a few X86_64 instructions are needed to create a tiny executable that just exits cleanly. Next, we learn how to manipulate the stack in order to set up a non-trivial system call that sends some text to the system STDOUT.

To perform any GUI actions, we must remember that X11 is a network-orientated system, where our executable will be a client connected via a socket. In the simple case, we just connect the locally created socket to the server path for the local X server, which is just a matter of populating the sockaddr_un data structure on the stack and calling the connect() system call.

Now the connection is made, we can follow the usual X11 route of creating client ids, then allocating resources using them. Next, fonts are opened, and a graphical context is created to use it to create a window. Finally, after mapping the window, something will be visible to draw into with subsequent commands. X11 is a low-level GUI system, so there are quite a few steps to create even the most simple drawable object, but this also makes it quite simple to understand and thus quite suited to such a project.

We applaud [Phillip] for the fabulous documentation of this learning hack and can’t wait to see what’s next in store!

Not too long ago, we covered Snowdrop OS, which is written entirely in X86 assembly, and we also found out a thing or two about some oddball X86 instructions. We’ve also done our own Linux assembly primer.

Well Documented Code Helps Revive Decades-Old Commodore Project

In the 1980s, [Mike] was working on his own RPG for the Commodore 64, inspired by dungeon crawlers of the era like Ultima IV and Telengard, both some of his favorites. The mechanics and gameplay were fairly revolutionary for the time, and [Mike] wanted to develop some of these ideas, especially the idea of line-of-sight, even further with his own game. But an illness, a stint in the military, and the rest of life since the 80s got in the way of finishing this project. This always nagged at him, so he finally dug out his decades-old project, dusted out his old Commodore and other antique equipment, and is hoping to finish it by 2024.

Luckily [Mike’s] younger self went to some extremes documenting the project, starting with a map he created which was inspired by Dungeons and Dragons. There are printed notes from a Commodore 64 printer, including all of the assembly instructions, augmented with his handwritten notes to explain how everything worked. He also has handwritten notes, including character set plans, disk sector use plans, menus, player commands, character stats, and equipment, all saved on paper. The early code was written using a machine language monitor since [Mike] didn’t know about the existence of assemblers at the time. Eventually, he discovered them and attempted to rebuild the code on a Commodore 128 and then an Amiga, but never got everything working together. There is some working code still on a floppy disk, but a lot of it doesn’t work together either.

While not quite finished yet, [Mike] has a well-thought-out plan for completing the build, involving aggregating all of the commented source code and doing quarterly sprints from here on out to attempt to get the project finished. We’re all excited to see how this project fares in the future. Beyond the huge scope of this pet project, we’d also suggest that this is an excellent example of thoroughly commenting one’s code to avoid having to solve mysteries or reinvent wheels when revisiting projects months (or decades) later. After all, self-documenting code doesn’t exist.

Continue reading “Well Documented Code Helps Revive Decades-Old Commodore Project”

Alternatives To Pins And Holes For 3D Printed Assemblies

When we have two 3D printed parts that need to fit together, many of us rely on pins and holes to locate them and fix them together. [Slant 3D] has explored some alternative ideas in this area that may open up new avenues for your own designs.

Their first idea was to simply chamfer the pins and holes. This allows the object to be printed in a different orientations without compromising the fit. It also makes the features less brittle and creates a broader surface for gluing. Another alternative is using fins and slots, which again add robustness compared to flimsy pins. By chamfering the edges of the fins, they can be printed vertically for good strength and easy location without the need for support material.

Neither option requires much extra fuss compared to typical pin-and-hole designs. Plus, both are far less likely to snap off and ruin your day. Be honest, we’ve all been there. Meanwhile, consider adding folded techniques to your repertoire, too.

Continue reading “Alternatives To Pins And Holes For 3D Printed Assemblies”

Videos Teach Bare Metal RP2040

When we write about retrocomputers, we realize that back in the day, people knew all the details of their computer. You had to, really, if you wanted to get anything done. These days, we more often pick peripherals and just assume our C or other high level code will fit and run on the CPU.

But sometimes you need to get down to the bare metal and if your desire is to use bare metal on the RP2040, [Will Thomas] has a YouTube channel to help you. The first video explains why you might want to do this followed by some simple examples. Then you’ll find over a dozen other videos that give you details.

Any video that starts, “Alright, Monday night. I have no friends. It is officially bare metal hours,” deserves your viewing. Of course, you have to start with the traditional blinking LED. But subsequent videos talk about the second core, GPIO, clocks, SRAM, spinlocks, the UART, and plenty more.

As you might expect, the code is all in assembly. But even if you want to program using C without the SDK, the examples will be invaluable. We like assembly — it is like working an intricate puzzle and getting anything to work is satisfying. We get it. But commercially, it rarely makes sense to use assembly anymore. On the other hand, when you need it, you really need it. Besides, we all do things for fun that don’t make sense commercially.

We like assembly, especially on platforms where most people don’t use it. Tackling it on a modern CPU is daunting, but if you want to have a go, we know someone who can help.

Continue reading “Videos Teach Bare Metal RP2040”

Bootstrapping The Old Fashioned Way

The PDP-11, the Altair 8800, and the IMSAI 8080 were some of the heroes of the computer revolution, and they have something in common — front panel switches, and a lot of them. You probably have a fuzzy idea about those switches, maybe from reading Levy’s Hackers, where the painful process of toggling in programs is briefly described. But how exactly does it work? Well thanks to [Dave Plummer] of Dave’s Garage, now we have a handy tutorial. The exact computer in question is a reproduction of the IMSAI 8080, the computer made famous by a young Matthew Broderick in Wargames. [Dave] managed to score the reproduction and a viewer saved him the time of assembly.

The example program is a Larson Scanner, AKA making an strip of lights push a pulse of light across the strip. [Dave] starts with the Assembly code, a scant 11 lines, and runs it through an assembler available online. That gives us machine code, but there’s no hex keypad for input, so we need those in 8-bit binary bytes. To actually program the machine, you set the address switches to your start-of-program location, and the data switches to your first byte. The “deposit” switch sets that byte, while the “deposit next” switch increments the address and then stores the value. It means you don’t have to key in an address for each instruction, just the data. Get to the end of the program, confirm the address is set to the start, and flick run. Hope you toggled everything in correctly. If so, you’re rewarded with a friendly scanner so reminiscent of 80s TV shows. Stick around after the break to see the demonstration!
Continue reading “Bootstrapping The Old Fashioned Way”

Homebrew An OS From Scratch? Snowdrop Shows How It’s Done

Ever wondered what it would take to roll your own OS? [Sebastian]’s Snowdrop OS might just provide you with some insight into that process, and maybe even some inspiration.

[Sebastian] created Snowdrop completely from scratch, using only x86 assembly language. It’s more than just bare-bones, and boasts a number of useful utilities and programs including a BASIC interpreter and linker (for creating standalone BASIC executables.) That’s not even touching on the useful essentials, like multitasking and a GUI framework. There are even a number of resources specifically for making game development easier. Because as [Sebastian] puts it, what’s a operating system without games?

Interested in giving Snowdrop a try, or peek at the source code? The binaries and sources section has all you need, and the other headings at the top of the page will send you to the various related goodies. If you have a few minutes, we recommend you watch a walkthrough of the various elements and features of Snowdrop in this video tour (embedded after the page break.)

Snowdrop is an ambitious project, but we’re not surprised that [Sebastian] has made it work; we’ve seen his low-level software skills before, with his fantastic efforts around the classic stand-up arcade game, Knights of the Round.

Continue reading “Homebrew An OS From Scratch? Snowdrop Shows How It’s Done”

AVX-512: When The Bits Really Count

For the majority of workloads, fiddling with assembly instructions isn’t worth it. The added complexity and code obfuscation generally outweigh the relatively modest gains. Mainly because compilers have become quite fantastic at generation code and because processors are just so much faster, it is hard to get a meaningful speedup by tweaking a small section of code. That changes when you introduce SIMD instructions and need to decode lots of bitsets fast. Intel’s fancy AVX-512 SIMD instructions can offer some meaningful performance gains with relatively low custom assembly.

Like many software engineers, [Daniel Lemire] had many bitsets (a range of ints/enums encoded into a binary number, each bit corresponding to a different integer or enum). Rather than checking if just a specific flag is present (a bitwise and), [Daniel] wanted to know all the flags in a given bitset. The easiest way would be to iterate through all of them like so:

while (word != 0) {
  result[i] = trailingzeroes(word);
  word = word & (word - 1);
  i++;
}

The naive version of this look is very likely to have a branch misprediction, and either you or the compiler would speed it up by unrolling the loop. However, the AVX-512 instruction set on the latest Intel processors has some handy instructions just for this kind of thing. The instruction is vpcompressd and Intel provides a handy and memorable C/C++ function called _mm512_mask_compressstoreu_epi32.

The function generates an array of integers and you can use the infamous popcnt instruction to get the number of ones. Some early benchmark testing shows the AVX-512 version uses 45% fewer cycles. You might be wondering, doesn’t the processor downclock when wide 512-bite registers are used? Yes. But even with the downclocking, the SIMD version is still 33% faster. The code is up on Github if you want to try it yourself.