Gawking Hex Files

Last time I talked about how to use AWK (or, more probably the GNU AWK known as GAWK) to process text files. You might be thinking: why did I care? Hardware hackers don’t need text files, right? Maybe they do. I want to talk about a few common cases where AWK can process things that are more up the hardware hacker’s alley.

The Simple: Data Logs

If you look around, a lot of data loggers and test instruments do produce text files. If you have a text file from your scope or a program like SIGROK, it is simple to slice and dice it with AWK. Your machines might not always put out nicely formatted text files. That’s what AWK is for.

AWK makes the default assumption that fields break on whitespace and end with line feeds. However, you can change those assumptions in lots of ways. You can set FS and RS to change the field separator and record separator, respectively. Usually, you’ll set this in the BEGIN action although you can also change it on the command line.

For example, suppose your test file uses semicolons between fields. No problem. Just set FS to “;” and you are ready to go. Setting FS to a newline will treat the entire line as a single field. Instead of delimited fields, you might also run into fixed-width fields. For a file like that, you can set FIELDWIDTHS.

If the records aren’t delimited, but a fixed length, things are a bit trickier. Some people use the Linux utility dd to break the file apart into lines by the number of bytes in each record. You can also set RS to a limited number of any character and then use the RT variable (see below) to find out what those characters were. There are other options and even ways to read multiple lines. The GAWK manual is your friend if you have these cases.

BEGIN { RS=".{10}"   # records are 10 characters


   print $0  # do what you want here

Once you have records and fields sorted, it is easy to do things like average values, detect values that are out of limit, and just about anything else you can think of.

Spreadsheet Data Logs

Some tools output spreadsheets. AWK isn’t great at handling spreadsheets directly. However, a spreadsheet can be saved as a CSV file and then AWK can chew those up easily. It is also an easy format to produce from an AWK file that you can then read into a spreadsheet. You can then easily produce nice graphs, if you don’t want to use GNUPlot.

Simplistically, setting FS to a comma will do the job. If all you have is numbers, this is probably enough. If you have strings, though, some programs put quotes around strings (that may contain commas or spaces). Some only put quotes around strings that have commas in them.

To work around this problem cleanly, AWK offers an alternate way to define fields. Normally, FS tells you what characters separate a field. However, you can set FPAT to define what a field looks like. In the case of CSV file, a field is any character other than a comma or a double quote and then anything up to the next double quote.

The manual has a good example:

  FPAT = "([^,]+)|(\"[^\"]+\")"

  print "NF = ", NF
  for (i = 1; i <= NF; i++) {
  printf("$%d = <%s>\n", i, $i)

This isn’t perfect. For example, escaped quotes don’t work right. Quoted text with new lines in it don’t either. The manual has some changes that remove quotes and handle empty fields, but the example above works for most common cases. Often the easiest approach is to change the delimiter in the source program to something unique, instead of a comma.

Hex Files

Another text file common in hardware circles is a hex file. That is a text file that represents the hex contents of a programmable memory (perhaps embedded in a microcontroller). There are two common formats: Intel hex files and Motorola S records. AWK can handle both, but we’ll focus on the Intel variant.

Old versions of AWK didn’t work well with hex input, so you’d have to resort to building arrays to convert hex digits to numbers. You still see that sometimes in old code or code that strives to be compatible. However, GNU AWK has the strtonum function that explicitly converts a string to a number and understands the 0x prefix. So a highly compatible two digit hex function looks like this (not including the code to initialize the hexdigit array):

function hex2dec(x) {
  return (hexdigit[substr(x,1,1)]*16)+hexdigit[substr(x,2,1)]

If you don’t mind requiring GAWK, it can look like this:

function hex2dec(x) {
  return strtonum("0x" x);

In fact, the last function is a little better (and misnamed) because it can handle any hex number regardless of length (up to whatever limit is in GAWK).

Hex output is simple since you have printf and the X format specifier is available. Below is an AWK script that chews through a hex file and provides a count of the entire file, plus shows a breakdown of the segments (that is, non-contiguous memory regions).

BEGIN { ct=0;

function hex4dec(y) {
  return strtonum("0x" y)

function hex2dec(x) {
  return strtonum("0x" x);

/:[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]00/ {

  ad = hex4dec(substr($0, 4, 4))
  if (ad != adxpt) {
  block[++n] = ad
  adxpt = ad;
  l = hex2dec(substr($0, 2, 2))
  blockct[n] = blockct[n] + l
  adxpt = adxpt + l
  ct = ct + l

END { printf("Count=%d (0x%04x) bytes\t%d (0x%04x) words\n\n", ct, ct, ct/2, ct/2)
  for (i = 1 ; i <= n ; i++) {
  printf("%04x: %d (0x%x) bytes\t", block[i], blockct[i], blockct[i])
  printf("%d (0x%x) words\n", blockct[i]/2, blockct[i]/2)


This shows a few AWK features: the BEGIN action, user-defined functions, the use of named character classes (:xdigit: is a hex digit) and arrays (block and blockct use numeric indices even though they don’t have to). In the END action, the summary uses printf statements for both decimal and hex output.

Once you can parse a file like this, there are many things you could do with the resulting data. Here’s an example of some similar code that does a sanity check on hex files.

Binary Files

Text files are fine, but real hardware uses binary files that people (and AWK) can’t easily read, right? Well, maybe people, but AWK can read binary files in a few ways. You can use getline in the BEGIN part of the script and control how things are read directly. You can also use the RS/RT trick mentioned above to read a specific number of bytes. There are a few other AWK-only methods you can read about if you are interested.

However, the easiest way to deal with binary files in AWK is to convert them to text files using something like the od utility. This is a program available with Linux (or Cygwin, and probably other Windows toolkits) that converts a binary file to different readable formats. You probably want hex bytes, so that’s the -t x2 option (or use x4 for 16-bit words). However, the output is made for humans, not machines, so when a long run of the same output occurs, od omits them replacing all the missing lines with a single asterisk. For AWK use, you want to use the -v option to turn that behavior off. There are other options to change the output radix of the address, swap bytes, and more.

Here are a few lines from a random binary file:

0000000 d8ff e0ff 1000 464a 4649 0100 0001 0100
0000020 0100 0000 dbff 4300 5900 433d 434e 5938
0000040 484e 644e 595e 8569 90de 7a85 857a c2ff
0000060 a1cd ffde ffff ffff ffff ffff ffff ffff
0000100 ffff ffff ffff ffff ffff ffff ffff ffff
0000120 ffff ffff ffff ffff ffff 00db 0143 645e
0000140 8564 8575 90ff ff90 ffff ffff ffff ffff
0000160 ffff ffff ffff ffff ffff ffff ffff ffff
0000200 ffff ffff ffff ffff ffff ffff ffff ffff
0000220 ffff ffff ffff ffff ffff ffff ffff c0ff
0000240 1100 0108 02e0 0380 2201 0200 0111 1103
0000260 ff01 00c4 001f 0100 0105 0101 0101 0001
0000300 0000 0000 0000 0100 0302 0504 0706 0908
0000320 0b0a c4ff b500 0010 0102 0303 0402 0503
0000340 0405 0004 0100 017d 0302 0400 0511 2112
0000360 4131 1306 6151 2207 1471 8132 a191 2308

This is dead simple to parse with AWK. The address will be $1 and each field will be $2, $3, etc. You can just convert the file yourself, use a pipe in the shell, or–if you want a clean solution–have AWK run od as a subprocess. Since the input is text, all of AWK’s regular expression features still work, which is useful.

Writing binary files is easy, too, since printf can output nearly anything. An alternative is to use xxd instead of od. It can convert binary files to text, but also can do the reverse.

Full Languages

There’s an old saying that if all you have is a hammer, everything looks like a nail. I doubt that AWK is the best tool to build full languages, but it can be a component of some quick and dirty hacks. For example, the universal cross assembler uses AWK to transform assembly language files into an internal format the C preprocessor can handle

Since AWK can call out to external programs easily, it would be possible to write things that, for example, processed a text file of commands and used them to drive a robot arm. The regular expression matching makes text processing easy and external programs could actually handle the hardware interface.

awk-wolfensteinThink that’s far fetched? We’ve covered stranger AWK use cases, including a Wolfenstien-like game that uses 600 lines of AWK script (as seen to the right).

So, sure it is software, but it is a tool that has that Swiss Army knife quality that makes it a useful tool for software and hardware hackers alike. Of course, other tools like Perl, Python, and even C or C++ can do more. But often with a price in complexity and learning curve. AWK isn’t for every job, but when it works, it works well.

Join the GUI Generation: QTCreator

More and more projects require a software component these days. With everything being networked, it is getting harder to avoid having to provide software for a desktop or phone environment as well as the code in your embedded device.

If you’ve done a lot of embedded systems work, you probably already know C and C++. If so, it is pretty easy to grab up a C compiler and write a command-line application that does what you want. The problem is that today’s users have varying degrees of fear about the command line ranging from discomfort to sheer terror. On a mobile device, they probably don’t even know how to get to a command line. I’ve been waiting for years for the WIMP (Windows/Icon/Mouse/Pointer) fad to fade away, but even I have to admit that it is probably here for the foreseeable future.

qtrigolSo what’s the alternative? There are actually quite a few. However, I wanted to talk about one that is free, has a wide range of deployment options, uses C++, and is easy to pick up: Qt. Specifically, creating programs with QtCreator (see right). Yes, there are other options, and you can develop Qt programs in a number of ways.

You might think Qt isn’t free. There was a time that it was free for open source projects, but not for commercial projects. However, recent licensing changes (as of version 4.5) have made it more like using gcc. You can elect to use the LGPL which means it is easy to use the Qt shared libraries with closed software. You might also think that a lot of strange constructs that “extend” C++ in unusual ways. The truth is, it does, but with QtCreator, you probably won’t need to know anything about that since the tool will set up most, if not all, of that for you.


If you ever used Visual Basic or something similar, you will feel right at home with QtCreator. You can place buttons and text edit boxes and other widgets on a form and then back them up with code. Buttons create signals when you push them. There are lots of signals like text changed or widgets (controls) being created or destroyed.

To handle a signal, an object provides a slot. There is a meta-compiler that preprocesses your C++ code to get all the signal and slot stuff converted into regular C++. Here’s the good news: you don’t really care. In QtCreator you can write code to handle a button push and exactly how that happens isn’t really much of a worry.

QtCreator has kits that can target different platforms and — in general — the code is reasonably portable between platforms. If you do want to do mobile development for Android or iOS, be sure that you understand the limitations before you start so you can avoid future pitfalls.

You Need Class

Like many similar frameworks, Qt uses an application class (QtApplication) that represents a do-nothing application. Your job is to customize a subclass and have it do what you want. You add widgets and you can even add more screens, if you like. You can connect signals to existing slots or new slots.

There are many classes available, and the online documentation is quite good. Depending on which version of Qt you are using, you’ll need to find the right page (or ask QtCreator to find it for you). However, just to whet your appetite, here’s the Qt5 reference page. From there you can find classes for GUI widgets, strings, network sockets, database queries, and even serial ports.

I could do an entire tutorial on using QtCreator, but it would be a duplication of effort. There’s already a great getting started one provided. You’ll find there is plenty of documentation.


How do you enumerate serial ports? It depends on the platform, right? In Qt, the platform-specific part is hidden from you. For example, here’s a bit of code that fills in a combo box with the available serial port:

MainWindow::MainWindow(QWidget *parent) :
 ui(new Ui::MainWindow)
// initialize list of serial ports
 ports = QSerialPortInfo::availablePorts();
// fill in combo box
 for (int i=0; i<ports.length(); i++) 
   ui->comport->addItem(ports[i].portName(), QVariant(i));

The QSerialPortInfo object provides an array of serial port objects. The ui->comport is a combo box and the addItem method lets me put a display string and a data item in for each selection. In this case, the display is the portName of the port and the extra data is just the index in the array (as a variant, which could be different types of data, not just a number). When you select a port, the index lets the program look up the port to, for example, open it.

When the combo box changes, a currentIndexChanged signal will occur. Here’s the slot handler for that:

void MainWindow::on_comport_currentIndexChanged(int index)
 QString out;
 // get selected index
 int sel=ui->comport->currentData().toInt();
// build up HTML info string in out
 out="<h1>Serial Port Info</h1>";
 out += ports[sel].portName() + " " + ports[sel].description() + "
 out += ports[sel].systemLocation() + "
 if (ports[sel].hasVendorIdentifier() && ports[sel].hasProductIdentifier())
 out += ports[sel].manufacturer() + " ("+ QString::number(ports[sel].vendorIdentifier(),16) + ":" + QString::number(ports[sel].productIdentifier(),16) + ")";
 // and put it on the screen

In this case, the result is information about the serial port. You can see the resulting output, below. The QString is Qt’s string class and, obviously, the text display widget understands some HTML.


Not Just for GUIs

You can develop console applications using Qt, but then many of the provided classes don’t make sense. There’s even a Qt for Embedded (essentially Linux with no GUI). You can find guides for Raspberry Pi and BeagleBoard.

On the mobile side, you can target Android, iOS, and even Blackberry, along with others. Like anything, it probably won’t just be “push a button” and a ported application will fall out. But it still should cut down on development time and cost compared to rewriting a mobile app from scratch.

And the Winner Is…

I’m sure if you want some alternatives, our comment section is about to fill up with recommendations. Some of them are probably good. But it strikes me that not everyone has the same needs and background. The best tool for you might not work as well for me. I find Qt useful and productive.

Even if Qt isn’t your tool of choice, it still can be handy to have in your tool bag. You never know when you will need a quick and dirty cross-platform application.

Data Logging; Everyone’s Doing It, Why Aren’t You?

Between Tesla Motors’ automobiles and SpaceX’s rockets, Elon Musk’s engineers just have to be getting something right. In part, SpaceX’s success in landing their first stage rockets is due to analysis of telemetry data. You can see some of the data from their launch vehicles on the live videos and there is surely a lot more not shown.

An article in MIT Technology Review provides similar insights in how Tesla came from behind in autonomous vehicle operation by analyzing telemetry from their cars. Since 2014 their Model S received an increasing number of sensors that all report their data over the vehicle’s always-on cellular channel. Sterling Anderson of Tesla reported they get a million miles of data every 10 hours.

Image Credit Tesla
Image Credit Tesla

The same approach can help us to improve our systems but many believe creating a log of key data is costly in time and resources. If your system is perfect (HA HA!) that would be a valid assessment. All too often such data becomes priceless if analysis explains why your drone or robot wanted to go left into a building instead of right into the open field.

Continue reading “Data Logging; Everyone’s Doing It, Why Aren’t You?”

Talking Star Trek

Speech generation and recognition have come a long way. It wasn’t that long ago that we were in a breakfast place and endured 30 minutes of a teenaged girl screaming “CALL JUSTIN TAYLOR!” into her phone repeatedly, with no results. Now speech on phones is good enough you might never use the keyboard unless you want privacy. Every time we ask Google or Siri a question and get an answer it makes us feel like we are living in Star Trek.

[Smcameron] probably feels the same way. He’s been working on a Star Trek-inspired bridge simulator called “Space Nerds in Space” for some time. He decided to test out the current state of Linux speech support by adding speech commands and response to it. You can see the results in the video below.

Continue reading “Talking Star Trek”

Digitize Your Graphs With WebPlotDigitizer

Have you ever had to write a bit of code to interpret a non-linear analog reading as picked up by an ADC? When all you have to work with for your transfer function is a graph in a semiconductor datasheet that was probably written thirty years ago and prints out the size of a postage stamp, that’s a rather annoying task. Wouldn’t it be nice if you had access to the numbers behind the graph!

You can’t knock on the office door of the engineer who created it back in the ’80s, he’s probably  in retirement and playing golf or growing prize petunias by now. But you can digitize the graph to get yourself a lot closer to the action, and to help you in your quest there’s a handy online tool.

2N3904 current gain

WebPlotDigitizer is not new, it’s been around for quite a few years now. But it’s still worth talking about, because it’s one of those tools to keep in reserve. If you’ve ever needed it, you’ll know what we mean.

So how does it work? Load an image with a graph in it, select some points on the X and Y axis, roughly trace the curve with a marker tool, and set it in motion. Let’s give it a go. We’re going to try digitizing the current gain plot from the 2N3904 datasheet (PDF) that we examined a few days ago.

Data points!

So, open the WebPlotDigitizer app, load the graph image captured from the sheet as a JPEG. It asks what type of graph you’ve loaded, in this case a 2D X-Y plot. It asks you to identify four known points on the axes and supply their values. You also tell it if the axes are logarithmic at this point. Select “Automatic mode” on the right hand side, then click “Pen” and mark the graph trace, then select the colour of the trace. Click the “Run” button, and your data points appear. Hit the “View data” button, and there you have it. A few rogue points to remove perhaps, but it does a pretty good job.

If WebPlotDigitizer has engaged your interest, you’ll be pleased to know that it’s open-source, and you can find all its code on GitHub. There is also a handy video tutorial which you can see below the break. Continue reading “Digitize Your Graphs With WebPlotDigitizer”

OneSolver Does What Wolfram Can’t

Wolfram Alpha has been “helping” students get through higher math and science classes for years. It can do almost everything from solving Laplace transforms to various differential equations. It’s a little lacking when it comes to solving circuits, though, which is where [Grant] steps in. He’s come up with a tool called OneSolver which can help anyone work out a number of electrical circuits (and a few common physics problems, too).

[Grant] has been slowly building an online database of circuit designs that has gotten up to around a hundred unique solvers. The interesting thing is that the site implements a unique algorithm where all input fields of a circuits design can also become output fields. This is unique to most other online calculators because it lets you do things that circuit simulators and commercial math packages can’t. The framework defines one system of equations, and will solve all possible combinations, and lets one quickly home in on a desired design solution.

If you’re a student or someone who constantly builds regulators or other tiny circuits (probably most of us) then give this tool a shot. [Grant] is still adding to it, so it will only get better over time. This may be the first time we’ve seen something like this here, too, but there have been other more specific pieces of software to help out with your circuit design.


[glitch] had a cheap EPROM eraser with very few features. Actually, that might be giving it too much credit: it’s barely more than a UV light that turns on when it’s plugged in and turns off when it’s plugged out unplugged. Of course it would be nice to implement some safety features, so he decided he’d hook it up to a software-controlled power outlet.

Of course, controlling a relay that’s wired to mains is old hat around here, and in fact, we’ve covered [glitch]’s optoisolated mains switch already. He’s gone a little beyond the normal mains relay project with this one, though. Rather than use a microcontroller to run the relay, [glitch] wrote a simple Ruby script on his computer to turn the EPROM eraser on for the precise amount of time that is required to erase the memory.The Ruby script drives the relay control directly over a USB to serial adapter’s RTS handshake pin.

[glitch]’s hack reminds us that if you just need a quick couple bits of slow output, a USB-serial converter might be just the ticket. You could imagine driving everything from standard lamps to your 3D printer’s bed heater (provided you use similar hardware), but it’s especially helpful for [glitch] who claims to forget to turn off the eraser when it’s done its job, which leaves a potentially dangerous UV source just lying about. It’s always a good idea to add safety features to a dangerous piece of equipment!