Gawking Hex Files

Last time I talked about how to use AWK (or, more probably, the GNU version known as GAWK) to process text files. You might be thinking: why should I care? Hardware hackers don’t need text files, right? Maybe they do. I want to talk about a few common cases where AWK can process things that are more up the hardware hacker’s alley.

The Simple: Data Logs

If you look around, a lot of data loggers and test instruments do produce text files. If you have a text file from your scope or a program like SIGROK, it is simple to slice and dice it with AWK. Your machines might not always put out nicely formatted text files. That’s what AWK is for.

AWK makes the default assumption that fields break on whitespace and end with line feeds. However, you can change those assumptions in lots of ways. You can set FS and RS to change the field separator and record separator, respectively. Usually, you’ll set this in the BEGIN action although you can also change it on the command line.

For example, suppose your test file uses semicolons between fields. No problem. Just set FS to “;” and you are ready to go. Setting FS to a newline will treat the entire line as a single field. Instead of delimited fields, you might also run into fixed-width fields. For a file like that, you can set FIELDWIDTHS.

If the records aren’t delimited, but a fixed length, things are a bit trickier. Some people use the Linux utility dd to break the file apart into lines by the number of bytes in each record. You can also set RS to a limited number of any character and then use the RT variable (see below) to find out what those characters were. There are other options and even ways to read multiple lines. The GAWK manual is your friend if you have these cases.

BEGIN {
  RS = ".{10}"   # each record terminator matches exactly 10 characters
}

{
  $0 = RT        # the matched text lands in RT, so copy it into $0
}

{
  print $0       # do what you want here
}

Once you have records and fields sorted, it is easy to do things like average values, detect values that are out of limit, and just about anything else you can think of.
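As a rough sketch of that idea, here is a tiny shell wrapper around an AWK script that averages one column and flags readings over a limit. The semicolon-separated log layout (time;volts), the sample values, the 3.5 V limit, and the function names sample_log and check_log are all assumptions made up for illustration, not output from any particular instrument.

```shell
#!/bin/sh
# Hypothetical sample log: time;volts (made-up data for illustration)
sample_log() {
  printf '%s\n' '0.0;3.29' '0.1;3.31' '0.2;3.62' '0.3;3.30'
}

# Average column 2 and flag readings above an assumed 3.5 V limit
check_log() {
  awk -v limit=3.5 '
    BEGIN { FS = ";" }                  # fields are separated by semicolons
    $2 > limit { printf("over limit at t=%s: %s\n", $1, $2) }
    { sum += $2; n++ }                  # accumulate every reading
    END { if (n > 0) printf("average=%.3f over %d samples\n", sum / n, n) }
  '
}

sample_log | check_log
```

With a real log you would replace sample_log with the file itself, something like check_log < mylog.txt.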

Spreadsheet Data Logs

Some tools output spreadsheets. AWK isn’t great at handling spreadsheets directly. However, a spreadsheet can be saved as a CSV file and then AWK can chew those up easily. CSV is also an easy format to produce from an AWK script, so you can read the results into a spreadsheet and produce nice graphs, if you don’t want to use GNUPlot.

Simplistically, setting FS to a comma will do the job. If all you have is numbers, this is probably enough. If you have strings, though, some programs put quotes around strings (that may contain commas or spaces). Some only put quotes around strings that have commas in them.

To work around this problem cleanly, GAWK offers an alternate way to define fields. Normally, FS tells you what characters separate a field. However, you can set FPAT to define what a field looks like. In the case of a CSV file, a field is either a run of characters other than a comma, or a double quote followed by anything up to the next double quote.

The manual has a good example:

BEGIN {
  FPAT = "([^,]+)|(\"[^\"]+\")"
}

{
  print "NF = ", NF
  for (i = 1; i <= NF; i++) {
    printf("$%d = <%s>\n", i, $i)
  }
}

This isn’t perfect. For example, escaped quotes don’t work right. Quoted text with new lines in it don’t either. The manual has some changes that remove quotes and handle empty fields, but the example above works for most common cases. Often the easiest approach is to change the delimiter in the source program to something unique, instead of a comma.

Hex Files

Another text file common in hardware circles is a hex file. That is a text file that represents the hex contents of a programmable memory (perhaps embedded in a microcontroller). There are two common formats: Intel hex files and Motorola S records. AWK can handle both, but we’ll focus on the Intel variant.

Old versions of AWK didn’t work well with hex input, so you’d have to resort to building arrays to convert hex digits to numbers. You still see that sometimes in old code or code that strives to be compatible. GNU AWK, however, has the strtonum function that explicitly converts a string to a number and understands the 0x prefix. The highly compatible version of a two digit hex function looks like this (not including the code to initialize the hexdigit array):

function hex2dec(x) {
  return (hexdigit[substr(x,1,1)]*16)+hexdigit[substr(x,2,1)]
}

If you don’t mind requiring GAWK, it can look like this:

function hex2dec(x) {
  return strtonum("0x" x);
}

In fact, the last function is a little better (and misnamed) because it can handle any hex number regardless of length (up to whatever limit is in GAWK).
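For the compatible version, the hexdigit array has to be filled in somewhere. One way to do that, sketched here as an assumption rather than the only approach, is a loop in the BEGIN action that walks a string of digits. The shell function name hex_demo and the test input are just for illustration.

```shell
#!/bin/sh
hex_demo() {
  awk '
    # Build the lookup table once: hexdigit["0"] = 0 ... hexdigit["F"] = 15,
    # with lowercase a-f mapped to the same values.
    BEGIN {
      digits = "0123456789ABCDEF"
      for (i = 1; i <= 16; i++) {
        c = substr(digits, i, 1)
        hexdigit[c] = i - 1
        hexdigit[tolower(c)] = i - 1
      }
    }

    function hex2dec(x) {
      return (hexdigit[substr(x,1,1)]*16)+hexdigit[substr(x,2,1)]
    }

    # Print each two digit input value next to its decimal equivalent
    { print $1, hex2dec($1) }
  '
}

printf '%s\n' 00 1f A5 ff | hex_demo
```

This should run on any reasonably modern AWK, not just GAWK, since it avoids strtonum entirely.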

Hex output is simple since you have printf and the X format specifier is available. Below is an AWK script that chews through a hex file and provides a count of the entire file, plus shows a breakdown of the segments (that is, non-contiguous memory regions).

BEGIN {
  ct = 0       # total data bytes seen
  adxpt = ""   # next expected address; empty until the first record
}

# Convert a hex string of any length to a number
function hex4dec(y) {
  return strtonum("0x" y)
}

function hex2dec(x) {
  return strtonum("0x" x)
}

# Data records only: colon, byte count, address, then record type 00
/:[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]00/ {
  ad = hex4dec(substr($0, 4, 4))
  if (ad != adxpt) {   # address jumped, so start a new block
    block[++n] = ad
    adxpt = ad
  }
  l = hex2dec(substr($0, 2, 2))   # byte count of this record
  blockct[n] = blockct[n] + l
  adxpt = adxpt + l
  ct = ct + l
}

END {
  printf("Count=%d (0x%04x) bytes\t%d (0x%04x) words\n\n", ct, ct, ct/2, ct/2)
  for (i = 1; i <= n; i++) {
    printf("%04x: %d (0x%x) bytes\t", block[i], blockct[i], blockct[i])
    printf("%d (0x%x) words\n", blockct[i]/2, blockct[i]/2)
  }
}

This shows a few AWK features: the BEGIN action, user-defined functions, the use of named character classes ([:xdigit:] is a hex digit) and arrays (block and blockct use numeric indices even though they don’t have to). In the END action, the summary uses printf statements for both decimal and hex output.

Once you can parse a file like this, there are many things you could do with the resulting data. Here’s an example of some similar code that does a sanity check on hex files.
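To give a flavor of what a sanity check might look like, here is a minimal sketch (my own, not the code referred to above) that verifies the length and checksum of each Intel hex record. It relies on the fact that all the bytes in a record, including the checksum byte, sum to zero modulo 256. The function names hexcheck and hex2num are made up, and the conversion is done by hand so it does not need GAWK. It does not validate that every character is a legal hex digit.

```shell
#!/bin/sh
hexcheck() {
  awk '
    # Portable hex conversion (no strtonum), any length
    function hex2num(s,    i, n) {
      n = 0
      s = toupper(s)
      for (i = 1; i <= length(s); i++)
        n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
      return n
    }

    /^:/ {
      sub(/\r$/, "")                     # tolerate DOS line endings
      len = hex2num(substr($0, 2, 2))    # data byte count from the record
      want = 11 + 2 * len                # colon + 8 header digits + data + 2 checksum digits
      if (length($0) != want) {
        printf("line %d: bad length\n", NR); bad++; next
      }
      sum = 0
      for (i = 2; i < length($0); i += 2)   # add up every byte, checksum included
        sum += hex2num(substr($0, i, 2))
      if (sum % 256 != 0) {
        printf("line %d: bad checksum\n", NR); bad++
      }
    }

    END { printf("%d bad record(s)\n", bad + 0) }
  '
}

# Second record has a deliberately wrong checksum
printf '%s\n' ':0300300002337A1E' ':0300300002337A1F' ':00000001FF' | hexcheck
```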

Binary Files

Text files are fine, but real hardware uses binary files that people (and AWK) can’t easily read, right? Well, maybe people, but AWK can read binary files in a few ways. You can use getline in the BEGIN part of the script and control how things are read directly. You can also use the RS/RT trick mentioned above to read a specific number of bytes. There are a few other AWK-only methods you can read about if you are interested.

However, the easiest way to deal with binary files in AWK is to convert them to text files using something like the od utility. This is a program available with Linux (or Cygwin, and probably other Windows toolkits) that converts a binary file to different readable formats. You probably want hex bytes, so that’s the -t x1 option (or use x2 for 16-bit words, which is what the listing below shows, or x4 for 32-bit words). However, the output is made for humans, not machines, so when a long run of identical lines occurs, od omits them and replaces all the missing lines with a single asterisk. For AWK use, you want the -v option to turn that behavior off. There are other options to change the output radix of the address, swap bytes, and more.

Here are a few lines from a random binary file:

0000000 d8ff e0ff 1000 464a 4649 0100 0001 0100
0000020 0100 0000 dbff 4300 5900 433d 434e 5938
0000040 484e 644e 595e 8569 90de 7a85 857a c2ff
0000060 a1cd ffde ffff ffff ffff ffff ffff ffff
0000100 ffff ffff ffff ffff ffff ffff ffff ffff
0000120 ffff ffff ffff ffff ffff 00db 0143 645e
0000140 8564 8575 90ff ff90 ffff ffff ffff ffff
0000160 ffff ffff ffff ffff ffff ffff ffff ffff
0000200 ffff ffff ffff ffff ffff ffff ffff ffff
0000220 ffff ffff ffff ffff ffff ffff ffff c0ff
0000240 1100 0108 02e0 0380 2201 0200 0111 1103
0000260 ff01 00c4 001f 0100 0105 0101 0101 0001
0000300 0000 0000 0000 0100 0302 0504 0706 0908
0000320 0b0a c4ff b500 0010 0102 0303 0402 0503
0000340 0405 0004 0100 017d 0302 0400 0511 2112
0000360 4131 1306 6151 2207 1471 8132 a191 2308

This is dead simple to parse with AWK. The address will be $1 and each field will be $2, $3, etc. You can just convert the file yourself, use a pipe in the shell, or (if you want a clean solution) have AWK run od as a subprocess. Since the input is text, all of AWK’s regular expression features still work, which is useful.
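Here is a small sketch of the shell pipe approach. It uses -t x1 so that each field is a single byte and host byte order doesn’t matter, then counts how many bytes are 0xff. The function name count_ff and the six test bytes are assumptions for illustration only.

```shell
#!/bin/sh
# Read binary data on stdin, dump it as one hex byte per field, and count 0xff bytes
count_ff() {
  od -v -A d -t x1 | awk '
    NF > 1 {                            # skip the final address-only line
      for (i = 2; i <= NF; i++) {       # $1 is the address, the rest are bytes
        total++
        if ($i == "ff") ff++
      }
    }
    END { printf("%d of %d bytes are 0xff\n", ff + 0, total + 0) }
  '
}

# Six test bytes: ff d8 ff e0 00 10
printf '\377\330\377\340\000\020' | count_ff
```

For a real file, redirect it in with something like count_ff < image.jpg.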

Writing binary files is easy, too, since printf can output nearly anything. An alternative is to use xxd instead of od. It can convert binary files to text, but also can do the reverse.

Full Languages

There’s an old saying that if all you have is a hammer, everything looks like a nail. I doubt that AWK is the best tool to build full languages, but it can be a component of some quick and dirty hacks. For example, the universal cross assembler uses AWK to transform assembly language files into an internal format the C preprocessor can handle.

Since AWK can call out to external programs easily, it would be possible to write things that, for example, processed a text file of commands and used them to drive a robot arm. The regular expression matching makes text processing easy and external programs could actually handle the hardware interface.
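A very rough sketch of that idea follows. The command file syntax (move X Y, grip open or close) and the arm-driver program are entirely hypothetical, and echo stands in for the real driver so you can see what would be run. The arguments are checked against a pattern before they go anywhere near system(), which is worth doing any time you build a shell command from input text.

```shell
#!/bin/sh
run_commands() {
  awk '
    /^#/ || NF == 0 { next }            # skip comments and blank lines

    # move X Y: both arguments must be plain integers
    $1 == "move" && $2 ~ /^-?[0-9]+$/ && $3 ~ /^-?[0-9]+$/ {
      system("echo arm-driver --x " $2 " --y " $3)   # echo stands in for a real driver
      next
    }

    # grip open or grip close
    $1 == "grip" && ($2 == "open" || $2 == "close") {
      system("echo arm-driver --grip " $2)
      next
    }

    { printf("line %d: unknown command: %s\n", NR, $0) }
  '
}

printf '%s\n' '# demo' 'move 10 20' 'grip close' 'spin 3' | run_commands
```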

Think that’s far fetched? We’ve covered stranger AWK use cases, including a Wolfenstein-like game that uses 600 lines of AWK script (as seen to the right).

So, sure, it is software, but it has that Swiss Army knife quality that makes it a useful tool for software and hardware hackers alike. Of course, other tools like Perl, Python, and even C or C++ can do more, but often at a price in complexity and learning curve. AWK isn’t for every job, but when it works, it works well.

Puzzling Out An 80s Puzzle Toy

[Ido Gendel] looks back at a time in the 80s when kids would learn by answering quiz questions on their “TOMY Teacher” or “Sears Quiz-A-Tron”. There’s a bit of a conundrum with this toy: how did it know which answers were correct? Chip memory of any kind was expensive back then, not something you’d sweep into the dust bin when you had extras, as you might now.

To use the toy, the child would place the notebook in the plastic frame on the device. They’d open the page with the quiz they would like to take. Printed in the upper left hand corner were three colored squares. There was a matching set of colored buttons on the device. They’d press the corresponding buttons in order from top to bottom and then the machine would magically know which answers on the quiz were correct.

[Ido] wondered how the machine handled this information. Was there an internal table for all 27 possible codes? Did it generate the answer table somehow? He sat down with a spreadsheet filled with the notebook code on the left and the corresponding correct answers on the right. Next he stared at the numbers.

He eventually determined that there was a pattern. The machine was using the colored squares as the input for a function that determined what the answers were. A table would have only taken up 68 bytes, but with one 80s chip on board, sounds to play, and lights to switch on and off, the machine needed all the free space it could get.

Learn To Program With Literate Programming

My heyday in programming was about five years ago, and I’ve really let my skills fade. I started finding myself making excuses for my lack of ability. I’d tackle harder ways to work around problems just so I wouldn’t have to code. Worst of all, I’d find myself shelving projects because I no longer enjoyed coding enough to do that portion. So I decided to put in the time and get back up to speed.

Normally, I’d get back into programming out of necessity. I’d go on a coding binge, read a lot of documentation, and cut and paste a lot of code. It works, but I’d end up with a really mixed understanding of what I did to get the working code. This time I wanted to structure my learning so I’d end up with a more, well, structured understanding.

However, there’s a problem. Programming books are universally boring. I own a really big pile of them, and that’s after I gave a bunch away. It’s not really the fault of the writer; it’s an awkward subject to teach. It usually starts off by torturing the reader with a chapter or two of painfully basic concepts with just enough arcana sprinkled in to massage a migraine into existence. Typically they also like to mention that the arcana will be demystified in another chapter. The next step is to make you play typist and transcribe a big block of code with new and interesting bits into an editor and run it. Presumably, the act of typing along leaves the reader with such a burning curiosity that the next seventeen pages of dry monologue about the thirteen lines of code are transformed into riveting prose within the reader’s mind. Maybe a structured understanding just isn’t worth it.

I wanted to find a new way to study programming. One where I could interact with the example code as I typed it. I wanted to end up with a full understanding before I pressed that run button for the first time, not after.

When I first read about literate programming, my very first instinct said: “nope, not doing that.” Donald Knuth, who is no small name in computing, proposes a new way of doing things in his Literate Programming. Rather than writing the code in the order the compiler likes to see it, write the code in the order you’d like to think about it along with a constant narrative about your thoughts while you’re developing it. The method by which he’d like people to achieve this feat is with the extensive use of macros. So, for example, a literate program would start with a section like this:

Continue reading “Learn To Program With Literate Programming”

Open Robots With Open Roberta

Kids, and Hackaday editors, love robots! The Open Roberta project (OR) takes advantage of this to teach kids about programming. And while the main focus is building a robot programming language that works for teaching grade-school and high-school kids, it’s also a part of a large open source robotics ecosystem that brings a lot more to the table than you might think. We talked with some folks at Google, one of the project’s sponsors, about where the project is and where it’s going.

Building a robot can be very simple (assembling pre-configured parts or building something small, quick, and cute) or it can be an endeavour that takes years of sweat and tears. Either way, the skills involved in building the ‘bot aren’t necessarily the same as those it takes to program the firmware that drives it, and then eventually the higher-level software that makes it functional and easy to drive.

OR, as an educational project, makes it very, very easy for kids to start off programming robots, but it’s expandable as the user gets more experienced. And since everything is open source, it’s part of a whole ecosystem that makes it even more valuable. We think it’s worth a look (along with something significantly more complex like ROS) if you’re playing around with robotics.

System Architecture

Open Roberta is the user-facing middleware in a chain of software and firmware bits that make a robot work in a classroom environment. For the students, everything runs inside a browser. OR provides a webserver, robot programming interface and language, and then converts the output of the students’ programs to something that can be used with the robots’ firmware. The robots that are used in classrooms are mostly based on the Lego Mindstorms EV3 platform because it’s easy to put something together in short order. (But if you don’t have an EV3, don’t despair and read on!)

The emphasis is on ease of entry for the students and the teachers supervising the class. Everything runs in a browser, so there’s nothing to install on the client side. The students connect to a server that directs the robots, communicating with the robots’ own operating system, and uploading the students’ programs.

Continue reading “Open Robots With Open Roberta”

Continuing The Dialog: “It’s Time Software People And Mechanical People Had A Talk”

A while back I wrote a piece titled, “It’s Time the Software People and Mechanical People Sat Down and Had a Talk”. It was mostly a reaction to what I believe to be a growing problem in the hacker community. Bad mechanical designs get passed on by what is essentially digital word of mouth. A sort of mythology grows around these bad designs, and they start to separate from science. Rather than combat this, people tend to defend them much like one would defend a favorite band or a painting. This comes out of various kinds of ignorance, which were covered in more detail in the original article.

There was an excellent discussion in the comments, which reaffirmed why I like writing for Hackaday so much. You guys seriously rock. After reading through the comments and thinking about it, some of my views have changed. Some have stayed the same.

It has nothing to do with software guys.

I definitely made a cognitive error. I think a lot of people who get into hardware hacking from the hobby world have a beginning in software. It makes sense, they’re already reading blogs like this one. Maybe they buy an Arduino and start messing around. It’s not long before they buy a 3D printer, and then naturally want to contribute back.

Since a larger portion of amateur mechanical designers come from software, it would make sense that when I had a bad interaction with someone over a design critique, they would end up coming at it from a software perspective. So with a sample size that was too small, and that didn’t fully take into account my positive interactions along with the negative ones, I made a false generalization. Sorry. When I sat down to think about it, I could easily have written an article titled, “It’s time the amateur mechanical designers and the professionals had a talk,” with the same point at the end.

Though, the part about hardware costs still applies.

I started out rather aggressively by stating that software people don’t understand the cost of physical things. I would change that to: “anyone who hasn’t designed a physical product from napkin to market doesn’t understand the cost of things.”

Continue reading “Continuing The Dialog: “It’s Time Software People And Mechanical People Had A Talk””

It’s Time The Software People And Mechanical People Sat Down And Had A Talk.

With the advances in rapid prototyping, there’s been a huge influx of people in the physical realm of hacking. While my overall view of this development is positive, I’ve noticed a schism forming in the community. I’m going to have to call a group out. I think it stems from a fundamental refusal of software folks to change their ways of thinking to some of the real aspects of working in the physical realm, so-to-speak. The problem, I think, comes down to three things: dismissal of cost, favoring modularity over understanding, and a resulting insistence that there’s nothing to learn.

Continue reading “It’s Time The Software People And Mechanical People Sat Down And Had A Talk.”

Test Drive Your New Programming Font

After hours and hours spent in front of a terminal or IDE, a user begins to build a list of infuriating little things. That one pop-up box that happens every time you press that button by mistake. The noise the software makes when the compile fails. Or the horrible reality that your code just crashed because there wasn’t enough difference between uppercase ‘O’ and a zero. In comes the programming font.

The typical way to find a programming font is to troll forums for a user with a similar problem and see if they have a workable solution. [Koen Lageveen] went out and found nearly all of the free programming fonts out there and compiled a list. He then took one more step and wrote a web app that lets you test them out. Hopefully this will help those in the very real struggle for the perfect programming font. You can try out the tool for yourself, and if you really like it [Koen] has all the code up for it on his GitHub.

[via Hacker News]