Estimate Your English Vocabulary Using Python

We take our mother tongue for granted, a language we learn as young children without realizing the effort involved. It is only when as adults we try to pick up another language that we fully understand how much hard work surrounds each acquired word.

Depending on who you listen to, estimates vary as to the size of a typical native English speaker’s vocabulary. The ballpark figures seem to put most adults under 20 thousand words, while graduates achieve somewhere around 23 thousand words. It’s a subject [Alex Eames] became interested in after reading a BBC article on it, and he decided to write his own software to produce a personal estimate.

His Python script takes the Scrabble word list, and presents the user with a list of words, for each one of which they have to indicate their comprehension. After a hundred words have been presented it calculates an estimate of the size of the user’s vocabulary. [Alex] wrote it on and for the Raspberry Pi, but it should work quite happily on any platform with Python 3. It certainly had no problem with our Ubuntu-based PC.

There is plenty of opportunity for bragging over the size of one’s vocabulary with a script like this one, but it’s something of a statistical leveler in that if you are truthful in your responses it will almost certainly put you exactly where you might expect for your age or level of education. If you want to know the result this script returned for a Hackaday scribe, for example, the answer is 23554.

This subject is a slight departure into software from our usual hardware subject matter, but it’s one of those tests that becomes rather a consuming interest when performed competitively among a group of friends. How well will you fare?

Via [Recantha]

Scanning Parts Into KiCad

You do not know how to make a PCB unless you can make your own parts. [Jan] knows this, but like everyone else he checked out the usual online sources for a footprint for an SD card socket before making his own. It turns out, this SD card socket bought from an online marketplace was completely undocumented. Not only was an Eagle or KiCad footprint unavailable, but CAD files showing the dimensions of the part were non-existent. A solution had to be devised.

Instead of taking calipers and finely measuring all the pads on this SD card socket – a process that would surely fail – [Jan] decided to use a flatbed scanner to trace out the part. The part was placed on the glass and scanned at 300 dpi with a convenient reference object (a public transport card) in the same picture. This picture was imported into a CAD package, scaled to the correct ratio, and exported as a DXF. Since KiCad readily accepts importing DXFs, the CAD file was easily accessed, traced over, and a new part created.

From start to finish, making the footprint for this no-name, off-brand SD card socket took fifteen minutes. That’s nothing compared to the time it would take to manually measure each of the pads, draw a footprint, and print out the footprint at 1:1 scale to see if it matched up several times. It’s awesome work, and a great reminder that the best tools are usually right in front of you.

Hallucinating Machines Generate Tiny Video Clips

Hallucination is the erroneous perception of something that’s actually absent – or in other words: A possible interpretation of training data. Researchers from the MIT and the UMBC have developed and trained a generative-machine learning model that learns to generate tiny videos at random. The hallucination-like, 64×64 pixels small clips are somewhat plausible, but also a bit spooky.

The machine-learning model behind these artificial clips is capable of learning from unlabeled “in-the-wild” training videos and relies mostly on the temporal coherence of subsequent frames as well as the presence of a static background. It learns to disentangle foreground objects from the background and extracts the overall dynamics from the scenes. The trained model can then be used to generate new clips at random (as shown above), or from a static input image (as shown in pairs below).

Currently, the team limits the clips to a resolution of 64×64 pixels and 32 frames in duration in order to decrease the amount of required training data, which is still at 7 TB. Despite obvious deficiencies in terms of photorealism, the little clips have been judged “more realistic” than real clips by about 20 percent of the participants in a psychophysical study the team conducted. The code for the project (Torch7/LuaJIT) can already be found on GitHub, together with a pre-trained model. The project will also be shown in December at the 2016 NIPS conference.

Grand Theft Auto V Used To Teach Self-Driving AI

For all the complexity involved in driving, it becomes second nature to respond to pedestrians, environmental conditions, even the basic rules of the road. When it comes to AI, teaching machine learning algorithms how to drive in a virtual world makes sense when the real one is packed full of squishy humans and other potential catastrophes. So, why not use the wildly successful virtual world of Grand Theft Auto V to teach machine learning programs to operate a vehicle?

Half and Half GTAV Annotation ThumbThe hard problem with this approach is getting a large enough sample for the machine learning to be viable. The idea is this: the virtual world provides a far more efficient solution to supplying enough data to these programs compared to the time-consuming task of annotating object data from real-world images. In addition to scaling up the amount of data, researchers can manipulate weather, traffic, pedestrians and more to create complex conditions with which to train AI.

It’s pretty easy to teach the “rules of the road” — we do with 16-year-olds all the time. But those earliest drivers have already spent a lifetime observing the real world and watching parents drive. The virtual world inside GTA V is fantastically realistic. Humans are great pattern recognizers and fickle gamers would cry foul at anything that doesn’t analog real life. What we’re left with is a near-perfect source of test cases for machine learning to be applied to the hard part of self-drive: understanding the vastly variable world every vehicle encounters.

A team of researchers from Intel Labs and Darmstadt University in Germany created a program that automatically indexes the virtual world (as seen above), creating useful data for a machine learning program to consume. This isn’t a complete substitute for real-world experience mind you, but the freedom to make a few mistakes before putting an AI behind the wheel of a vehicle has the potential to speed up development of autonomous vehicles. Read the paper the team published Playing for Data: Ground Truth from Video Games.

Continue reading “Grand Theft Auto V Used To Teach Self-Driving AI”

Commanding Kerbals With A Physical Interface

Kerbal Space Program will have you hurling little green men into the wastes of outer space, landing expended boosters back on the launchpad, and using resources on the fourth planet from the Sun to bring a crew back home. Kerbal is the greatest space simulator ever created, teaches orbital mechanics better than the Air Force textbook, but it is missing one thing: switches and blinky LEDs.

[SgtNoodle] felt this severe oversight by the creators of Kerbal could be remedied by building his Kerbal Control Panel, which adds physical buttons, switches, and a real 6-axis joystick for roleplaying as an Apollo astronaut.

The star of this build is the custom six-axis joystick, used for translation control when docking, maneuvering, or simply puttering around in space. Four axis joysticks are easy, but to move forward and backward, [SgtNoodle] replaced the shaft of a normal arcade joystick with a carriage bolt, added a washer on one end, and used two limit switches to give this MDF cockpit Z+ and Z- control.

The rest of the build is equally well detailed, with a CNC’d front panel, toggle switches and missile switch covers, with everything connected to an Arduino Mega. This Arduino interfaces the switches to the game with the kRPC mod, which creates a script-driven interface to the game. So, toggling the landing gear switch, for instance, triggers a script which interfaces with KSP to lower your landing gear prior to a nice, safe landing. Or, more likely, a terrifying crash.

Lightweight Game Console Packs a Punch

Any maker worth their bits will look for new ways to challenge themselves. [Robert Fotino], a computer science student at the University of California, is doing just that: designing and building his own lightweight hobbyist game console that he has appropriately named Consolite.

[Fotino] wrote his own compiler in C++ that converts from C-like languages to a custom-designed assembler that he has dubbed Consolite Assembly. To test his code, he also wrote an emulator before loading it onto the Mimas V2 FPGA board. Presently, Consolite  uses 64KiB of main memory and 48 KiB of video memory; a future version will have 32 bit support to make better use of the Mimas’ 64 MiB of on board ram, but the current 16-bit version is a functional proof of concept.

consolite-status-leds-and-hardware-switches_thumbnailAn SD card functions as persistent storage for up to 256 programs, which can be accessed using the hardware switches on the Mimas, with plans to add user access in the form of saving game progress, storage outside of main memory, etc. — also in a future update that will include audio support.

As it stands, [Fotino] has written his own versions of Breakout, Tetris, and Tron to show off his project.

Not wanting for diligence, [Fotino] has provided thorough documentation of nearly every step along the way in his blog posts and on GitHub if you are looking for guidelines for any similar projects you might have on the back burner — like an even tinier game console.

[via r/FPGA]

Gawking Hex Files

Last time I talked about how to use AWK (or, more probably the GNU AWK known as GAWK) to process text files. You might be thinking: why did I care? Hardware hackers don’t need text files, right? Maybe they do. I want to talk about a few common cases where AWK can process things that are more up the hardware hacker’s alley.

The Simple: Data Logs

If you look around, a lot of data loggers and test instruments do produce text files. If you have a text file from your scope or a program like SIGROK, it is simple to slice and dice it with AWK. Your machines might not always put out nicely formatted text files. That’s what AWK is for.

AWK makes the default assumption that fields break on whitespace and end with line feeds. However, you can change those assumptions in lots of ways. You can set FS and RS to change the field separator and record separator, respectively. Usually, you’ll set this in the BEGIN action although you can also change it on the command line.

For example, suppose your test file uses semicolons between fields. No problem. Just set FS to “;” and you are ready to go. Setting FS to a newline will treat the entire line as a single field. Instead of delimited fields, you might also run into fixed-width fields. For a file like that, you can set FIELDWIDTHS.

If the records aren’t delimited, but a fixed length, things are a bit trickier. Some people use the Linux utility dd to break the file apart into lines by the number of bytes in each record. You can also set RS to a limited number of any character and then use the RT variable (see below) to find out what those characters were. There are other options and even ways to read multiple lines. The GAWK manual is your friend if you have these cases.

BEGIN { RS=".{10}"   # records are 10 characters


   print $0  # do what you want here

Once you have records and fields sorted, it is easy to do things like average values, detect values that are out of limit, and just about anything else you can think of.

Spreadsheet Data Logs

Some tools output spreadsheets. AWK isn’t great at handling spreadsheets directly. However, a spreadsheet can be saved as a CSV file and then AWK can chew those up easily. It is also an easy format to produce from an AWK file that you can then read into a spreadsheet. You can then easily produce nice graphs, if you don’t want to use GNUPlot.

Simplistically, setting FS to a comma will do the job. If all you have is numbers, this is probably enough. If you have strings, though, some programs put quotes around strings (that may contain commas or spaces). Some only put quotes around strings that have commas in them.

To work around this problem cleanly, AWK offers an alternate way to define fields. Normally, FS tells you what characters separate a field. However, you can set FPAT to define what a field looks like. In the case of CSV file, a field is any character other than a comma or a double quote and then anything up to the next double quote.

The manual has a good example:

  FPAT = "([^,]+)|(\"[^\"]+\")"

  print "NF = ", NF
  for (i = 1; i <= NF; i++) {
  printf("$%d = <%s>\n", i, $i)

This isn’t perfect. For example, escaped quotes don’t work right. Quoted text with new lines in it don’t either. The manual has some changes that remove quotes and handle empty fields, but the example above works for most common cases. Often the easiest approach is to change the delimiter in the source program to something unique, instead of a comma.

Hex Files

Another text file common in hardware circles is a hex file. That is a text file that represents the hex contents of a programmable memory (perhaps embedded in a microcontroller). There are two common formats: Intel hex files and Motorola S records. AWK can handle both, but we’ll focus on the Intel variant.

Old versions of AWK didn’t work well with hex input, so you’d have to resort to building arrays to convert hex digits to numbers. You still see that sometimes in old code or code that strives to be compatible. However, GNU AWK has the strtonum function that explicitly converts a string to a number and understands the 0x prefix. So a highly compatible two digit hex function looks like this (not including the code to initialize the hexdigit array):

function hex2dec(x) {
  return (hexdigit[substr(x,1,1)]*16)+hexdigit[substr(x,2,1)]

If you don’t mind requiring GAWK, it can look like this:

function hex2dec(x) {
  return strtonum("0x" x);

In fact, the last function is a little better (and misnamed) because it can handle any hex number regardless of length (up to whatever limit is in GAWK).

Hex output is simple since you have printf and the X format specifier is available. Below is an AWK script that chews through a hex file and provides a count of the entire file, plus shows a breakdown of the segments (that is, non-contiguous memory regions).

BEGIN { ct=0;

function hex4dec(y) {
  return strtonum("0x" y)

function hex2dec(x) {
  return strtonum("0x" x);

/:[[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]][[:xdigit:]]00/ {

  ad = hex4dec(substr($0, 4, 4))
  if (ad != adxpt) {
  block[++n] = ad
  adxpt = ad;
  l = hex2dec(substr($0, 2, 2))
  blockct[n] = blockct[n] + l
  adxpt = adxpt + l
  ct = ct + l

END { printf("Count=%d (0x%04x) bytes\t%d (0x%04x) words\n\n", ct, ct, ct/2, ct/2)
  for (i = 1 ; i <= n ; i++) {
  printf("%04x: %d (0x%x) bytes\t", block[i], blockct[i], blockct[i])
  printf("%d (0x%x) words\n", blockct[i]/2, blockct[i]/2)


This shows a few AWK features: the BEGIN action, user-defined functions, the use of named character classes (:xdigit: is a hex digit) and arrays (block and blockct use numeric indices even though they don’t have to). In the END action, the summary uses printf statements for both decimal and hex output.

Once you can parse a file like this, there are many things you could do with the resulting data. Here’s an example of some similar code that does a sanity check on hex files.

Binary Files

Text files are fine, but real hardware uses binary files that people (and AWK) can’t easily read, right? Well, maybe people, but AWK can read binary files in a few ways. You can use getline in the BEGIN part of the script and control how things are read directly. You can also use the RS/RT trick mentioned above to read a specific number of bytes. There are a few other AWK-only methods you can read about if you are interested.

However, the easiest way to deal with binary files in AWK is to convert them to text files using something like the od utility. This is a program available with Linux (or Cygwin, and probably other Windows toolkits) that converts a binary file to different readable formats. You probably want hex bytes, so that’s the -t x2 option (or use x4 for 16-bit words). However, the output is made for humans, not machines, so when a long run of the same output occurs, od omits them replacing all the missing lines with a single asterisk. For AWK use, you want to use the -v option to turn that behavior off. There are other options to change the output radix of the address, swap bytes, and more.

Here are a few lines from a random binary file:

0000000 d8ff e0ff 1000 464a 4649 0100 0001 0100
0000020 0100 0000 dbff 4300 5900 433d 434e 5938
0000040 484e 644e 595e 8569 90de 7a85 857a c2ff
0000060 a1cd ffde ffff ffff ffff ffff ffff ffff
0000100 ffff ffff ffff ffff ffff ffff ffff ffff
0000120 ffff ffff ffff ffff ffff 00db 0143 645e
0000140 8564 8575 90ff ff90 ffff ffff ffff ffff
0000160 ffff ffff ffff ffff ffff ffff ffff ffff
0000200 ffff ffff ffff ffff ffff ffff ffff ffff
0000220 ffff ffff ffff ffff ffff ffff ffff c0ff
0000240 1100 0108 02e0 0380 2201 0200 0111 1103
0000260 ff01 00c4 001f 0100 0105 0101 0101 0001
0000300 0000 0000 0000 0100 0302 0504 0706 0908
0000320 0b0a c4ff b500 0010 0102 0303 0402 0503
0000340 0405 0004 0100 017d 0302 0400 0511 2112
0000360 4131 1306 6151 2207 1471 8132 a191 2308

This is dead simple to parse with AWK. The address will be $1 and each field will be $2, $3, etc. You can just convert the file yourself, use a pipe in the shell, or–if you want a clean solution–have AWK run od as a subprocess. Since the input is text, all of AWK’s regular expression features still work, which is useful.

Writing binary files is easy, too, since printf can output nearly anything. An alternative is to use xxd instead of od. It can convert binary files to text, but also can do the reverse.

Full Languages

There’s an old saying that if all you have is a hammer, everything looks like a nail. I doubt that AWK is the best tool to build full languages, but it can be a component of some quick and dirty hacks. For example, the universal cross assembler uses AWK to transform assembly language files into an internal format the C preprocessor can handle

Since AWK can call out to external programs easily, it would be possible to write things that, for example, processed a text file of commands and used them to drive a robot arm. The regular expression matching makes text processing easy and external programs could actually handle the hardware interface.

awk-wolfensteinThink that’s far fetched? We’ve covered stranger AWK use cases, including a Wolfenstien-like game that uses 600 lines of AWK script (as seen to the right).

So, sure it is software, but it is a tool that has that Swiss Army knife quality that makes it a useful tool for software and hardware hackers alike. Of course, other tools like Perl, Python, and even C or C++ can do more. But often with a price in complexity and learning curve. AWK isn’t for every job, but when it works, it works well.