Linux-Fu: Making AWK A Bit Easier

awk is a kind of Swiss Army knife for text files. However, some of its limitations are often a bit annoying. I’ve used a simple set of functions to make awk a bit better, although I will warn you: it does require GNU extensions to awk. That is, you must use gawk and not other versions. Your system probably maps /usr/bin/awk to something and that something might be gawk. But it could also be mawk or some other flavor. If you use a Debian-based distro, update-alternatives is your friend here. But for the purposes of this post, I’m going to assume you are using gawk.

By the end of the post, you’ll see how to use my awk add-on functions to split up a line into fields even when there is no single character to separate all fields. In addition, you’ll be able to refer to the fields using names you decide. You won’t have to remember that $2 is the time field. You’ll say Fields_fields["time"] instead.

The Problem

awk does a lot of common work for you when you use it to process text files. It reads files a record at a time. Normally, a record is a single line. Then it splits the line on fields using whitespace, or some other choice of field separators. You can write code that manipulates the line or individual fields. This default behavior is great, especially since you can change the end of record character and the field separator. A surprising number of files fit this sort of format.

Until, of course, they don’t. If you have data coming from a data logging instrument or some database, it could be formatted in a variety of ways. Some fields might have structured data with a variety of separators. This isn’t a deal-breaker. Since you can get at the whole line, you can do almost anything you want, but the logic is harder and the whole point to using awk is to make things easier.

For example, suppose you had a file from a data recorder that had an eight-digit serial number, followed by a six-character tag, and then two floating point numbers separated by colons. The pattern might look like

^([0-9]{8})([a-zA-Z0-9]{6})([-+.0-9]+),([-+.0-9]+)$

This would be hard to handle with the conventional field splitting and you’d normally just write code to split everything apart.

Continue reading “Linux-Fu: Making AWK A Bit Easier”

Linux Fu: Literate Regular Expressions

Regular expressions — the things you feed to programs like grep — are a bit like riding a bike. It seems impossible until you learn to do it, and then it’s easy. Part of their bad reputation is because they use a very concise and abbreviated syntax that alarms people. To help people who don’t use regular expressions every day, I created a tool that lets you write them in something a little closer to plain English. Actually, I’ve written several versions of this over the years, but this incarnation that targets grep is the latest. Unlike some previous versions, this time I did it all using Bash.

Those who don’t know regular expressions might freak out when they see something like:

[0-9]{5}(-[0-9]{4})?

How long does it take to figure out what that does? What if you could write that in a more literate way? For example:

digit repeat 5 \

start_group \

   - digit repeat 4 \

end_group optional

Not as fast to type, sure. But you can probably deduce what it does: it reads US Zipcodes.

I’ve found that some of the most popular tools I’ve created over the years are ones that I don’t need myself. I’m sure you’ve had that experience, too. You know how to operate a computer, but you create a menu system for people who don’t and they love it. That’s how it is with this tool. You might not need it, but there’s a good chance you know someone who does. Along the way, the code uses some interesting features of Bash, so even if you don’t want to be verbose with your regular expressions, you might pick up a trick or two.

Continue reading “Linux Fu: Literate Regular Expressions”

Program Guesses Your Regular Expression

We aren’t sure how we feel about [pemistahl’s] grex program. On the one hand, we applaud a program that can take some input samples and produce a regular expression. On the other hand, it might be just as hard to gather example data that produces the correct regular expression. Still, it is an interesting piece of code.

Even the author suggests not to use this as an excuse to not learn regular expressions, since you’ll need to check the program’s output. It is certain that the results will match your test cases, but it isn’t certain that it won’t accept things you didn’t expect. Bad regular expressions have been the source of some deeply buried bugs.

Continue reading “Program Guesses Your Regular Expression”

Linux Fu: Regular Expressions

If you consider yourself a good cook, you may or may not know how to make a souffle or baklava. But there are certain things you probably do know how to do that form the basis of many recipes. For example, you can probably boil water, crack an egg, and brown meat. With Linux or Unix systems, you can make the same observation. You might not know how to set up a Wayland server or write a kernel module. But there are certain core skills like file manipulation and editing that will serve you no matter what you do. One of the pervasive skills that often gives people trouble is regular expressions. Many programs use these as a way to specify search patterns, usually in text strings such as files.

If you aren’t comfortable with regular expressions, that’s easy to fix. They aren’t that hard to learn and there are some great tools to help you. Many tools use regular expressions and the core syntax is the same. The source of confusion is that the details beyond core syntax have variations.

Let’s look at the foundation you need to understand regular expression well.

Continue reading “Linux Fu: Regular Expressions”

Crosswords Help You Learn Regular Expressions

Regular expressions might seem arcane, but if you do any kind of software, they are a powerful hacker tool. Obviously, if you are writing software or using tools like grep, awk, sed, Perl, or just about any programming language, regular expressions can simplify many tasks. Even if you don’t need them directly, regular expression searches can help you analyze source code, search through net lists, or even analyze data captured from sensors.

If you’ve been using regular expressions for a long time, they aren’t very hard. But learning them for the first time can be tedious. Unless you try your hand at regular expression crosswords. The clues are regular expressions and the rows and columns all have to match the corresponding regular expressions.

Continue reading “Crosswords Help You Learn Regular Expressions”

Rename Files En Masse In Windows

Bulk Rename Utility

Everybody hates it when they have to rename a fileset to fit a new naming scheme. Instead of doing it the hard way and writing a one-time script to go through and rename everything, check out Bulk Rename Utility from [Jim Willsher]. It provides you with a multitude of methods to take care of business and allows you do pick your favorite method, be it regular expressions, simple finding and replacing, prefix/suffix modification, or a combination of many more.

However, if the sheer amount of options available overwhelms you or if you just want an easier way to do things, check out A.F.5 from [Alex Fauland]. A.F.5 offers features like adding a counter to your filenames, change file attributes, and save your rename settings out to a file for repeat use.