13,000 Regular Expressions Make An Editor’s Life Easier

Being an editor is a job that seems deceptively easy until you are hauled over the coals for letting a textual howler go to print (or website). Most publications have style guides to ensure that their individual voice is preserved, but even the most eagle-eyed will sometimes slip up in their application. At the Guardian newspaper in the UK they have been struggling with this against an ever-evolving style guide that must adapt to fast-moving world events, to the extent that they had a set of regular expressions to deal with commonly-occurring problems. A lot of regular expressions, in fact around 13,000 of them.

Clearly some form of management was required, and  a team of developers set about taming this monster. The result is Typerighter, their server-side document-checker, which can be found in a GitHub repository. Surprisingly for rule management they started with a Google Sheet, a choice which proved unexpectedly robust when working with such a long list even though they later replaced it. The back end doing the job of text matching was written in Scala, and for the front end a plugin was created for their Prosemirror text editor.

For a publication of course this is extremely interesting, but where’s the interest for hackers? The answer lies in any text-processing engine that uses a lot of regular expressions; those of you who have dabbled in this space will know how unwieldy this work can become. Any user of computational linguistic techniques in the pursuit of language processing could probably find much of interest here.

If you’re a bit hazy on regular expressions, how about the episode on them from our long-running Linux-fu series?

This Week In Security: IOS Wifi Incantations, Ghosts, And Bad Regex

I hope everyone had a wonderful Thanksgiving last week. My household celebrated by welcoming a 4th member to the family. My daughter was born on Wednesday morning, November 25th. And thus explains what I did last week instead of writing the normal Hackaday column. Never fear, we shall catch up today, and cover the news that’s fit to be noticed.

iOS Zero-click Wifi Attack

[Ian Beer] of Google’s Project Zero brings us the fruit of his lockdown-induced labors, a spectacular iOS attack. The target of this attack is the kernel code that handles AWDL, an Apple WiFi protocol for adhoc mesh networks between devices. The most notable feature that makes use of AWDL is AirDrop, Apple’s device-to-device file sharing system. Because AWDL is a proprietary protocol, the WiFi hardware can’t do any accelerated processing of packets. A few years back, there was an attack against Broadcom firmware that required a second vulnerability to jump from the WiFi chip to the device CPU. Here, because the protocol is all implemented in Apple’s code, no such pivot is necessary.

And as you’ve likely deduced, there was a vulnerability found. AWDL uses Type-Length-Value (TLV) messages for sending management data. For a security researcher, TLVs are particularly interesting because each data type represents a different code path to attack. One of those data types is a list of MAC addresses, with a maximum of 10. The code that handles it allocates a 60 byte buffer, based on that maximum. The problem is that there isn’t a code path to drop incoming TLVs of that type when they exceed 60 bytes. The remainder is written right past the end of the allocated buffer.

There is more fun to be had, getting to a full exploit, but the details are a bit too much to fully dive in to here. It interesting to note that [Ian] ran into a particular problem: His poking at the target code was triggering unexpected kernel panics. He discovered two separate vulnerabilities, both distinct from the vuln he was trying to exploit.

Finally, this exploit requires the target device to have AWDL enabled, and many won’t. But you can use Bluetooth Low Energy advertisements to trick the target device into believing an Airdrop is coming in from a trusted contact. Once the device enables AWDL to verify the request, the attack can proceed. [Ian] reported his findings to Apple way back in 2019, and this vulnerability was patched in March of 2020.

Via Ars Technica.
Continue reading “This Week In Security: IOS Wifi Incantations, Ghosts, And Bad Regex”

DIY Regular Expressions

In the Star Wars universe, not everyone uses a lightsaber, and those who do wield them had to build them themselves. There’s something to be said about that strategy. Building a car or a radio is a great way to learn how those things work. That’s what [Low Level JavaScript] points out about regular expressions. Sure, a lot of people think they are scary. So why not write your own regular expression parser and engine? Get that under your belt and you’ll probably never fear another regular expression.

Of course, most of us probably won’t do it ourselves, but you can still watch the process in the video below. The code is surprisingly short, but don’t expect all the bells and whistles you might find in Python or even Perl.

Continue reading “DIY Regular Expressions”

Rename Files En Masse In Windows

Bulk Rename Utility

Everybody hates it when they have to rename a fileset to fit a new naming scheme. Instead of doing it the hard way and writing a one-time script to go through and rename everything, check out Bulk Rename Utility from [Jim Willsher]. It provides you with a multitude of methods to take care of business and allows you do pick your favorite method, be it regular expressions, simple finding and replacing, prefix/suffix modification, or a combination of many more.

However, if the sheer amount of options available overwhelms you or if you just want an easier way to do things, check out A.F.5 from [Alex Fauland]. A.F.5 offers features like adding a counter to your filenames, change file attributes, and save your rename settings out to a file for repeat use.