
Galvanize Your Grip On Grep With This Great Grep Guide

These days, you can’t throw a USB stick without hitting something that’s running Linux. It might be a phone, an embedded device, or your TV. Whatever it is, it’s running Linux, and somewhere along the development of whatever your USB stick smacked into, somebody used the Global Regular Expression Print utility, better known as Grep. But what is Grep, and why do you need it? [Anton Zhiyanov] not only answers those questions but provides Grep by example: Interactive Guide to help you along.

Grep By Example is also available as a PDF Minibook, and a Grep playground helps you learn quickly.

To understand Linux, one must understand its commercial predecessor, Unix. One of the things that made Unix (and then Linux) unique was its philosophy: write programs that work together, do one thing well, and handle text streams. This philosophy describes a huge number of programs, and one of them is Grep. It’s installed everywhere there’s a *nix, and once you become familiar with it, your command-line-fu reaches a whole new level.

At its core, Grep is simply a bloodhound. Its scent? A magical incantation called Regular Expressions. Regular Expressions (aka Regex) are a way of describing what a stream of text should look like. So when you feed Grep a Regular Expression, it Prints only the text that matches that expression. Neat, right?
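To make that concrete, here is a toy grep in Python: a few lines that do the Print part of Global Regular Expression Print. The log file name and the pattern are invented for illustration.

```python
import re

def grep(pattern, lines):
    """Print only the lines that match the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            print(line, end="")

# Roughly what `grep -E 'error [0-9]+' server.log` would do;
# "server.log" is a hypothetical file used for illustration.
with open("server.log") as log:
    grep(r"error [0-9]+", log)
```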

The trouble is that Regex can be kind of hard, and Grep has various versions and capabilities that need to be learned. And this is where the article shines: it covers both in an excellent interactive tutorial that’ll help you become a Grep Guru in no time. And if you want to do a deeper dive, check out what it takes to make your own Regex Engine from scratch!

Obfuscated C 8080 Emulator Ported

[Oscar] is no stranger to writing hard-to-read C code. While most of us do that by accident, there are those who strive to write the most unreadable code and enter it in the IOCCC — the International Obfuscated C Code Contest. One of his winning entries was a single C function that emulates an 8080. With a few support files, the plucky little emulator will run CP/M.

The emulator won best in show, but that was in 2006. Things have changed a bit, and [Oscar] has updated the code so that you can continue to try it if you want to give yourself a headache reading code. The portability problem isn’t the CPU — modern CPUs will happily run code from 2006. The problem is the compiler and operating system. Compilers are much stricter these days, and Linux needs a little extra coaxing to give access to the input stream the way the faux computer needs it.
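What does that coaxing look like? Something in the spirit of the sketch below, which drops the terminal into raw mode so a program sees each keystroke immediately, the way CP/M software expects. This is just an illustration of the general technique, not [Oscar]’s actual code.

```python
import sys
import termios
import tty

fd = sys.stdin.fileno()
saved = termios.tcgetattr(fd)   # remember the normal terminal settings
try:
    tty.setraw(fd)              # raw mode: no line buffering, no echo
    ch = sys.stdin.read(1)      # returns after a single keystroke
finally:
    termios.tcsetattr(fd, termios.TCSADRAIN, saved)  # always restore
print(f"got {ch!r}")
```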

Continue reading “Obfuscated C 8080 Emulator Ported”

Diagram of the Radicle node-to-node connectivity.

Radicle: An Open-Source, Peer-to-Peer, GitHub Alternative

The actions of certain large social networks have recently highlighted how a small number of people possess significant power over the masses and how this power is sometimes misused. Consequently, there has been a surge in the development of federated (or decentralized) services, such as Mastodon and Matrix. But what about software development? While GitHub and similar services are less likely to be used for political manipulation, they are still centralized services with a single point of failure. Radicle is an open-source, peer-to-peer collaboration stack built on top of Git, backed by public-key cryptography as standard and a gossip protocol to ensure widespread data sharing across the network and, thus, some fault tolerance.

Essentially, code and its associated documentation are secured cryptographically with an identity. The Git protocol is used for the actual data transfer from peer to peer, which means updates are sent as deltas rather than complete copies, maximizing bandwidth efficiency. A custom gossip protocol handles metadata transfer around the network of peers. The project has a local-first ideology: users run a full-stack node on their own hardware, with all features available even offline, which is great for laptop users who move between locations with sporadic internet access.
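To get a feel for the identity part, here is a toy sketch in Python using the third-party cryptography package. The payload and the flow are invented for illustration and are not Radicle’s actual wire format.

```python
# A peer signs repository metadata with its private key; anyone holding
# the public key (learned via gossip) can verify the update. This is a
# sketch of the general idea, not Radicle's actual protocol.
from cryptography.hazmat.primitives.asymmetric import ed25519

identity = ed25519.Ed25519PrivateKey.generate()   # the node's identity
metadata = b"refs/heads/main -> 3f7a9c"           # hypothetical payload

signature = identity.sign(metadata)

# verify() raises InvalidSignature if the data was tampered with.
identity.public_key().verify(signature, metadata)
print("metadata verified")
```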

Judging from their Zulipchat instance, this is a highly active space, so perhaps it is worth diving in and seeing if it floats your boat. Fancy getting onto the Fediverse, but only have a spare MS-DOS machine to try it on? We’ve got it covered. Want to use Git but not online? You need a private Git server. Finally, too much Git? How about Gitless?

Thanks [Anonymous] for the tip! No, that wasn’t lost on us :D

Making Floating Point Calculations Less Cursed When Accuracy Matters

Inverting the earlier exponentiation to reduce floating point arithmetic error. (Credit: exozy)

An unfortunate reality of trying to represent continuous real numbers in a fixed space (e.g. with a limited number of bits) is that this comes with an inevitable loss of both precision and accuracy. Although floating point arithmetic standards – like the commonly used IEEE 754 – seek to minimize this error, some loss of precision across the range of a floating point variable is unavoidable. This is what [exozy] demonstrates, by showing just how big the error can get when performing a simple division of the exponential of an input value by the original value. This results in an astonishing error of over 10%, which raises the question of how best to fix it.

Obviously, if you have the option, you can simply increase the precision of the floating point variable, from 32-bit to 64- or even 256-bit, but this only gets you so far. The solution [exozy] shows here involves redundant computation: inverting the result of e^x. In a demonstration using Python code (which uses IEEE 754 double precision internally), this almost eradicates the error. Other than proving that floating point arithmetic is cursed, this also raises the question of why it works.
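For a feel of what such a fix looks like, here is a Python sketch of the classic version of this trick, computing (e^x − 1)/x by feeding the rounded exponential back through a logarithm. This is our reconstruction of the flavor of fix, not necessarily [exozy]’s exact code.

```python
import math

def naive(x):
    # Direct evaluation of (e^x - 1) / x. For tiny x, exp(x) rounds to a
    # double just above 1, and the subtraction cancels most of the digits.
    return (math.exp(x) - 1) / x

def inverted(x):
    # The trick: reuse the *rounded* y = exp(x) and divide by log(y)
    # instead of x, so the rounding error appears in both numerator and
    # denominator and largely cancels out.
    y = math.exp(x)
    if y == 1.0:
        return 1.0  # limit of (e^x - 1)/x as x approaches 0
    return (y - 1.0) / math.log(y)

x = 1e-15
print(naive(x))     # ~1.1102230246251565: more than 10% off
print(inverted(x))  # ~1.0000000000000004: essentially exact
```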

Continue reading “Making Floating Point Calculations Less Cursed When Accuracy Matters”

The White House Memory Safety Appeal Is A Security Red Herring

In the Holy Programming Language Wars, the lingua franca of system programming – also known as C – is often lambasted for being insecure, error-prone, and plagued with more types of behavior that are undefined than ones that are defined by the C standards. Many programming languages have been called ‘C killers’, yet C is still alive today. That didn’t stop the US White House’s Office of the National Cyber Director (ONCD) from putting out a report in which both C and C++ got lambasted for being ‘unsafe’ when it came to memory management.

The full report (PDF) is pretty light on technical details, while citing only blog posts by Microsoft and Google as its ‘expert sources’. The claim that memory safety issues are the primary cause of CVEs is not substantiated, or at least ignores the severity of CVEs when looking at the CISA statistics for active exploits. Beyond this call for ‘memory safety’, the report then goes on to effectively call for more testing and validation, while kicking in doors that were already opened back in the 1970s with the Steelman requirements and the High Order Language Working Group (HOLWG) of 1975.

What truly is the impact and factual basis of the ONCD report?

Continue reading “The White House Memory Safety Appeal Is A Security Red Herring”

The ELIZA Archaeology Project: Uncovering The Original ELIZA

Since ELIZA was created by [Joseph Weizenbaum] in the 1960s, its success has led to many variations and ports being written over the intervening decades. The goal of the ELIZA Archaeology Project by Stanford, USC, Oxford, and other university teams is to explore and uncover as much of this history as possible, starting with the original 1960s code. As noted in a recent blog post by [Anthony Hay], most of the intervening ‘ELIZA’ versions seem to have been inspired by the original rather than being accurate replicas or extensions of it. This raises the question of what the original program really looked like, a question which wasn’t answered until 2020, when the original source code was rediscovered. Continue reading “The ELIZA Archaeology Project: Uncovering The Original ELIZA”

Filters Are In Bloom

If you are a fan of set theory, you might agree there are two sets of people who write computer programs: those who know what a Bloom filter is and those who don’t. How could you efficiently test to see if someone is in one set or the other? Well, you could use a Bloom filter. [SamWho] takes us through the whole thing in general terms that you could apply in any situation.

The Bloom filter does trade something for its speed: it is subject to false positives, but never false negatives. That is, if a Bloom filter tells you that X is not part of a set, it is correct. But if it tells you X is in the set, you may have to investigate more to see if that’s true.

If it can’t tell you that something is definitely in a set, why bother? Usually, when you use a Bloom filter, you want to reduce searching through a huge amount of data. The example in the post talks about having a 20-megabyte database of “bad” URLs. You want to warn users if they enter one, but downloading that database is prohibitive. But a Bloom filter could be as small as 1.8 megabytes. However, there would be a 1 in 1000 chance of a false positive.

Increase the database size to 3.59 megabytes, and you can reduce false positives to one in a million. Presumably, if you got a positive, you could accept the risk it is false, or you could do more work to search further.
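Those sizes line up with the standard estimate for a Bloom filter’s false positive rate, p ≈ (1 − e^(−kn/m))^k for n items stored in m bits with k hash functions. A quick sanity check in Python, assuming the post’s database holds roughly a million URLs (that count is our guess, chosen for illustration):

```python
import math

def false_positive_rate(n_items, m_bits, k_hashes):
    """Standard Bloom filter estimate: p = (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

n = 1_000_000                                    # assumed number of "bad" URLs
for megabytes in (1.8, 3.59):
    m = int(megabytes * 8 * 1_000_000)           # bits in the filter
    k = max(1, round(m / n * math.log(2)))       # near-optimal hash count
    p = false_positive_rate(n, m, k)
    print(f"{megabytes} MB, k={k}: about 1 in {round(1 / p):,}")
```

Run it, and the two filter sizes come out to roughly one-in-a-thousand and one-in-a-million false positive rates, matching the figures above.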

Imagine, for example, a web cache device or program. Many web pages are loaded once and never again. If you cache all of them, you’ll waste a lot of time and push other things out of the cache. But if you test a page URL with a Bloom filter, you can improve things quite a bit. If the filter says the URL may exist, then you’ve probably seen it before, so you might want to cache it.

If it says you haven’t, you can add the URL to the filter so that if it is ever accessed again, it will be cached. Sure, sometimes a page will show a false positive. So what? You’ll just cache the page the first time, which is what you did before anyway. If that happens only 0.1% of the time, you still win.

In simple terms, the Bloom filter hashes each item using three different algorithms and sets bits in an array based on the result. To test an item, you compute the same hashes and see if any of the corresponding bits are set to zero. If so, the item can’t be in the set. Of course, there’s no assurance that all three bits being set means the set contains the item. Those three bits might be set for totally different items.
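Compressed into code, the whole mechanism is pleasantly small. Here is a minimal sketch in Python that derives the three positions by salting a single hash function; real libraries tune the filter size and hash count to their error budget.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)  # all bits start at zero

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # Any zero bit proves the item was never added (no false negatives);
        # all ones only means "maybe" (false positives are possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("https://example.com/bad-page")   # hypothetical "bad" URL
print(bf.might_contain("https://example.com/bad-page"))  # True
print(bf.might_contain("https://example.com/fine"))      # False: definitely new
```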

Why does increasing the number of bits help? The post answers that and looks at other optimizations, like using a different number of hash functions and counting Bloom filters.

The post does a great job of explaining the filter, but if you want a more concrete example in C, you might want to read this post next. Or search for code in your favorite language. We’ve talked about Python string handling with Bloom filters before. We’ve even seen a proposal to add them to the transit bus.