Crunching The News For Fun And Little Profit

Do you ever look at the news and wonder about the process behind the news cycle? I did, and for the last couple of decades it’s been the subject of one of my projects. The Raspberry Pi on my shelf runs my word trend analysis tool for news content, and since my journey from curious geek to having my own large corpus analysis system has taken twenty years, it’s worth a second look.

How Career Turmoil Led To A Two Decade Project

A hanging sign surrounded by ornate metalwork, with the legend "Cyder house".
This is very much a minority spelling. Colin Smith, CC BY-SA 2.0.

In the middle of the 2000s I had come out of the dotcom crash mostly intact, and was working for a small web shop. When they went bust I was casting around, as one does, and spent a while as a Google quality rater while I looked for a new permie job. Quality raters are employed by the search giant through temporary employment agencies, and in loose terms their job is to be the trained monkeys against whom the algorithm is tested: if the algorithm chose X and the humans also chose X, the algorithm is probably getting it right. Being a quality rater is not in any way a high-profile job, but with the big shiny G on my CV I soon found myself in demand from web companies seeking some white-hat search engine marketing expertise. What I learned mirrored my lesson from a decade earlier in the CD-ROM business: on the web, as in any other electronic publishing medium, good content well presented takes priority over any black-hat tricks.

But what makes good content? Forget an obsession with stuffing bogus keywords into the text; instead, talk about the right things, and do it authoritatively. What are the right things in this context? If you are covering a subject, you need to do so using the right language: that which the majority uses, rather than language only you use. I can think of a bunch of examples which I probably shouldn’t talk about, but one close to home for me is cider. In the UK, cider is a fermented alcoholic drink made from apples, and as a craft cidermaker of many years’ standing I have a good grasp of its vocabulary. The accepted spelling is “Cider”, but there’s an alternate spelling of “Cyder” used by some commercial producers of the drink. It doesn’t take long to realise that online, hardly anyone uses cyder with a Y, and thus pages concentrating on that word will do less well than those talking about cider.
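Deciding between cider and cyder comes down to counting how often each variant appears in text written by other people. Below is a minimal Python sketch of that idea, assuming the corpus is simply a list of plain-text strings; the helper names `tokenize` and `variant_share` and the toy sentences are hypothetical illustrations, not the author's tool or data.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text, then treat runs of letters and apostrophes as words
    return re.findall(r"[a-z']+", text.lower())


def variant_share(corpus: list[str], variants: list[str]) -> dict[str, float]:
    """Return each variant's share of all occurrences of the listed variants."""
    wanted = set(variants)
    counts = Counter()
    for doc in corpus:
        # Only tally tokens that are one of the spellings we are comparing
        counts.update(t for t in tokenize(doc) if t in wanted)
    total = sum(counts.values())
    if total == 0:
        # None of the variants appear; avoid dividing by zero
        return {v: 0.0 for v in variants}
    return {v: counts[v] / total for v in variants}


# Hypothetical toy corpus standing in for real harvested pages
docs = [
    "Cider season starts with the apple harvest.",
    "This farm presses cider and sells cyder in the shop.",
    "Dry cider pairs well with cheese.",
]
print(variant_share(docs, ["cider", "cyder"]))
```

On a real corpus the same count simply runs over far more text; the point is that the majority spelling shows up as the dominant share.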

A graph of the word football versus the word soccer in British news.
We Brits rarely use the word “soccer” unless there’s a story about the Club World Cup in America.

I started to build software to analyse language around a given topic, with the aim of discerning the metaphorical cider from the cyder. It was a great surprise a few years later to discover that I had invented for myself the already-existing field of computational linguistics, something that would have saved me a lot of time had I known about it when I began. I was taking a corpus of text and computing the frequencies and collocates (words that appear alongside each other) of the words within it, and from that I could quickly see which wording mattered around a subject, and which didn’t. This led seamlessly to an interest in what the same process would look like for news data with a time axis added, so I created a version which harvested its corpus from RSS feeds. Thus began my decades-long project.
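Word frequencies and collocates can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes a simple regex tokenizer, a window of two words either side of the target word (a hypothetical parameter), and raw co-occurrence counts. Computational linguistics tools usually go further, filtering out stop words and ranking collocates with an association measure such as pointwise mutual information rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text, then treat runs of letters and apostrophes as words
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens: list[str], target: str, window: int = 2) -> Counter:
    """Count the words found within `window` positions either side of each `target`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Clamp the window to the bounds of the token list
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical one-line "corpus" for illustration
text = "Craft cider makers press apples; craft cider is dry cider."
tokens = tokenize(text)
freqs = Counter(tokens)  # plain word frequencies across the corpus
print(freqs.most_common(2))
print(collocates(tokens, "cider").most_common())
```

Adding a time axis, as with the RSS-harvested news version, would amount to keeping one set of these counts per date rather than one for the whole corpus.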



Hackaday Links: July 6, 2025

Taking delivery of a new vehicle from a dealership is an emotional mixed bag. On the one hand, you’ve had to endure the sales rep’s hunger to close the deal, the tedious negotiations with the classic “Let me run that by my manager,” and the closer who tries to tack on ridiculous extras like paint sealer and ashtray protection. On the other hand, you’re finally at the end of the process, and now you get to play with the Shiny New Thing in your life while pretending it hasn’t caused your financial ruin. Wouldn’t it be nice to skip all those steps in the run-up and just cut right to the delivery? That’s been Tesla’s pitch for a while now, and they finally made good on the promise with their first self-driving delivery.

Last Chance: 2025 Hackaday Supercon Still Wants You!

Good news, procrastinators! Today was going to be the last day to throw your hat in the ring for a slot to talk at Supercon in November, but we’re extending the deadline one more week, until July 10th. We have an almost full schedule, but we’re still missing your talk.

So if the thought of having missed the deadline fills you with regret, here’s your second chance. We have spots for both 40-minute and 20-minute talks still open. We love to have a mix of newcomers as well as longtime Hackaday friends, so don’t be shy.

Supercon is a super fun time, and the crowd is full of energy and excitement for projects of all kinds. There is no better audience to present your feats of hardware derring-do, stories of reverse engineering, or other plans for world domination. Where else will you find such a density of like-minded hackers?

Don’t delay, get your talk proposal in today.

Back To The Future, 40 Years Old, Looks Like The Past

Great Scott! If my calculations are correct, when this baby hits 88 miles per hour, you’re gonna see some serious shit. — Doc Brown

Forty years ago today, on July 3rd, 1985, the movie Back to the Future was released. While not as fundamental as Hackers or as realistic as Sneakers, this movie worked its way into our pantheon. We thought it would be appropriate to commemorate this element of hacker culture on its fortieth anniversary.

If you never got around to watching it, or if it has been a few decades since you did, you might not recall that the movie is set in two periods. It opens in 1985 and then goes back to 1955. Most of the movie is set in 1955, with Marty trying to get back to 1985: “back to the future”. The movie celebrates the advanced technology and fashions of 1985 and is all about how silly the technology and fashions of 1955 look compared with the advancements of 1985. But now it’s the far future, the year 2025, and we thought we might take a look at some of the technology that was enchanting in 1985 but turned out to be obsolete in “the future”, forty years on.

South Korea Brought High-Rise Fire Escape Solutions To The Masses

When a fire breaks out in a high-rise building, conventional wisdom is that stairwells are the only way out. Lifts are verboten in such scenarios, while sheer height typically prevents any other viable route of egress from tall modern buildings. If the stairs are impassable, or you can’t reach them, you’re in dire peril.

In South Korea, though, there’s another option for escape. The answer involves strapping on a harness and descending on ropes hanging off the side of the building, just like in an action movie. It might sound terrifying, but these descending lifeline devices have become a common part of fire safety infrastructure across the country.


Why The Latest Linux Kernel Won’t Run On Your 486 And 586 Anymore

Some time ago, Linus Torvalds made a throwaway comment that sent ripples through the Linux world. Was it perhaps time to abandon support for the now-ancient Intel 486? Developers had already abandoned the 386 in 2012, and Torvalds openly mused whether the time was right to make further cuts for the benefit of modernity.

It would take three long years, but that eventuality finally came to pass. As of version 6.15, the Linux kernel will no longer support chips running the 80486 architecture, along with a gaggle of early “586” chips as well. It’s all down to some housekeeping and precise technical changes that will make the new code inoperable with the machines of the past.
