Crunching The News For Fun And Little Profit

Do you ever look at the news and wonder about the process behind the news cycle? I did, and for the last couple of decades it’s been the subject of one of my projects. The Raspberry Pi on my shelf runs my word trend analysis tool for news content, and since my journey from curious geek to having my own large corpus analysis system has taken twenty years, it’s worth a second look.

How Career Turmoil Led To A Two Decade Project

A hanging sign surrounded by ornate metalwork, with the legend "Cyder house".
This is very much a minority spelling. Colin Smith, CC BY-SA 2.0.

In the middle of the 2000s I had come out of the dotcom crash mostly intact, and was working for a small web shop. When they went bust I was casting around as one does, and spent a while as a Google quality rater while I looked for a new permie job. These teams are employed by the search giant through temporary employment agencies, and in loose terms their job is to be the trained monkeys against whom the algorithm is tested. The algorithm chose X, and if the humans also chose X, the algorithm is probably getting it right. Being a quality rater is not in any way a high-profile job, but with the big shiny G on my CV I soon found myself in demand from web companies seeking some white-hat search engine marketing expertise. What I learned mirrored my lesson from a decade earlier in the CD-ROM business, that on the web as in any other electronic publishing medium, good content well presented has priority over any black-hat tricks.

But what makes good content? Forget an obsession with stuffing bogus keywords in the text, and instead talk about the right things, and do it authoritatively. What are the right things in this context? If you are covering a subject, you need to do so using the right language; that which the majority uses rather than language only you use. I can think of a bunch of examples which I probably shouldn’t talk about, but an example close to home for me comes in cider. In the UK, cider is a fermented alcoholic drink made from apples, and as a craft cidermaker of many years’ standing I have a good grasp of its vocabulary. The accepted spelling is “Cider”, but there’s an alternate spelling of “Cyder” used by some commercial producers of the drink. It doesn’t take long to realise that online, hardly anyone uses cyder with a Y, and thus pages concentrating on that word will do less well than those talking about cider.

A graph of the word football versus the word soccer in British news.
We Brits rarely use the word “soccer” unless there’s a story about the Club World Cup in America.

I started to build software to analyse language around a given topic, with the aim of discerning the metaphorical cider from the cyder. It was a great surprise a few years later to discover that I had invented for myself the already-existing field of computational linguistics, something that would have saved me a lot of time had I known about it when I began. I was taking a corpus of text and computing the frequencies and collocates (words that appear alongside each other) of the words within it, and from that I could quickly see which wording mattered around a subject, and which didn’t. This led seamlessly to an interest in what the same process would look like for news data with a time axis added, so I created a version which harvested its corpus from RSS feeds. Thus began my decades-long project.
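As a rough illustration of the frequency and collocate counting described above, here is a minimal Python sketch. It is not the software from the article: the tokenizer, the function names, the two-word window and the tiny sample corpus are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word appears in the corpus."""
    return Counter(tokens)


def collocates(tokens, target, window=2):
    """Count words found within `window` tokens either side of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample corpus, made up for this example.
corpus = (
    "Craft cider is made from apples. "
    "Good cider needs good apples, and cider makers know it. "
    "A few producers still write cyder on the label."
)

tokens = tokenize(corpus)
freq = word_frequencies(tokens)
near_cider = collocates(tokens, "cider", window=2)

print(freq.most_common(3))
print(near_cider.most_common(3))
```

Even on a toy corpus like this, "cider" outnumbers "cyder" and words such as "apples" show up as its neighbours. Adding a time axis would amount to keeping one such set of counts per day or per week.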


Compact Dedicated News Reader Always Brings You CBC

Your phone or laptop will give you access to the vast majority of news in the world, in languages you can read and a few hundred you can’t. Maybe you only like one news source, though, and that news source happens to be the Canadian Broadcasting Corporation (CBC). If that’s the case, you might like to give this project a look from [Ron Grimes].

[Ron] built a device that does one thing and one thing only: it displays news stories from CBC. It’s built around a Raspberry Pi 2, and the project began when he wanted to interface a keypad just to see if he could. With that done, the next challenge was to integrate a 16×2 character LCD display of the HD44780 persuasion. With those two tasks completed, the question was simple — what to display? He figured tuning into the CBC news feed would be useful, and the Chocolate Box News Reader was born.

The device displays 29 news feeds in total, including the main top stories, world news, and Canadian regional news. It stores 15 news items per feed and will hang on to those stories even if the Internet drops. The reader will display the whole stash of stored news in around 90 minutes, and each stored item comes with more information if something strikes [Ron’s] curiosity or interest. Files are on GitHub for the curious.
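One way the "keep the stories if the Internet drops" behaviour could work is sketched below in Python. This is not [Ron]'s code: the `FeedCache` class, the `fetch` callable and the feed name are hypothetical, and only the 15-items-per-feed figure comes from the description above.

```python
ITEMS_PER_FEED = 15  # figure quoted in the write-up


class FeedCache:
    """Hypothetical per-feed store that survives failed refreshes."""

    def __init__(self, feed_names, items_per_feed=ITEMS_PER_FEED):
        self.items_per_feed = items_per_feed
        self.items = {name: [] for name in feed_names}

    def update(self, name, fetch):
        """Refresh one feed; on a network error keep the old stories."""
        try:
            fresh = list(fetch(name))
        except OSError:  # ConnectionError and TimeoutError are subclasses
            return False
        self.items[name] = fresh[: self.items_per_feed]
        return True


def working_fetch(name):
    # Stand-in for a real feed download, returning 20 fake stories.
    return [f"{name} story {i}" for i in range(20)]


def broken_fetch(name):
    # Stand-in for the connection dropping.
    raise ConnectionError("network is down")


cache = FeedCache(["top-stories"])
cache.update("top-stories", working_fetch)  # stores the first 15 items
cache.update("top-stories", broken_fetch)   # fails, old items are kept
print(len(cache.items["top-stories"]))
```

If every one of the 29 feeds is full, that is 29 × 15 = 435 stories, so a 90-minute cycle works out to roughly 12 seconds per story.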

It’s a neat build, and we can imagine it being a smart item to have kicking around the house. It was also a great way for [Ron] to build on his familiarity with the Raspberry Pi. Meanwhile, if you’ve got your own nifty Pi-based projects (or others!), don’t hesitate to drop us a line!

The Internet Archive Has Been Hacked

There are a great many organizations out there, all with their own intentions—some selfish, some selfless, some that land somewhere in between. Most would put the Internet Archive in the category of the library—with its aim of preserving and providing knowledge for the aid of all who might call on it. Sadly, as [theresnotime] reports, it appears this grand institution has been hacked.

On Wednesday, users visiting the Internet Archive were greeted with a foreboding popup that stated the following:

Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!

The quote appears to refer to Have I Been Pwned (HIBP), a site that collates details of security breaches so individuals can check if their details have been compromised.

According to founder Brewster Kahle, the site was apparently DDoS’d and defaced via a JavaScript library. It’s believed this may have been a polyfill supply chain attack. As for the meat of the hack, it appears the individuals involved made off with usernames, emails, and encrypted and salted passwords. Meanwhile, as Wired reports, it appears Have I Been Pwned first received the stolen data of 31 million users on September 30.

At the time of writing, it appears the Internet Archive has restored the website to some degree of normal operation. It’s sad to see one of the Internet’s most useful and humble institutions fall victim to a hack like this one. As is always the way, no connected machine is ever truly safe, no matter how much we might hope that’s not the case.

[Thanks to Sammy for the tip!]

Dot-Matrix Printer Brings Old School Feel To Today’s Headlines

If you remember a time when TV news sets universally incorporated a room full of clattering wire service teleprinters to emphasize the seriousness of the news business, congratulations — you’re old. Now, most of us get our news piped directly into our phones, selected by algorithms perfectly tuned to rile us up on whatever the hot-button issue du jour happens to be. Welcome to the future.

If, like us, you long for a simpler way to get your news, [Andrew Schmelyun] has a partial solution with this dot-matrix news feeder. It’s part of his effort to detox a bit from the whole algorithm thing and make the news a little more concrete. He managed to chase down a very old Star Micronics printer with a serial interface, which he got on the cheap thanks to the previous owner not being sure if it worked. It did, at least after some cleaning, and thanks to a USB-to-serial adapter and the efforts of Linux kernel hackers through the ages, he was able to echo output to the printer from a Raspberry Pi Zero W.

From there, getting a daily news feed was as simple as writing some PHP code to mine the APIs of a few selected services. We’re perplexed and alarmed to report that Hackaday is not among the selected sources, but we’re sure this was just a small oversight that will be corrected in version 2. The program runs as a cron job so that a dead-tree version of the day’s top stories is ready for [Andrew]’s morning coffee.
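The print step might look something like the Python sketch below. To be clear, [Andrew]'s version is written in PHP and this is not his code. The 80-column width, the trailing form feed and the function names are assumptions, and the "printer" here is an in-memory buffer. On a real setup the device would be whatever file the USB-to-serial adapter shows up as.

```python
import io
import textwrap


def format_page(headlines, width=80):
    """Number and wrap headlines so no line exceeds `width` columns."""
    lines = []
    for n, headline in enumerate(headlines, start=1):
        # Leave four columns for the "NN. " prefix or its indent.
        wrapped = textwrap.wrap(headline, width=width - 4)
        for k, part in enumerate(wrapped):
            prefix = f"{n:>2}. " if k == 0 else "    "
            lines.append(prefix + part)
    # CR+LF line endings, then a form feed to eject the page
    # (assumes the printer honours the form feed character).
    return "\r\n".join(lines) + "\r\n\f"


def print_page(headlines, device, width=80):
    """Write the formatted page to any binary file-like object."""
    page = format_page(headlines, width)
    # Old printers generally expect plain ASCII, so replace anything else.
    device.write(page.encode("ascii", errors="replace"))


# Hypothetical headlines and an in-memory stand-in for the printer.
headlines = [
    "Shapeways files for bankruptcy",
    "Dot-matrix printer brings old school feel to today's headlines",
]
fake_printer = io.BytesIO()
print_page(headlines, fake_printer, width=40)
print(fake_printer.getvalue().decode("ascii"))
```

Because `print_page` only needs an object with a `write` method, the same function could be pointed at a serial device opened in binary mode, and a cron entry could run the script each morning.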

We’ve seen similar news printers before; we particularly like this roll-feed paper version. But for a seriously retro feel, we’d love to see this done on a real teletype.

Shapeways Files For Bankruptcy

One of the earliest hobbyist-friendly on-demand 3D printing and fabrication shops, Shapeways, is filing for bankruptcy. As these financial arrangements always go, this may or may not mean the end of the service, but it’s a sure sign that their business wasn’t running as well as you’d hope.

One of the standout features of Shapeways was always that they made metal printing affordable to the home gamer. Whether it was something frivolous like a custom gear-shifter knob, or something all-too functional like a prototype rocket engine, it was neat to have the alternative workflow of iterative design at home and then shipping out for manufacturing.

We don’t want to speculate too much, but we’d be surprised if the rise of similar services in China wasn’t part of the reason for the bankruptcy. The market landscape just isn’t what it was way back in 2013. (Sadly, the video linked in this article isn’t around any more. If anyone can find a copy, post up in the comments?) So while Shapeways may or may not be gone, it’s not like we can’t get metal parts made anymore.

Still, we’re spilling a little for the OG.

Thanks [Aaron Eiche] for the breaking news tip!

E-Paper News Feed Illustrates The Headlines With AI-Generated Images

It’s hard to read the headlines today without feeling like the world couldn’t possibly get much worse. And then tomorrow rolls around, and a fresh set of headlines puts the lie to that thought. On a macro level, there’s not much that you can do about that, but on a personal level, illustrating your news feed with mostly wrong, AI-generated images might take the edge off things a little.

Let us explain. [Roy van der Veen] liked the idea of an e-paper display newsfeed, but the crushing weight of the headlines was a little too much to bear. To lighten things up, he decided to employ Stable Diffusion to illustrate his feed, displaying both the headline and a generated image on a 7.3″ Inky 7-color e-paper display. Every five hours, a script running on a Raspberry Pi Zero 2W fetches a headline from a random source — we’re pleased the list includes Hackaday — and composes a prompt for Stable Diffusion based on the headline, adding on a randomly selected prefix and suffix to spice things up. For example, a prompt might look like, “Gothic painting of (Driving a Motor with an Audio Amp Chip). Gloomy, dramatic, stunning, dreamy.” You can imagine the results.
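The prompt construction described above can be sketched in a few lines of Python. This is a guess at the shape of the idea rather than [Roy]'s script: the function name is hypothetical, and only the first prefix and first suffix in the lists come from the example prompt above, while the rest are made up for illustration.

```python
import random

# Only the first entry in each list comes from the example prompt;
# the others are invented for this sketch.
PREFIXES = [
    "Gothic painting of",
    "Watercolor sketch of",
    "Pixel art of",
]
SUFFIXES = [
    "Gloomy, dramatic, stunning, dreamy.",
    "Bright, cheerful, whimsical.",
    "Minimalist, calm, pastel.",
]


def build_prompt(headline, rng=random):
    """Wrap a headline in a randomly chosen style prefix and suffix."""
    prefix = rng.choice(PREFIXES)
    suffix = rng.choice(SUFFIXES)
    return f"{prefix} ({headline}). {suffix}"


# A seeded generator keeps the choice repeatable while testing.
rng = random.Random(0)
prompt = build_prompt("Driving a Motor with an Audio Amp Chip", rng)
print(prompt)
```

Passing the random generator in as a parameter makes it easy to test the function, while the real thing would simply use the default and hand the resulting string to the image generator.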

We have to say, from the examples [Roy] shows, the idea pretty much works — sometimes the images are so far off the mark that just figuring out how Stable Diffusion came up with them is enough to soften the blow. We’d have preferred if the news of the floods in Libya had been buffered by a slightly less dismal scene, but finding out that what was thought to be a “ritual mass murder” was really only a yoga class was certainly heartening.

Retrotechtacular: Putting Pictures On The Wire In The 1930s

Remember fax machines? They used to be all the rage, and to be honest it was pretty cool to be able to send images back and forth over telephone lines. By the early 2000s, pretty much everyone had some kind of fax capability, whether thanks to a dedicated fax machine, a fax modem, or an all-in-one printer. But then along came the smartphone that allowed you to snap a picture of a document and send it by email or text, and along with the decrease in landline subscriptions, facsimile has pretty much become a technological dead end.

But long before fax machines became commonplace, there was a period during which sending images by wire was a very big deal indeed. So much so that General Motors produced “Spot News,” a short film to demonstrate how newspapers leveraged telephone technology to send photographs from the field. The film is very much of the “March of Progress” genre, and seems to be something that would have been included along with the newsreels and Looney Tunes between the double feature films. It shows a fictional newsroom in The Big City, where a cub reporter gets a hot tip about an airplane stunt about to be attempted out in the sticks. The editor doesn’t want to miss out on a scoop, so he sends a photographer and a reporter to the remote location to cover the stunt, along with a technology-packed photographic field car.