Crunching The News For Fun And Little Profit

Do you ever look at the news, and wonder about the process behind the news cycle? I did, and for the last couple of decades it’s been the subject of one of my projects. The Raspberry Pi on my shelf runs my word trend analysis tool for news content, and since my journey from curious geek to having my own large corpus analysis system has taken twenty years it’s worth a second look.

How Career Turmoil Led To A Two Decade Project

A hanging sign surrounded by ornate metalwork, with the legend "Cyder house".
This is very much a minority spelling. Colin Smith, CC BY-SA 2.0.

In the middle of the 2000s I had come out of the dotcom crash mostly intact, and was working for a small web shop. When they went bust I was casting around as one does, and spent a while as a Google quality rater while I looked for a new permie job. These teams are employed by the search giant through temporary employment agencies, and in loose terms their job is to be the trained monkeys against whom the algorithm is tested. The algorithm chose X, and if the humans also chose X, the algorithm is probably getting it right. Being a quality rater is not in any way a high-profile job, but with the big shiny G on my CV I soon found myself in demand from web companies seeking some white-hat search engine marketing expertise. What I learned mirrored my lesson from a decade earlier in the CD-ROM business, that on the web as in any other electronic publishing medium, good content well presented has priority over any black-hat tricks.

But what makes good content? Forget an obsession with stuffing bogus keywords in the text, and instead talk about the right things, and do it authoritatively. What are the right things in this context? If you are covering a subject, you need to do so using the right language; that which the majority uses rather than language only you use. I can think of a bunch of examples which I probably shouldn’t talk about, but an example close to home for me comes in cider. In the UK, cider is a fermented alcoholic drink made from apples, and as a craft cidermaker of many years standing I have a good grasp of its vocabulary. The accepted spelling is “Cider”, but there’s an alternate spelling of “Cyder” used by some commercial producers of the drink. It doesn’t take long to realise that online, hardly anyone uses cyder with a Y, and thus pages concentrating on that word will do less well than those talking about cider.

A graph of the word football versus the word soccer in British news.
We Brits rarely use the word “soccer” unless there’s a story about the Club World Cup in America.

I started to build software to analyse language around a given topic, with the aim of discerning the metaphorical cider from the cyder. It was a great surprise a few years later to discover that I had invented for myself the already-existing field of computational linguistics, something that would have saved me a lot of time had I known about it when I began. I was taking a corpus of text and computing the frequencies and collocates (words that appear alongside each other) of the words within it, and from that I could quickly see which wording mattered around a subject, and which didn’t. This led seamlessly to an interest in what the same process would look like for news data with a time axis added, so I created a version which harvested its corpus from RSS feeds. Thus began my decades-long project.

Continue reading “Crunching The News For Fun And Little Profit”

PIC Burnout: Dumping Protected OTP Memory In Microchip PIC MCUs

Normally you can’t read out the One Time Programming (OTP) memory in Microchip’s PIC MCUs that have code protection enabled, but an exploit has been found that gets around the copy protection in a range of PIC12, PIC14 and PIC16 MCUs.

This exploit is called PIC Burnout, and was developed by [Prehistoricman], with the cautious note that although this process is non-invasive, it does damage the memory contents. This means that you likely will only get one shot at dumping the OTP data before the memory is ‘burned out’.

The copy protection normally returns scrambled OTP data, with an example of PIC Burnout provided for the PIC16LC63A. After entering programming mode by setting the ICSP CLK pin high, excessively high programming voltage and duration is used repeatedly while checking that an area that normally reads as zero now reads back proper data. After this the OTP should be read out repeatedly to ensure that the scrambling has been circumvented.

The trick appears to be that while there’s over-voltage and similar protections on much of the Flash, this approach can still be used to affect the entire flash bit column. Suffice it to say that this method isn’t very kind to the Flash memory cells and can take hours to get a good dump. Even after this you need to know the exact scrambling method used, which is fortunately often documented by Microchip datasheets.

Thanks to [DjBiohazard] for the tip.

screenshot of C programming on Macintosh Plus

Programming Like It’s 1986, For Fun And Zero Profit

Some people slander retrocomputing as an old man’s game, just because most of those involved are more ancient than the hardware they’re playing with. But there are veritable children involved too — take the [ComputerSmith], who is recreating Conway’s game of life on a Macintosh Plus that could very well be as old as his parents. If there’s any nostalgia here, it’s at least a generation removed — thus proving for the haters that there’s more than a misplaced desire to relive one’s youth in exploring these ancient machines.

So what does a young person get out of programming on a 1980s Mac? Well, aside from internet clout, and possible YouTube monetization, there’s the sheer intellectual challenge of the thing. You cant go sniffing around StackExchange or LLMs for code to copy-paste when writing C for a 1986 machine, not if you’re going to be fully authentic. ANSI C only dates to 1987, after all, and figuring out the quirks and foibles of the specific C implementation is both half the fun, and not easily outsourced. Object Pascal would also have been an option (and quite likely more straightforward — at least the language was clearly-defined), but [ComputerSmith] seems to think the exercise will improve his chops with C, and he’s likely to be right. 

Apparently [ComputerSmith] brought this project to VCS Southwest, so anyone who was there doesn’t have to wait for Part 2 of the video to show up to see how this turns out, or to snag a copy of the code (which was apparently available on diskette). If you were there, let us know if you spotted the youngest Macintosh Plus programmer, and if you scored a disk from him.

If the idea of coding in this era tickles the dopamine receptors, check out this how-to for a prizewinning Amiga demo.  If you think pre-ANSI C isn’t retro enough, perhaps you’d prefer programming by card?

Continue reading “Programming Like It’s 1986, For Fun And Zero Profit”