Blog Title Optimizer Uses AI, But How Well Does It Work?

[Max Woolf] sometimes struggles to create ideal headlines for his blog posts, and decided to apply his experience with machine learning to the problem. He asked: could an AI be trained to optimize his blog titles? It is a fascinating application of natural language processing, and [Max] explains all about what it does and how it works.

The machine learning framework [Max] uses is GPT-3, a language model that works with natural-seeming human language that is capable of being tweaked in different ways. [Max] uses OpenAI’s GPT-3 API (which, by the way, is much easier to experiment with than one might think) and here is the basic workflow for his title optimizer:

  1. The optimizer takes as input a blog post title to optimize.
  2. OpenAI’s pre-trained GPT-3 engine is used to generate six alternate titles.
  3. For each of those alternate titles, a fine-tuned version of GPT-3 is consulted to judge how “good” they are based on custom training data. (“Good” in this context means “similar to titles of successful submissions on Hacker News“, but more on that in a moment.)
  4. Print the results.

Continue reading “Blog Title Optimizer Uses AI, But How Well Does It Work?”

Hackaday Links Column Banner

Hackaday Links: April 8, 2018

SiFive raised $50 Million in funding. SiFive is a semiconductor working on two fronts: they want to democratize silicon prototyping, and they’re the people making the HiFive series of microcontrollers and SoCs. The HiFives are built on the RISC-V instruction set, a Big-O Open instruction set for everything from tiny microcontrollers to server CPUs. With RISC-V, you’re not tied to licensing from ARM or their ilk. Recently SiFive introduced an SoC capable of running Linux, and the HiFive 1 is a very fast, very capable microcontroller that’s making inroads with Nvidia and Western Digital. The new round of funding is great news for anyone who wants Open Source hardware, and the silicon prototyping aspect of it is exceptionally interesting. Great news for SiFive.

Guess what’s in just a few weekends? The Vintage Computer Festival Southeast. The VCFSE is Hotlanta’s own vintage computer festival, with a whole host of speakers, exhibits, and consignment to tickle those vintage dopamine receptors. On deck for the speakers is [Michael Tomczyk], one of the people responsible for the VIC-20, and [Scott Adams], no the other [Scott Adams], creator of adventure-style games for personal computers but not that adventure-style game. The exhibits will include Japanese retro computers, simulating an ENIAC and a mechanical keyboard meetup. If you’re around Georgia, this is an event worth attending.

Conference season is just around the corner, and you know what that means. It’s time to start ramping up for #badgelife. What is badgelife? It’s a hardware demoscene of electronic conference badges. This year, the badgelife scene has stumbled upon something everyone can get in on. Add-ons! They’re electronic hats (or shields, or capes) for all the badges. Physically, it’s a 2×2 pin header. Electronically, it’s power, ground and I2C. Want to prototype your own add-on? Good news, there’s a development board.

The Titius-Bode law states the semi-major axes of planets follow a geometric progression. The (simplified, incorrect) demonstration of this law states Mercury orbits at 0.25 AU, Venus at 0.5 AU, Earth at 1 AU, Mars at 2 AU, and continues to the outer planets. The Titius-Bode law is heavily discredited in the planetary science community, and any paper, talk, or manuscript is rejected by scientific editors out of hand. The Titius-Bode law is the planetary science equivalent of flat Earth conspiracy theories and Nazi moon bases; giving any consideration to the idea confirms you’re a moron. This week, some consulting firm posted something that is the Titius-Bode law on their blog. Why? So it could be submitted to Hacker News for that sweet SEO. This submission was upvoted to the top position, and is a wonderful springboard to argue an interesting point on media literacy. I posit the rise of news aggregators (facebook, twitter, digg, reddit, and HN), is the driving force behind ‘fake news’ as lay people become the gatekeepers. Prove me wrong.

The Department of Homeland Security has confirmed there are cell-site simulators (Stingrays, IMSI-catchers, or otherwise known as your own private cell phone base station) around Washington DC. It’s unknown who is operating these simulators, or even where they are. There are two things to read between the lines with this information: Duh, there are rogue Stingrays in DC. Holy crap duh. I bet there are also some around midtown Manhattan. You can buy the stuff to do this on eBay. Personally, I’ve found half a dozen Stingrays or other rogue cell stations this year (guess where?). Second, why is this a news item now? Is this a signal that the DHS will start clamping down on stuff you can buy on eBay? Hop to it, people; cellular hardware is a great way to make a liquid nitrogen generator.

How Hacker News Page Rankings Really Work

Page rankings are the secret sauce of websites that automatically aggregate user submissions. The basic formula used by Hacker News was published a few years back. But there are several pieces of the puzzle that are missing from that specification. [Ken Shirriff] recently published an analysis that digs deeper to expose the article penalization system used by Hacker News’ ranking engine.

One might assume that the user up and down votes are what determine a page’s lifespan on the front page. But it turns out that a complex penalization system makes a huge difference. It takes into account keywords, and domain names but also weighs controversy. It’s a bit amusing to note that this article on the topic was itself penalized, knocking it off of the front page.

You can get the full details of the system from his post, but we found his investigation methods to be equally interesting. He scraped two pages of the news feed every minute using Python and the Beautiful Soup package (a pretty common scraping practice). This data set allowed him to compare the known algorithm with actual results. What was left were a set of anomalies that contained enough sense for him to reverse engineer the unpublished formulas being used.