[Max Woolf] sometimes struggles to create ideal headlines for his blog posts, and decided to apply his experience with machine learning to the problem. He asked: could an AI be trained to optimize his blog titles? It is a fascinating application of natural language processing, and [Max] explains all about what it does and how it works.
The machine learning model [Max] uses is GPT-3, a large language model that generates natural-sounding text and can be fine-tuned for specific tasks. [Max] uses OpenAI’s GPT-3 API (which, by the way, is much easier to experiment with than one might think) and here is the basic workflow for his title optimizer, sketched in code after the list:
- The optimizer takes a blog post title as input.
- OpenAI’s pre-trained GPT-3 engine is used to generate six alternate titles.
- For each of those alternate titles, a fine-tuned version of GPT-3 is consulted to judge how “good” it is based on custom training data. (“Good” in this context means “similar to titles of successful submissions on Hacker News”, but more on that in a moment.)
- The results are printed.
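To get a sense of how little glue code that workflow needs, here is a minimal sketch against the legacy `openai` Python client. The prompt wording, the `text-davinci-002` engine, and the fine-tuned model name are all assumptions for illustration rather than [Max]’s exact settings; his real code is linked at the end of this article.

```python
import math
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Placeholder names: neither the base engine nor this fine-tuned "grader"
# model name comes from [Max]'s project; they are stand-ins for illustration.
GENERATOR_ENGINE = "text-davinci-002"
GRADER_MODEL = "davinci:ft-personal:hn-title-grader"


def generate_alternates(title, n=6):
    """Ask the pre-trained GPT-3 engine for n alternate titles."""
    prompt = (
        "Rewrite this blog post title to be more compelling:\n\n"
        f"{title}\n\nRewritten title:"
    )
    response = openai.Completion.create(
        engine=GENERATOR_ENGINE,
        prompt=prompt,
        max_tokens=30,
        n=n,
        temperature=0.9,
        stop="\n",
    )
    return [choice.text.strip() for choice in response.choices]


def grade_title(title):
    """Score a title with the fine-tuned model.

    Assumes the fine-tune was trained to emit a single ' good' or ' bad'
    token, so the probability of ' good' doubles as a quality score.
    """
    response = openai.Completion.create(
        model=GRADER_MODEL,
        prompt=f"Title: {title}\nLabel:",
        max_tokens=1,
        temperature=0,
        logprobs=5,
    )
    top_tokens = response.choices[0].logprobs.top_logprobs[0]
    return math.exp(top_tokens.get(" good", float("-inf")))


if __name__ == "__main__":
    original = "A Blog Post About My Latest Side Project"  # hypothetical input
    for candidate in [original] + generate_alternates(original):
        print(f"{grade_title(candidate):.3f}  {candidate}")
```

Scoring the original title alongside the alternates gives a quick read on whether any of the machine’s suggestions actually beats the human draft.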
The custom training data in step 3 comes from bulk submission data from Hacker News, obtained via Google’s BigQuery service. [Max] separated Hacker News submissions into ‘good’ and ‘bad’ depending on how many points each submission ended up with. Step 3 simply asks GPT-3 to grade each potential headline against this data. The hypothesis that a submission’s rating on Hacker News correlates directly with the quality of its headline is an interesting one, and the Title Optimizer can be thought of as an experiment in applying that idea in the other direction: making posts more successful with the help of a better headline.
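The labeling step could look something like the sketch below, using the public Hacker News dataset on BigQuery and OpenAI’s prompt/completion JSONL format for fine-tuning. The score cutoff and prompt format here are placeholders, not necessarily the values [Max] used.

```python
import json

from google.cloud import bigquery

# Hypothetical cutoff: treat submissions with at least this many points as "good".
GOOD_SCORE_THRESHOLD = 5

# Public Hacker News dataset on BigQuery; LIMIT keeps the sketch cheap to run.
QUERY = """
SELECT title, score
FROM `bigquery-public-data.hacker_news.full`
WHERE type = 'story' AND title IS NOT NULL AND score IS NOT NULL
LIMIT 100000
"""


def build_fine_tune_file(path="hn_titles.jsonl"):
    """Pull HN submissions and write OpenAI fine-tuning JSONL (prompt/completion pairs)."""
    client = bigquery.Client()
    rows = client.query(QUERY).result()
    with open(path, "w") as f:
        for row in rows:
            label = " good" if row.score >= GOOD_SCORE_THRESHOLD else " bad"
            record = {"prompt": f"Title: {row.title}\nLabel:", "completion": label}
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    build_fine_tune_file()
```

The resulting JSONL file is what would be fed to OpenAI’s fine-tuning tooling to produce the “grader” model consulted in step 3.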
So, does [Max] now just use the highest-scoring headlines for his blog posts and call it a day? Sadly, no. Many of the results aren’t terribly suitable for one reason or another: they may neglect to emphasize the right elements, sound too much like clickbait, or fall short in some other way.
The AI-generated headlines might be a mixed bag, but that doesn’t mean they are not useful. There is genuine variety in the machine-generated suggestions, and they provide useful inspiration even when none of the results themselves are a home run.
[Max]’s GPT-3 Blog Title Optimizer is here on GitHub if you’d like a closer look. It’s an interesting application of natural language AI, and is also a perfect example of how machine learning’s best creative results so often come from having a human in the loop.
“A robot just created this headline, now everybody is raging.”
“AI created these words, and you won’t believe what happened next.”
These came from my AI clickbait generator, which I use to train journalists.
Write your blog headlines effortlessly with this one simple trick.
Step #3 will blow you away!
Finally happened. 10 Minutes ago!
bleh.
” which I use to train journalists.”
It seems a lot of mainstream journalists these days need (re)training, especially in ethics.
Cool! Computers doing stuff we are too dumb or too lazy to do ourselves, but the end product will be stuff we can present to others as if we did it ourselves. In other words: how far away are we now from building a real “homework machine”?
And how far are we from building a machine that can read these blogs (or their titles) so that I do not have to read them myself? That would make it full circle and effectively the most complicated useless machine ever.
Regarding the project, a fun experiment, but I’m confused. If you don’t want to write a blog post, or don’t feel confident writing one (or even a headline for it), why even start a blog? Now here it’s all about the experiment, but it’s only a matter of time before this evolves into something that will hunt us all. And it’s too late to destroy this technology… ahhhh… we’re doomed… automatic realistic spam messages… automatic realistic news messages… automatic realistic telephone conversations selling me things I do not need (or only think I do not need). Ahhh… we’re doomed.
Sorry… I watched the Terminator movie yesterday, I’m still processing it.
Humans have been copying from each other without attribution since the beginning of time. They grab stuff from other people without thinking and just paste it as if it were their own.
So now you are upset because the computers that we trained to act like us are doing the same thing.
Neural networks hate this one trick
You won’t believe what this AI was posting!
Interesting idea. I wonder how it works out compared to the more RNG-style methods of seeking naming inspiration, as it seems like it’s still a great deal of human effort to get the same high-quality result.
And how quickly it will become a feedback loop: the model is trained on published works, the published works are using the model (perhaps lazily, not even verifying the output makes sense), and the model ends up somewhat trained on self-generated garbage.
GPT-3 has no sense of humor.