Detecting Machine-Generated Content: An Easier Task For Machine Or Human?

In today’s world we are surrounded by various sources of written information, information which we generally assume to have been written by other humans. Whether this is in the form of books, blogs, news articles, forum posts, feedback on a product page or the discussions on social media and in comment sections, the assumption is that the text we’re reading has been written by another person. However, over the years this assumption has become ever more likely to be false, most recently due to large language models (LLMs) such as GPT-2 and GPT-3 that can churn out plausible paragraphs on just about any topic when requested.

This raises the question of whether we are we about to reach a point where we can no longer be reasonably certain that an online comment, a news article, or even entire books and film scripts weren’t churned out by an algorithm, or perhaps even where an online chat with a new sizzling match turns out to be just you getting it on with an unfeeling collection of code that was trained and tweaked for maximum engagement with customers. (Editor’s note: no, we’re not playing that game here.)

As such machine-generated content and interactions begin to play an ever bigger role, it raises both the question of how you can detect such generated content, as well as whether it matters that the content was generated by an algorithm instead of by a human being.

Continue reading “Detecting Machine-Generated Content: An Easier Task For Machine Or Human?”