What is the essential element which separates a text written by a human being from a text which has been generated by an algorithm, when said algorithm uses a massive database of human-written texts as its input? This would seem to be the fundamental struggle which society currently deals with, as the prospect of a future looms in which students can have essays auto-generated from large language models (LLMs) and authors can churn out books by the dozen without doing more than asking said algorithm to write it for them, using nothing more than a query containing the desired contents as the human inputs.
Due to the immense amount of human-generated text in such an LLM, in its output there’s a definite overlap between machine-generated text and the average prose by a human author. Statistical methods of detecting the former are also increasingly hamstrung by the human developers and other human workers behind these text-generating algorithms, creating just enough human-like randomness in the algorithm’s predictive vocabulary to convince the casual reader that it was written by a fellow human.
Perhaps the best way to detect machine-generated text may just be found in that one quality that these algorithms are often advertised with, yet which they in reality are completely devoid of: intelligence.
Continue reading “Human-Written Or Machine-Generated: Finding Intelligence In Language Models”