Large language models (LLMs) are wholly dependent on the quality of the data they are trained on. While suggestions that people eat rocks are funny to you and me, in the case of LLMs intended to assist medical professionals, false claims or statements produced by such a model can have dire consequences, ranging from incorrect diagnoses to much worse. A recent study published in Nature Medicine by [Daniel Alexander Alber] et al. demonstrates just how easily this kind of data poisoning can occur.
According to their findings, only 0.001% of training tokens have to be replaced with medical misinformation in order to create models that are likely to produce medically erroneous statements. Most concerning is that such a corrupted model isn’t readily detected using standard medical LLM benchmarks. Filters for erroneous content exist, but these tend to be limited in scope due to the overhead they add. Post-training adjustments can be made, as can the addition of retrieval-augmented generation (RAG), but none of this helps against the confident bull excrement that results from the corrupted training data.
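To get a feel for what a 0.001% poisoning rate means in absolute terms, here is a rough back-of-the-envelope sketch in Python. The corpus sizes are assumptions chosen for illustration, not figures from the study.

```python
# Back-of-the-envelope sketch: how many tokens does a 0.001% poisoning
# rate translate to? The corpus sizes below are hypothetical examples.

POISON_RATE = 0.001 / 100  # 0.001% expressed as a fraction

for corpus_tokens in (30e9, 300e9, 1e12):  # assumed training corpus sizes
    poisoned = corpus_tokens * POISON_RATE
    print(f"{corpus_tokens:>15,.0f} training tokens -> ~{poisoned:,.0f} poisoned tokens")
```

Even for a trillion-token corpus, that works out to on the order of ten million poisoned tokens, a quantity that can hide in a handful of bogus documents scattered through a web-scale scrape.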
The mitigation approach that the researchers developed cross-references LLM output against biomedical knowledge graphs, relegating the LLM mostly to generating natural language. In this approach, LLM outputs are matched against the graphs, and any LLM ‘fact’ that cannot be verified is flagged as potential misinformation. In a test with 1,000 random passages, the approach detected issues with a claimed effectiveness of 91.9%.
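As a rough illustration of the verification idea, here is a minimal Python sketch that checks extracted (subject, relation, object) claims against a toy knowledge graph. The graph contents and the extract_claims() helper are hypothetical stand-ins, not the researchers’ actual pipeline.

```python
# Minimal sketch of knowledge-graph verification: extract factual claims
# from LLM output and flag any that the graph cannot confirm.

# Toy biomedical knowledge graph as a set of (subject, relation, object) triples.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("warfarin", "interacts_with", "aspirin"),
}

def extract_claims(llm_output: str) -> list[tuple[str, str, str]]:
    """Placeholder: a real system would use relation extraction / NER here."""
    # Hard-coded claims for illustration only.
    return [
        ("metformin", "treats", "type 2 diabetes"),
        ("metformin", "treats", "hypertension"),
    ]

def verify(llm_output: str) -> list[dict]:
    """Mark each extracted claim as verified or as potential misinformation."""
    results = []
    for claim in extract_claims(llm_output):
        status = "verified" if claim in KNOWLEDGE_GRAPH else "potential misinformation"
        results.append({"claim": claim, "status": status})
    return results

for result in verify("Metformin treats type 2 diabetes and hypertension."):
    print(result["status"], "-", result["claim"])
```

The point of the design is that the graph, not the LLM, is the source of truth: the model only supplies the natural-language phrasing, and anything it asserts that the graph cannot back up gets flagged for review.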
Naturally, this does not guarantee that misinformation never makes it past these knowledge graphs, and it largely leaves the original problem with LLMs in place, namely that their outputs can never be fully trusted. The study also makes abundantly clear how easy it is to corrupt an LLM via its training data, and underlines the broader problem that AI makes mistakes we don’t expect.
A really good reason why LLMs should not be used in any application where they could put lives in danger. Also, the preponderance of “confident bull excrement” AND the simultaneous preponderance of equally confident bull excrement saying the exact opposite in the politicosphere is probably creating schizophrenic AIs. Don’t trust anything without a face. Or with the wrong number of fingers.
Humans are also vulnerable to this.
But we can get useful work out of humans even if they are wrong about stuff.
LLMs’ greatest contribution thus far is to sow mistrust of their outputs by hallucinating or miscorrelating information. Natural language is not merely textual; LLMs use a logic that is correct within its own black box but has no useful meaning in actual reality. The reason an LLM can synthesize computer code better than it can make a medical diagnosis is that programming languages are far less expressive and far more specific in what they communicate.
“only 0.001% of training tokens have to be replaced”
good luck sneaking 100 bibles’ worth of poison into something like GPT-4 and having an effect…