Although it’s generally accepted that synthesized voices which mimic real people’s voices (so-called ‘deepfakes’) can be pretty convincing, what does our brain really think of these mimicry attempts? To answer this question, researchers at the University of Zurich put a number of volunteers into fMRI scanners, allowing them to observe how their brains would react to real and a synthesized voices. The perhaps somewhat surprising finding is that the human brain shows differences in two brain regions depending on whether it’s hearing a real or fake voice, meaning that on some level we are aware of the fact that we are listening to a deepfake.
The detailed findings by [Claudia Roswandowitz] and colleagues are published in Communications Biology. For the study, 25 volunteers were asked to accept or reject the voice samples they heard as being natural or synthesized, as well as perform identity matching with the supposed speaker. The natural voices came from four male (German) speakers, whose voices were also used to train the synthesis model with. Not only did identity matching performance crater with the synthesized voices, the resulting fMRI scans showed very different brain activity depending on whether it was the natural or synthesized voice.
One of these regions was the auditory cortex, which clearly indicates that there were acoustic differences between the natural and fake voice, the other was the nucleus accumbens (NAcc). This part of the basal forebrain is involved in the cognitive processing of e.g. motivation, reward and reinforcement learning, which plays a key role in social, maternal and addictive behavior. Overall, the deepfake voices are characterized by acoustic imperfections, and do not elicit the same sense of recognition (and thus reward sensation) as natural voices do.
Until deepfake voices can be made much better, it would appear that we are still safe, for now.