In a previous article, we talked about the idea of the invariant representation and theorized about different ways of implementing such an idea in silicon. We used the hypothetical example of identifying a song without knowledge of pitch or form to lay a foundation for the end goal: identifying real-world objects and events without the need for predefined templates. Such a task is possible if one can separate the parts of real-world data that change from those that do not. By looking only at the parts of the data that don’t change, the invariant parts, one can identify real-world events more accurately than a template-based system can.
Consider a friend’s face. Imagine she is sitting in front of you, and her face takes up most of your field of view. Your brain identifies the face as your friend’s without trouble. Now imagine you are in a crowded nightclub, looking for the same friend. You catch a glimpse of her from several yards away, and your brain identifies the face almost as easily as it did when she was sitting in front of you.
I want you to think about the raw data coming off the eye and going into the brain during both scenarios. The two sets of data would be completely different, yet your brain is able to find a commonality between the two events. How? It can do this because the data that makes up the memory of your friend’s face is stored in an invariant form. There is no template of your friend’s face in your brain. It stores only the parts that do not change, such as the distance between the eyes, the distance between the eye and the nose, the distance between the ear and the mouth, or the shape her hairline makes on her forehead. These types of data points do not change with distance, lighting conditions or other ‘noise’.
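To make the idea concrete, here is a minimal sketch in Python. It is not a claim about how the brain does it, only an illustration under stated assumptions: the face is reduced to a handful of hypothetical 2D landmark points, and because pixel distances shrink as the face moves away, each distance is divided by the eye-to-eye distance so that only the proportions are kept. The landmark names, coordinates, and function names are all invented for this example.

```python
import math
from itertools import combinations


def signature(landmarks):
    """Return every pairwise landmark distance divided by the eye-to-eye distance."""
    eye_span = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    return {
        (a, b): math.dist(landmarks[a], landmarks[b]) / eye_span
        for a, b in combinations(sorted(landmarks), 2)
    }


def transform(landmarks, scale, dx, dy):
    """Simulate seeing the same face smaller (farther away) and elsewhere in view."""
    return {name: (x * scale + dx, y * scale + dy)
            for name, (x, y) in landmarks.items()}


def same_face(sig_a, sig_b, tol=1e-6):
    """Compare two signatures ratio by ratio, within a relative tolerance."""
    return all(math.isclose(sig_a[key], sig_b[key], rel_tol=tol) for key in sig_a)


# Hypothetical landmark positions (in pixels) for a face seen up close.
near = {
    "left_eye": (120.0, 100.0),
    "right_eye": (200.0, 100.0),
    "nose": (160.0, 150.0),
    "mouth": (160.0, 190.0),
}

# The same face across the room: twenty times smaller and shifted in the frame.
far = transform(near, scale=0.05, dx=300.0, dy=40.0)

print(same_face(signature(near), signature(far)))
```

The raw coordinates of `near` and `far` have nothing in common, yet their signatures match, so the script prints `True`. Distance ratios like these survive changes in size, position, and in-plane rotation. They do not survive a turned head or a change of expression, so a real system would need a richer invariant than this toy one.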
One can argue over the specifics of how the brain does this. Whether or not the brain really works this way, the idea of the invariant representation is a powerful one, and implementing such an idea in silicon is a worthy goal. Read on as we continue to explore this idea in ever deeper detail.