The device is built around Google’s AIY Voice Kit, which consists of a Raspberry Pi with some additional hardware and software to enable it to process voice queries. [Liz] combined this with a Raspberry Pi camera and the Google Cloud Vision API. This allows WhatIsThat to respond to users asking questions by taking a photo, and then identifying what it sees in the frame.
It may seem like a frivolous project to those with working vision, but there is serious potential for this technology in the accessibility space. The device can not only describe things like animals or other objects, but also read text aloud and even identify logos. The software's ability to go beyond simple object recognition is impressive – a video demonstration shows the AI correctly identifying a Boston Terrier, and attributing a quote to Albert Einstein.
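Those three capabilities map directly onto feature types in the Cloud Vision `images:annotate` REST endpoint, which lets one request ask for labels, text, and logos in a single round trip. Below is a minimal sketch of how a device like this might build that request; the endpoint and request shape come from the public v1 API, while `build_vision_request` is our own hypothetical helper, not code from the project.

```python
import base64
import json

# Public Cloud Vision v1 endpoint (an API key or OAuth token is needed
# to actually POST to it).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_vision_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build the JSON body for an images:annotate call asking for
    labels, text, and logos at once."""
    return {
        "requests": [{
            # Images are sent inline as base64-encoded content.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": max_results},
                {"type": "TEXT_DETECTION"},
                {"type": "LOGO_DETECTION"},
            ],
        }]
    }

# A device would capture a frame from the Pi camera, POST this body,
# then read descriptions back from the labelAnnotations,
# textAnnotations, and logoAnnotations fields of the response.
body = build_vision_request(b"\xff\xd8placeholder-jpeg-bytes")
print(json.dumps(body, indent=2))
```

The response text can then be handed straight to the AIY kit's text-to-speech output to describe the scene aloud.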
Artificial intelligence has made a huge difference to the viability of voice recognition – because it’s one thing to understand the words, and another to understand what they mean when strung together. Video after the break.
The current best attempts are collected in a Google Sheets document. So far there have been few competitors, but we expect to see more activity in the future. The current rules for world record competition require original floppy and CD-ROM images to be used, but there are no limits on hardware, so records should tumble as time goes on. All the top times so far have been set in virtual machines, but we'd love to see an attempt made on bare metal.
It all kicked off when [oscareczek] grew tired of trying to compete in traditional gaming speedruns and invented a new category instead. Competition has already come a long way from that original four-minute time, and competitors are now considering advanced techniques such as RAM disks to speed their runs. All keystrokes are entered by hand at the moment, but we could see a tool-assisted competition starting up in the future.