OpenCV Spreads Smart Camera Joy To See Ideas Come To Life

Do you have a great application for computer vision, but can’t spare the cost of the hardware needed to build it? Or perhaps you just need a deadline to pull you away from endless doom scrolling? Either way, the OpenCV team wants you to enter their OpenCV AI Competition 2021, and they’re willing to pitch in hardware to make it happen.

This competition is part of OpenCV’s 20th anniversary celebration, and the field of machine vision has changed a lot in those two decades. OpenCV started within Intel, harnessing the power of their high-end CPUs, but today the excitement is around specialized acceleration hardware for vision processing. Which is why OpenCV put their support behind, and lent their name to, the OpenCV AI Kit (OAK) Kickstarter we covered a few months ago. Since then, the hardware has been produced and is starting to arrive in project backers’ hands. (Barring pandemic-related shipping restrictions…)

This shiny new hardware is the competition’s focus. Phase one solicits team proposals for putting an OAK-D’s power to novel use. University teams may have up to ten members; general teams are limited to four. Each team’s geographic home will put them in one of six global regions. Proposals must be submitted by January 27th, 2021. By February 11th, judges will select the best twenty-five general and ten university team proposals from each region, and every member of those teams gets an OAK-D unit to turn their idea into reality by the phase two deadline of June 27th. That works out to as many as 200 modules per region, or up to 1,200 OAK-D modules available to anyone who can convince the judges they have a great idea and are capable of bringing it to fruition. Is that you? Of course it is!

Teams will also receive additional resources, such as an allotment of cloud compute credits to train their models, and naturally all the tutorials and sample code released as part of the OAK Kickstarter. No explicit resource for project team organization is mentioned, but of course our own Hackaday.io is available to support you. Best of luck to everyone who enters, and we look forward to seeing all the projects this contest will bring to life.

Hands-Free Page Turning

For people who can’t lift a finger to turn the page on their ebooks, a solution is at hand. Seoul-based technology company Visual Camp has adapted their eye tracking algorithms to an ebook reader. (Video, embedded below.) Reportedly this is the first time an ebook reader has been so equipped.

If your eye lingers on the page turn button, it will turn the page. While this particular application seems innocuous, some of the other applications being touted seem a little contrived, if not invasive. For example, by applying gaze analysis while you are reading a book, they claim to be able to make targeted recommendations for other books.

We’ve discussed eye tracking devices before, but they have relied on dedicated hardware. Visual Camp claims their AI-based technology only requires a color camera, so it can be integrated into existing camera-equipped devices, such as this ebook reader. They also offer an SDK for developers who want to add eye tracking control to their apps. Eye tracking is hard, though, and the devil is in the details. It’d be neat to see what they’re up to.
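The dwell-to-activate logic itself is simple enough to sketch. Here’s a minimal illustration in Python, assuming some gaze_point() callback that an eye tracking backend would provide; the button rectangle, polling rate, and dwell time are all invented for illustration and have nothing to do with Visual Camp’s actual SDK.

```python
# A sketch of dwell-based activation: if the estimated gaze point stays
# inside the page-turn button's rectangle long enough, fire the page turn.
# gaze_point() and turn_page() are hypothetical stand-ins.
import time

BUTTON = (900, 400, 1000, 600)   # x1, y1, x2, y2 of the page-turn button
DWELL_SECONDS = 0.8              # how long the gaze must linger

def inside(point, rect):
    x, y = point
    x1, y1, x2, y2 = rect
    return x1 <= x <= x2 and y1 <= y <= y2

def dwell_loop(gaze_point, turn_page):
    entered = None               # when the gaze first entered the button
    while True:
        if inside(gaze_point(), BUTTON):
            entered = entered or time.monotonic()
            if time.monotonic() - entered >= DWELL_SECONDS:
                turn_page()
                entered = None   # re-arm so a continued stare turns again
        else:
            entered = None       # gaze left the button; reset the timer
        time.sleep(0.05)         # poll at roughly 20 Hz
```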

Continue reading “Hands-Free Page Turning”

Giving Blind Runners Independence With AI

Being able to see, move, and exercise independently is something most of us take for granted. [Thomas Panek] was an avid runner before losing his sight due to a genetic condition, and had to rely on other humans and guide dogs to run again. After challenging attendees at a Google hackathon, Project Guideline was established to give blind runners (or walkers) independence from a cane, dog, or another human while exercising outdoors. Using a smartphone with line-following AI software and bone conduction headphones, users can be guided along a path with a line painted on it. You need to watch the video below to get a taste of just how incredible it is for the users.

Getting a wheeled robot to follow a line is relatively simple, but a running human is by no means a stable sensor platform. At the previously mentioned hackathon, developers put together a rough proof of concept with a smartphone, using its camera to recognize a painted line on the ground and provide left/right audio cues. As the project developed, the smartphone was attached to a waist belt, and bone conduction headphones were used, which don’t affect audio situational awareness as much as normal headphones.
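To make the core loop concrete, here’s a hedged sketch of that style of proof of concept in Python with OpenCV: segment the painted line by color, find its centroid in the lower part of the frame, and map the horizontal offset to a cue. The HSV color range, region of interest, and deadband are assumptions for illustration, not Project Guideline’s actual parameters.

```python
# A minimal line-following sketch: locate a painted line in the camera frame
# and convert its offset from center into a left/right guidance cue.
import cv2

def line_offset(frame):
    """Return the line's horizontal offset from center in [-1, 1], or None."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Assumed HSV range for a yellow painted line
    mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))
    h, w = mask.shape
    roi = mask[int(0.6 * h):, :]       # look near the runner's feet
    m = cv2.moments(roi)
    if m["m00"] < 255 * 50:            # fewer than ~50 line pixels: no line
        return None
    cx = m["m10"] / m["m00"]           # centroid x of the line pixels
    return (cx - w / 2) / (w / 2)

def audio_cue(offset, deadband=0.15):
    """Map the offset to a spoken cue; a real system would pan audio."""
    if offset is None:
        return "stop"                  # lost the line entirely
    if offset < -deadband:
        return "move left"             # line drifted to the runner's left
    if offset > deadband:
        return "move right"
    return "on line"
```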

The shaking and side-to-side movement of running, and varying light conditions and visual obstructions outdoors, made the problem more difficult to solve, but within a year the developers had completed successful running tests with [Thomas] on a well-lit indoor track and an outdoor pedestrian path with a temporary line. For the first time in 25 years, [Thomas] was able to run independently.

While guide dogs have proven effective for both daily life and running, they cost approximately $60,000 over an average working life of eight years, putting them out of reach of many sight-impaired people around the world. Project Guideline is still in the early stages, and real-world problems like obstacles and traffic still need to be addressed, but there is massive potential.

Continue reading “Giving Blind Runners Independence With AI”

Still Got Film To Scan? This Lego And Raspberry Pi Scanner Is For You

There was a time, during the early years of mass digital photography, when a film scanner was a common sight. A small box usually connected to a USB port, it had a slot for slides or negatives. In 2020 they’re a rare breed, but never fear! [Bezineb5] has a solution in the shape of an automated scanner using a Raspberry Pi and a mechanism made of Lego.

The Lego mechanism is a sprocket feeder that moves the film past the field of view of an SLR camera. The software on the Pi runs in a Docker container, and features a machine learning approach to spotting frame boundaries. This is beyond the capabilities of the Pi itself, so the work is offloaded to a Google Coral accelerator.
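As a rough idea of what that offloading looks like (this is not [Bezineb5]’s actual code), here’s a sketch using the PyCoral library to run a small classifier on the Edge TPU. The model file name and its two-class output are assumptions for illustration.

```python
# A sketch of frame-boundary detection offloaded to a Google Coral: run a
# compiled TFLite classifier on the Edge TPU for each captured strip of film.
from PIL import Image
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common, classify

# Hypothetical model compiled for the Edge TPU
interpreter = make_interpreter('frame_boundary_edgetpu.tflite')
interpreter.allocate_tensors()

def is_frame_boundary(image_path, threshold=0.6):
    """Return True if the image looks like the gap between two frames."""
    size = common.input_size(interpreter)
    image = Image.open(image_path).convert('RGB').resize(size, Image.LANCZOS)
    common.set_input(interpreter, image)
    interpreter.invoke()                       # inference runs on the Coral
    top = classify.get_classes(interpreter, top_k=1)[0]
    # Assumes class id 1 means "boundary between frames"
    return top.id == 1 and top.score >= threshold
```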

The whole process is automated, with the Pi controlling not only the Lego but also the camera, to the extent of retrieving the photos from the camera to the Pi. There’s a smart web interface to control everything, making the process — if you’ll excuse the pun — a snap. There’s a video of it in action that you can see below the break.

We’ve featured many film scanner projects over the years; one that remains memorable is this 3D printed lens mount.

Continue reading “Still Got Film To Scan? This Lego And Raspberry Pi Scanner Is For You”

The Protein Folding Break-Through

Researchers at DeepMind have proudly announced a major breakthrough in predicting static folded protein structures with a new program known as AlphaFold 2. Protein folding has been an ongoing problem for researchers since 1972, when Christian Anfinsen speculated in his Nobel Prize acceptance speech that the three-dimensional structure of a given protein should be algorithmically determined by the one-dimensional sequence of amino acids that describes it. When you hear protein, you might think of muscles and whey powder, but the proteins mentioned here are chains of amino acids that fold into complex shapes. Cells use these proteins for almost everything. Many of the enzymes, antibodies, and hormones inside your body are folded proteins. We’ve discussed why protein folding is important, as well as covered recent advancements in cryo-electron microscopy used to experimentally determine the structure of folded proteins.

The shape of a protein largely controls its function, and if we can predict the shape, we get much closer to predicting how proteins interact. AlphaFold 2 only predicts the static folded state; given the sheer number of interactions that can change a protein, dynamic protein structures are still out of reach. Even so, DeepMind’s technical achievement is not to be understated: for a typical protein, there are an estimated 10^300 different possible configurations.

Out of the 180 million protein sequences in the protein database, only 170,000 have had their structures determined. Technologies like the cryo-electron microscope make the process of mapping a structure easier, but it is still complex and tedious to go from sequence to structure. AlphaFold 2 and other folding algorithms are tested against this 170,000-member corpus to determine their accuracy. The previous highest-scoring algorithm, in 2016, had a median global distance test (GDT) score of 40 (on a 0-100 scale, with 100 being best) in the most difficult category (free modeling). In 2018, AlphaFold made waves by pushing that up to the high 50s. AlphaFold 2 brings the GDT up to 87.
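For a feel for what that GDT number measures, here’s a simplified sketch. The official CASP GDT_TS score searches over many superpositions of the predicted and experimental structures; this version assumes the alpha-carbon coordinates have already been aligned, which is a real simplification.

```python
# A simplified GDT_TS: the average, over 1/2/4/8 angstrom cutoffs, of the
# percentage of residues whose predicted alpha-carbon lands within that
# cutoff of its experimentally determined position.
import numpy as np

def gdt_ts(predicted, experimental):
    """predicted, experimental: (N, 3) arrays of aligned C-alpha coordinates."""
    distances = np.linalg.norm(predicted - experimental, axis=1)
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    return 100.0 * np.mean([(distances <= c).mean() for c in cutoffs])

# A perfect prediction scores 100
coords = np.random.rand(100, 3) * 50.0
print(gdt_ts(coords, coords))  # 100.0
```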

At this point, it is hard to determine what sort of effects this will have on the drug industry, healthcare, and society in general. Research has traditionally gone from creating a protein, to identifying what it does, to finally figuring out its structure. AlphaFold 2 represents an avenue toward doing that whole process completely backward. Whether the next goal is to map all the proteins encoded in the human genome or to find new, more effective drug treatments, we’re quite excited to see what becomes of this landmark breakthrough.

Continue reading “The Protein Folding Break-Through”

“Enhance” Is Now A Thing, But Don’t Believe What You See

It was a trope all too familiar in the 1990s — law enforcement in movies and TV taking a pixelated, blurry image and hitting the magic “enhance” button to reveal suspects to be brought to justice. Creating data where there simply was none before was a great way to ruin immersion for anyone with a modicum of technical expertise, and it spoiled many movies and TV shows.

Of course, technology marches on and what was once an utter impossibility often becomes trivial in due time. These days, it’s expected that a sub-$100 computer can easily differentiate between a banana, a dog, and a human, something that was unfathomable at the dawn of the microcomputer era. This capability is rooted in the technology of neural networks, which can be trained to do all manner of tasks formerly considered difficult for computers.

With neural networks and plenty of processing power at hand, there has been a flood of projects aiming to “enhance” everything from low-resolution human faces to old film footage, increasing resolution and filling in for the data that simply isn’t there. But what’s really going on behind the scenes, and is this technology really capable of accurately enhancing anything?
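As a taste of what these projects do under the hood, here’s a hedged sketch of single-image super-resolution using the dnn_superres module from opencv-contrib-python with a pretrained ESPCN model. The model file has to be downloaded separately (the OpenCV contrib docs point to the model authors’ repositories), and the caveat above applies in full: the added detail is the network’s plausible invention, not recovered data.

```python
# "Enhance" via a pretrained super-resolution network in OpenCV.
# Requires opencv-contrib-python and the ESPCN_x4.pb model file on disk.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel('ESPCN_x4.pb')       # pretrained 4x upscaling model (assumed path)
sr.setModel('espcn', 4)           # algorithm name and scale must match the file

low_res = cv2.imread('blurry_frame.png')
enhanced = sr.upsample(low_res)   # the network hallucinates plausible detail
cv2.imwrite('enhanced_frame.png', enhanced)
```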

Continue reading ““Enhance” Is Now A Thing, But Don’t Believe What You See”

Robots Can Finally Answer, Are You Talking To Me?

Voice assistants, love them or hate them, are becoming more and more commonplace. One problem for voice assistants is the situation of multiple devices listening in the same place. When a command is given, which device should answer? Researchers at CMU’s Future Interfaces Group [Karan Ahuja], [Andy Kong], [Mayank Goel], and [Chris Harrison] have an answer: smart assistants should try to infer whether the user is facing the device they want to talk to. They call it direction-of-voice, or DoV.

Currently, smart assistants use a simple race to see who heard the command first. The reasoning is that the device you are closest to will likely hear it first. However, in situations with echoes, or when you’re equidistant from multiple devices, the outcome can seem arbitrary to a user.

The implementation of DoV uses an Extra-Trees classifier from the Python scikit-learn toolkit. Several other machine learning algorithms were considered, but ultimately efficiency won out and Extra-Trees was selected. Another interesting facet of the research was determining what facing really means. The team had human ‘listeners’ stand in for smart assistants. A ‘talker’ would speak the key phrase while the ‘listener’ determined whether the talker was facing them or not. Based on that definition of facing, the system can determine if someone is facing the device with 90% accuracy, rising to 93% with per-room calibration.
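The classifier end of that pipeline is only a few lines of scikit-learn. Here’s a minimal sketch with randomly generated stand-in features; the team’s actual feature extraction and recordings are in their GitHub repo, and everything below the imports is invented for illustration.

```python
# Training an Extra-Trees classifier to label utterances as facing/not facing.
# The 16-dimensional "acoustic features" here are random stand-ins.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))      # stand-in per-utterance feature vectors
y = rng.integers(0, 2, size=500)    # 1 = talker facing the device, 0 = not

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")  # ~0.5 on noise
```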

Their algorithm, as well as the data they collected, has been open-sourced on GitHub. Perhaps when you’re building your own voice assistant, you can incorporate DoV to improve wake-word accuracy.

Continue reading “Robots Can Finally Answer, Are You Talking To Me?”