How To Roll Your Own Custom Object Detection Neural Network

Real-time object detection, which uses neural networks and deep learning to rapidly identify and tag objects of interest in a video feed, is a handy feature with great hacker potential. Happily, it’s also possible to make customized CNNs (convolutional neural networks) tailored for one’s own needs, and that process just got easier thanks to some new documentation for the Vizy “AI camera” by Charmed Labs.

Raspberry Pi-based Vizy camera

Charmed Labs has been making hacker-friendly machine vision devices for a long time, and the Vizy camera impressed us mightily when we checked it out last year. Out of the box, Vizy has a perfectly functional object detector application that runs locally on the device, and can detect and tag many common everyday objects in real time. But what if that default application doesn’t quite meet one’s project needs? Good news, because it’s possible to create a custom-trained CNN, and that process got a lot more accessible thanks to step-by-step examples of training a model to recognize hands doing rock-paper-scissors.

Person and cat with machine-generated tags identifying them
Default object detection works well, but sometimes one needs custom results.

The basic process is this: Start with a variety of images that show the item of interest. Then identify and label the item of interest in each photo. These photos (a “training set”) are then sent to Google Colab, which will be used to generate a neural network. The resulting CNN model can then be downloaded and used, to see how well it performs.

Of course things rarely work perfectly the first time around, so at this point it’s pretty common for some refinement to be needed to increase accuracy. Luckily there are a number of tools to help do this without creating a new model from scratch, so it’s just a matter of tweaking until things perform acceptably.

Google Colab is free and the resulting CNNs are implemented in the TensorFlow Lite framework, meaning it’s possible to use them elsewhere. So if custom object detection has been holding up a project idea of yours, this might be what gets you over that hump.

Vizy “AI Camera” Wants To Make Machine Vision Less Complex

Vizy, a new machine vision camera from Charmed Labs, has blown through their crowdfunding goal on the promise of making machine vision projects both easier and simpler to deploy. The camera, which starts around $250, integrates a Raspberry Pi 4 with built-in power and shutdown management, and comes with a variety of pre-installed applications so one can dive right in.

The Sony IMX477 camera sensor is the same one found in the Raspberry Pi high quality camera, and supports capture rates of up to 300 frames per second (under the right conditions, anyway.) Unlike the usual situation faced by most people when a Raspberry Pi is involved, there’s no need to worry about adding a real-time clock, enclosure, or ensuring shutdowns happen properly; it’s all taken care of.

‘Birdfeeder’ application can automatically identify and upload images of visitors.

Charmed Labs are the same folks behind the Pixy and Pixy 2 cameras, and Vizy goes further in the sense that everything required for a machine vision project has been put onboard and made easy to use and deploy, even the vision processing functions work locally and have no need for a wireless data connection (though one is needed for things like automatic uploading or sharing.) For outdoor or remote applications, there’s a weatherproof enclosure option, and wireless connectivity in areas with no WiFi can be obtained by plugging in a USB cellular modem.

A few of the more hacker-friendly hardware features are things like a high-current I/O header and support for both C/CS and M12 lenses for maximum flexibility. The IR filter can also be enabled or disabled via software, so no more swapping camera modules for ones with the IR filter removed. On the software side, applications are all written in Python and use open software like Tensorflow and OpenCV for processing.

The feature list looks good, but Vizy also seems to have a clear focus. It looks best aimed at enabling projects with the following structure:

Detect Things (people, animals, cars, text, insects, and more) and/or Measure Things (size, speed, duration, color, count, angle, brightness, etc.)

Perform an Action (for example, push a notification or enable a high-current I/O) and/or Record (save images, video, or other data locally or remotely.)

The Motionscope application tracking balls on a pool table. (Click to enlarge)

A good example of this structure is the Birdfeeder application which comes pre-installed. With the camera pointed toward a birdfeeder, animals coming for a snack are detected. If the visitor is a bird, Vizy identifies the species and uploads an image. If the animal is not a bird (for example, a squirrel) then Vizy can detect that as well and, using the I/O header, could briefly turn on a sprinkler to repel the hungry party-crasher. A sample Birdfeeder photo stream is here on Google Photos.

Motionscope is a more unusual but very interesting-looking application, and its purpose is to capture moving objects and measure the position, velocity, and acceleration of each. A picture does a far better job of explaining what Motionscope does, so here is a screenshot of the results of watching some billiard balls and showing what it can do.