Neural Network Gimbal Is Always Watching

[Gabriel] picked up a GoPro to document his adventures on the slopes and trails of Montreal, but quickly found he was better in front of the camera than behind it. Turns out he’s even better seated behind his workbench, as the completely custom auto-tracking gimbal he came up with is nothing short of a work of art.

There’s quite a bit going on here, and as you might expect, it took several iterations before [Gabriel] got all the parts working together. The rather GLaDOS-looking body of the gimbal is entirely 3D printed, and holds the motors, camera, and a collection of ultrasonic receivers. The Nvidia Jetson TX1 that does the computational heavy lifting is riding shotgun in its own swanky looking 3D printed enclosure, but [Gabriel] notes a future revision of the hardware should be able to reunite them.

In the current version of the system, the target wears an ultrasonic emitter that is picked up by the sensors in the gimbal. The rough position information provided by the ultrasonics is then refined by the neural network running on the Jetson TX1 so that the camera stays locked on the moving subject. Right now the Jetson TX1 gets the video feed from the camera over WiFi and commands the gimbal hardware over Bluetooth. Once the Jetson is inside the gimbal, however, some of the hardware can likely be directly connected, and [Gabriel] says the ultrasonics may be dropped from the design completely in favor of tracking purely in software. He plans on open sourcing the project, but says he’s got some internal housekeeping to do before he takes the wraps off it.
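
We haven’t seen [Gabriel]’s code yet, but the sensor fusion he describes presumably boils down to something like the sketch below. Every name here is our own guess at an interface, and the confidence-weighted blend is just one plausible way to combine the two estimates, not his actual implementation:

```python
def fuse_bearing(ultra_deg: float, nn_deg: float, nn_conf: float,
                 blend: float = 0.8) -> float:
    """Blend the coarse ultrasonic bearing with the neural network's
    estimate, trusting the network more when it reports high confidence."""
    w = blend * max(0.0, min(1.0, nn_conf))
    return w * nn_deg + (1.0 - w) * ultra_deg

def pan_command(current_deg: float, target_deg: float, k_p: float = 0.5) -> float:
    """Proportional step toward the fused bearing; in the current
    hardware revision this would go out over Bluetooth."""
    return k_p * (target_deg - current_deg)

# Example: ultrasonics report 12 degrees, the network says 9 at 90% confidence.
target = fuse_bearing(12.0, 9.0, 0.9)
print(pan_command(current_deg=5.0, target_deg=target))
```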

From bare bones to cushy luxury, scratch-built camera gimbals have become something of a rite of passage for the photography hacker. But with this project, it looks like the bar just got set a bit higher.

16 thoughts on “Neural Network Gimbal Is Always Watching”

  1. Cool! It would be even cooler if one used the camera feed and an FPGA to track the person in real time, but the most obvious choice (like mine) is probably not the best. An ultrasonic tracker is brilliant, and something I haven’t seen much elsewhere. Kudos!

  2. Impressive, great job. Thanks for sharing, and sweet videography. I’m thinking of modifying something like this for sound/radio direction finding, or even, now that you’ve got me thinking, a broader-range ultrasonic detector. Is the ultrasonic system you are using ~40 kHz?

    1. Thanks! The ultrasound tracking is custom but based on the HC-SR04. If I remember correctly I centered the filter at 36 kHz even though the emitters output at 40 kHz. That value came from pure experimentation, but I believe it has something to do with the signal’s apparent frequency decreasing over distance. You could definitely use my receiver design to locate a sound’s direction, but it may be tricky to filter depending on your frequency.

      1. Do you have any references on the filtering process? I would be using a range rather than a single frequency, so I would need an additional process; I haven’t yet read up in detail on how you did it. I’m used to working against libraries of reference data for identification (http://www.americanlaboratory.com/1413-Issues/37116-October-2008/), so I was thinking of coupling this with, say, Raven or Raven Lite (http://www.birds.cornell.edu/brp/raven/RavenOverview.html) for the sound version, and a scanner frequency library for identification in the RF version. I’d also want trajectory information recorded.

        Your design is way neat.

        1. The filtering is done in hardware. Look into the HC-SR04 circuit: it’s built around an op-amp filter that you tune by changing the resistor and capacitor values. I don’t think it’s what you are looking for, though.
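
          To get a feel for the numbers, here is a minimal sketch of how the corner frequencies move with R and C. It assumes a plain RC band-pass model with made-up component values; the actual SR-04 receive chain is more involved:

          ```python
          from math import pi

          def rc_corner_hz(r_ohms: float, c_farads: float) -> float:
              """-3 dB corner of a single RC stage: f_c = 1 / (2*pi*R*C)."""
              return 1.0 / (2.0 * pi * r_ohms * c_farads)

          # Hypothetical values, just to show the trend: a high-pass stage
          # sets the lower edge of the band, a low-pass stage the upper edge.
          C = 1e-9            # 1 nF in both stages
          r_highpass = 4.7e3  # lower edge ~34 kHz
          r_lowpass = 3.3e3   # upper edge ~48 kHz

          print(f"lower edge ~ {rc_corner_hz(r_highpass, C) / 1e3:.1f} kHz")
          print(f"upper edge ~ {rc_corner_hz(r_lowpass, C) / 1e3:.1f} kHz")
          # Increasing R or C pulls a corner down; decreasing either pushes
          # it up. That is how you re-center the band from ~40 kHz to ~36 kHz.
          ```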

          1. No, I’m more interested in the gimbal design, but I appreciate your replies all the same, as I enjoy learning new approaches. I was wondering, though, what the frequency range is with those HC-SR04s; I’ve read they have a small span and also a resonant peak at around 57 Hz. Is the send and receive frequency different on those? I’ve read they send at 32 kHz. I’ll have to look for the transducer specification to see the tolerance of the specific component. I assume a lower resistance for the different range, maybe to drive it harder, and does the capacitor filter the low end, i.e. set the filter’s lower frequency? My guess was manufacturing variation compared to specs. Thanks for explaining. I found this site that helped also (or confused, as it notes 18 kHz centered). https://uglyduck.ath.cx/ep/archive/2014/01/Making_a_better_HC_SR04_Echo_Locator.html

  3. I was looking forward to reading some more details, but among the many links titled as if they related to the project, I could not find anything. Dear Tom, please use the hyperlink titles wisely.

  4. Happens with proprietary designs if they’re not open source. I too can only release so much of my systems work, as I was assaulted, robbed at gunpoint, shot, and had my truck stolen with my mobile office and the hard drives containing my work from Perrigo (I found recently that Perigo means danger), as I noted above. I’m thinking it’s not just the Israeli network involved, adding polysubstance deadly force potential with Brazil. I’m still gang stalked by the pedo junkie mass murderers that use ELINT/SIGINT/RINT/ES/TS/EW. Great to know there are still many left in the world who can think logically. Be careful in Michigan… the pan troglodyte humanoid that looks like and mocks humans is everywhere.

    Links to some other similar projects can be found by reviewing in detail the links provided in the article. I had to hunt around in the video comments too. Reminds me of inductive versus deductive logic.

  5. Brilliantly done. Don’t forget that a well framed shot of this kind of motion has the camera framed slightly differently. Instead of centering the subject, the camera should lead the figure so there is more space in front of the subject than behind, i.e. showing where the subject will go, not where it’s been. It’s a bit tricky, as you can’t reconstruct the surface to determine that (yet); all you can do is look at the motion tracking and handle the framing in a 2D way. This will make an appreciable improvement.

    If you want to try, and you’re using OpenCV (I hope so) or https://openmv.io, then you can try the Farneback or pyramidal optical flow mechanisms. These work well for finding motion directions and should enable you to lead the action slightly in X and Y and improve your framing; a rough sketch follows below.
    If you sample discretely around the current aim point and feed the samples to a simple neural net, you can improve this further to guesstimate the 3D pose and make an even better framing choice. (The neural net means you only need to train it with samples rather than work out the almost intractable math.)
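
    Here’s a minimal sketch of the first idea, using OpenCV’s stock calcOpticalFlowFarneback; the patch size and gain are made-up numbers you’d want to tune:

    ```python
    import cv2

    def lead_offset(prev_gray, curr_gray, aim_xy, patch=64, gain=8.0):
        """Average the dense optical flow in a patch around the aim point
        and return a small (dx, dy) offset that leads the subject."""
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, curr_gray, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        x, y = int(aim_xy[0]), int(aim_xy[1])
        h, w = flow.shape[:2]
        x0, x1 = max(0, x - patch), min(w, x + patch)
        y0, y1 = max(0, y - patch), min(h, y + patch)
        dx, dy = flow[y0:y1, x0:x1].reshape(-1, 2).mean(axis=0)
        # Shift the aim point ahead of the motion, not behind it.
        return gain * dx, gain * dy
    ```

    Add that offset to the gimbal’s target each frame and the framing naturally leads the subject.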

    I’m hoping you release this – excellent job.

    1. I did not get that far; designing the multiple iterations was time consuming. I actually spent a year exploring ultrasound and GPS, but only integrated vision in the last month of the project. Smoothness of the tracking is still the main challenge.

      I stopped working on it after my ski trip last winter. The next version needs a full body redesign, and that has been putting me off. But posting the vid has me motivated. When I get time I will definitely explore OpenMV, it looks amazing.

      1. Ping me on https://hackaday.io/Neon22 if you think I can help. I have some Python code for optical flow using OpenCV if you’re using a higher-end CPU like an RPi. If you’re thinking embedded controllers, there are plenty of examples on the OpenMV site doing similar tracking with different algorithms. Let me know; happy to help with a fascinating project.

  6. I noticed on a November 2015 post that “Opencv not fully ported to that specific GPU. Looks good on the surface but checking the details unveils the most of the calls still use CPU.” Do you know if OpenCV has since been updated for that specific GPU?
