Web-enabled Kinect

There are Kinect hacks out there for robot vision, 3D scanners, and even pseudo-LIDAR setups. Until now, one limiting factor in these builds has been the need for a full-blown computer on the device to handle the depth maps and do all the necessary processing and computation. That's much less of a problem now that [wizgrav] has published Intrael, an HTTP interface for the Kinect.

[Eleftherios] caught up with [wizgrav] at his local hackerspace, where he gave a short tutorial on Intrael. [wizgrav]'s project provides each frame from the Kinect over HTTP, wrapped up in JSON arrays. Everything a Kinect outputs aside from sound is now easily available over the Internet.
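As a rough illustration of what consuming such a feed might look like, here is a minimal Python sketch. The payload shape and field names (x, y, w, h, z) are assumptions made for illustration, not Intrael's documented output format:

```python
import json

# Hypothetical payload: one JSON array per frame, one object per tracked
# blob. The field names here are illustrative assumptions, not what
# Intrael actually emits.
sample_frame = ('[{"x": 120, "y": 80, "w": 64, "h": 96, "z": 1450},'
                ' {"x": 300, "y": 150, "w": 40, "h": 60, "z": 2100}]')

blobs = json.loads(sample_frame)
nearest = min(blobs, key=lambda b: b["z"])  # smallest depth value = closest blob
print(len(blobs), nearest["x"], nearest["z"])  # → 2 120 1450
```

In a real setup the JSON string would come from an HTTP GET against the Intrael server rather than a literal.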

The project is meant to put computer vision outside the realm of desktops and robotic laptops and into the web. [wizgrav] has a few ideas on what his project can be used for, such as smart security cameras and all kinds of interactive surfaces.

After the break, check out the Intrael primer [wizgrav] demonstrated (it’s Greek to us, but there are subtitles), and a few demos of what Intrael ‘sees.’

Comments

  1. Matt says:

    So now we’re wrapping video in JSON in HTTP?
    What is the world coming to…

    Seriously, give the JSON a rest, folks!
    I don’t think your HTTP server is going to be processing video in real time, so there’s no reason not to dump raw frames into a TCP stream.

    • Matt says:

      Correction: Shouldn’t have trusted the hackaday summary.
      They’re shipping analyzed data from the frame as JSON over HTTP.

      So it’s maybe even slightly useful.
      But will most certainly require a computer to do the heavy lifting on the Kinect side. Not that that’s a bad thing.

      Just a bad summary here.

      • wizgrav says:

        The computer needed for the heavy lifting should be no more than an ARM Cortex-A8 @ 1 GHz. The client-server approach also has the advantage of decoupling the box that handles the Kinect from the one handling the output. It’s been tested and works great over WiFi, which comes in very handy since the Kinect can track up to 10 m away; USB cable length is no longer an issue.

  2. Josh says:

    What’s the difference between this and the work already done with the 6th sense computer? Not much, IMO.

  3. Chris Allick says:

    Here is the same thing done with processing and regular video:

    http://badankles.com/?p=209

    • wizgrav says:

      Not quite; I think Matt was right when he said that the summary is misleading. The images are not base64 encoded. They’re encoded as JPEGs and presented as a stream through an image tag, using a technique called MJPEG over HTTP. You can read about it here:

      http://en.wikipedia.org/wiki/Motion_JPEG

      The MJPEG stream and the one that delivers the data from the blob tracking are served from separate paths on the server. I like what you did though; I opted for lossy transmission for practical reasons. I also experimented with WebSockets for the JSON data delivery but got fed up with the protocol changing all the time. Intrael (optionally) supports Server-Sent Events, which is basically a one-way WebSocket; maybe that would be of interest to you as well.

      http://en.wikipedia.org/wiki/Server-sent_events
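For reference, the Server-Sent Events wire format [wizgrav] mentions is simple enough to parse by hand. This is a minimal, illustrative Python sketch of the text/event-stream framing (not Intrael code), covering only the `event:` and `data:` fields:

```python
def parse_sse(stream_text):
    """Parse a text/event-stream string into (event, data) pairs.

    Each event is a run of "field: value" lines terminated by a blank
    line; consecutive data: lines are joined with newlines. This handles
    only the event: and data: fields, which is enough for illustration.
    """
    events, data_lines, event_type = [], [], "message"
    for line in stream_text.splitlines():
        if line == "":
            if data_lines:  # a blank line dispatches the accumulated event
                events.append((event_type, "\n".join(data_lines)))
            data_lines, event_type = [], "message"
        elif line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line.startswith("event:"):
            event_type = line[6:].lstrip()
    return events

raw = "event: frame\ndata: hello\n\n"
print(parse_sse(raw))  # → [('frame', 'hello')]
```

A real client would read this incrementally off a long-lived HTTP response instead of from a string.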

      • Chris Allick says:

        Sorry, I posted too quickly. I should have explained that it achieves similar objectives in that it streams video over WebSockets from a device like a PS3 Eye camera.

        As a side note, though, my understanding is that MJPEG is rather lossy and not a good format for this data. If you only cared about the Kinect, an Ogg stream would be much better.

        For my purposes, I just wanted to stream video to an iPad, and the same technique can be used from an iPad app back to the web.

      • wizgrav says:

        Yeah, a vorbis stream would achieve a better compression ratio, even though it would be lossy as well. Another reason MJPEG was chosen was its ease of implementation compared to regular video streaming: you just stream regular JPEGs with a text boundary between them. It’s much lighter on resources than normal video compression, and the results are still usable as crop material. But if you want to further analyze the pixel data in the browser, the best solution would be a lossless format like PNG, which I also tested, but the file sizes got pretty big. I’m still thinking about it though.
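The framing [wizgrav] describes can be sketched in a few lines of Python. The boundary string and header layout below follow the general multipart/x-mixed-replace convention and are illustrative, not necessarily what Intrael actually sends:

```python
# Minimal sketch of MJPEG-over-HTTP framing: each JPEG is one part of a
# multipart/x-mixed-replace body, separated by a text boundary. The
# boundary string is made up for illustration.
BOUNDARY = b"--intrael-frame"

def mjpeg_part(jpeg_bytes):
    """Wrap one JPEG frame as a single multipart part."""
    headers = (b"Content-Type: image/jpeg\r\n"
               b"Content-Length: " + str(len(jpeg_bytes)).encode() + b"\r\n\r\n")
    return BOUNDARY + b"\r\n" + headers + jpeg_bytes + b"\r\n"

# Stand-in bytes; a real frame would be an actual JPEG (starting \xff\xd8
# and ending \xff\xd9).
fake_jpeg = b"\xff\xd8fakejpegdata\xff\xd9"
part = mjpeg_part(fake_jpeg)
print(part.startswith(BOUNDARY), fake_jpeg in part)  # → True True
```

A server would first send a response header of Content-Type: multipart/x-mixed-replace with the matching boundary, then emit one such part per captured frame.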

      • wizgrav says:

        Sorry, I meant Theora, not Vorbis.

  4. Chris Allick says:

    Sorry, I should say using WebSockets.
