During the slow first week of my internship at Microsoft this summer, I got bored, so I visited the Microsoft Library. I picked up two books: The 20% Doctrine by Ryan Tate and Making Things See: 3D Vision with Kinect, Processing, Arduino, and MakerBot by Greg Borenstein.
The combination was fortuitous: the first book argues that spending roughly a fifth of one's time tinkering, goofing off, or breaking rules can boost creativity and productivity, and the latter is a book about goofing off with the Kinect. I had been intrigued by the Kinect ever since working on its hardware in the summer of 2012 and learning how Microsoft leveraged PrimeSense's technology to build a versatile vision system at a consumer price, so I dusted off my Kinect and started hacking around.
How It Works:
The Kinect works by capturing two video feeds: a regular color image and an IR image. The brightness of each pixel in the IR feed indicates how close the imaged object is, and the color of each pixel in the color feed is recorded as the brightnesses of its red, green, and blue components. So for its scene, 640 pixels wide and 480 tall, the Kinect provides a matrix with four values per pixel: one depth number and three color numbers.
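That per-pixel layout is easy to model in plain Java (the language Processing sits on). This is just an illustrative toy, not real sensor output: the class name, field names, and the hard-coded sample values are all made up, but the row-major indexing matches the convention Processing uses for its pixel arrays.

```java
// Toy model of one Kinect frame: a 640x480 grid where each pixel
// carries one depth value plus red, green, and blue intensities.
public class KinectFrame {
    static final int W = 640, H = 480;

    // Row-major storage: pixel (x, y) lives at index x + y * W,
    // the same indexing convention Processing uses for pixels[].
    int[] depth = new int[W * H]; // distance readings per pixel
    int[] red   = new int[W * H];
    int[] green = new int[W * H];
    int[] blue  = new int[W * H];

    static int index(int x, int y) {
        return x + y * W;
    }

    public static void main(String[] args) {
        KinectFrame frame = new KinectFrame();
        // Pretend the pixel at (10, 2) is fairly close and pure red.
        int i = index(10, 2);
        frame.depth[i] = 800;
        frame.red[i]   = 255;
        System.out.println(i);              // 10 + 2 * 640 = 1290
        System.out.println(frame.depth[i]); // 800
    }
}
```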
Processing, the language the MTS author chose for working with this data, is a programming framework layered over Java and commonly used for manipulating images. It turns the matrix of numbers back into real images and provides a library of functions that can be applied to the data. Of course, the calling of those functions and the processing, if you will, of the data is done by writing Processing programs, called sketches, which are then executed using the Java runtime already installed on the computer. Below is an example of the sketch I worked through tonight.
This sketch displayed the depth image and overlaid a red dot on the closest point, shown below. Even though the sketch is fairly basic and straight out of the book, it's really cool to see yourself on the screen in full depth video.
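The heart of that sketch is nothing more than a scan for the smallest valid depth value in the frame. Here is the idea in plain Java on a tiny synthetic frame rather than a live depth feed; the class and method names are mine, and I'm assuming a reading of zero means "no data", as invalid depth pixels are commonly reported.

```java
// Core of the closest-point sketch: scan a row-major depth array
// for the minimum nonzero reading and report its (x, y) position.
public class ClosestPoint {
    // Returns {x, y} of the nearest valid pixel, or {-1, -1} if the
    // frame has no valid readings at all.
    static int[] closest(int[] depth, int width) {
        int best = Integer.MAX_VALUE;
        int bestX = -1, bestY = -1;
        for (int i = 0; i < depth.length; i++) {
            int d = depth[i];
            if (d > 0 && d < best) {   // skip zeros: no reading there
                best = d;
                bestX = i % width;     // column within the row
                bestY = i / width;     // which row we are on
            }
        }
        return new int[] { bestX, bestY };
    }

    public static void main(String[] args) {
        // 4x3 synthetic frame; 900 at (2, 1) is the nearest reading.
        int[] depth = {
            0,    1500, 1400, 1600,
            1300, 1200,  900, 1100,
            0,    2000, 1800, 1700,
        };
        int[] p = closest(depth, 4);
        System.out.println(p[0] + "," + p[1]); // prints "2,1"
    }
}
```

In the real sketch this loop would run once per frame over the current depth map, with the red dot drawn at the returned coordinates.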
The next chapter of MTS begins working with skeletal tracking, which I'm excited about because of the opportunity to combine it with my interest in biomechanics and running. With the Kinect and a laptop, I'm interested in working toward a tool that gives a runner visual, analyzable data showing how they run. Without digging up research articles to validate the claim, I think it's safe to say visual feedback is one of the best ways to learn complex motor skills. It would be cool to record an athlete's gait, process it, and display it side by side with an elite runner's for comparison, all within a few minutes' time and without the need for big labs and special equipment. Various people have dabbled in this space already: