Estimating the match between two motion sequences captured using the Xbox Kinect


Two motion sequences are captured using the Kinect, and they may have different durations. We then match the two sequences and compute a single number that indicates how closely they match.


1 Capturing the motion

We use the OpenNI demos for the Xbox Kinect. One of them, viz. UserTracker-NI, tracks the skeleton of a human. We modified its source code to output the skeleton parameters once the recording starts: for each frame, the global positions of all joints are written to a text file.
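The report does not specify the layout of that text file. As a minimal sketch, assume each line holds one frame with the x, y and z coordinates of every joint in a fixed order, separated by whitespace. The helper names `parse_frames` and `load_frames` and the `num_joints` parameter are hypothetical. Each frame becomes one flat vector of floats, which serves as the per-frame parameter vector for the matching in Section 2.

```python
from typing import Iterable, List

# One frame = flat list [x1, y1, z1, x2, y2, z2, ...]
# (assumed layout; the report does not document the file format)
Frame = List[float]


def parse_frames(lines: Iterable[str], num_joints: int) -> List[Frame]:
    """Parse skeleton log lines (one frame per line) into per-frame vectors."""
    expected = 3 * num_joints
    frames: List[Frame] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines, e.g. a trailing newline
        if len(fields) != expected:
            raise ValueError(
                f"line {line_no}: expected {expected} values, got {len(fields)}"
            )
        frames.append([float(v) for v in fields])
    return frames


def load_frames(path: str, num_joints: int) -> List[Frame]:
    """Read a whole recording from a text file written by the modified tracker."""
    with open(path) as f:
        return parse_frames(f, num_joints)
```

Usage would look like `frames = load_frames("recording.txt", num_joints=15)`, where the file name and the joint count are placeholders; the actual count depends on the skeleton the tracker reports.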

2 Matching sequences

Our approach is motivated by the solution to the stereo correspondence problem. We first map our problem to the correspondence problem and then use its dynamic programming solution to solve our problem efficiently.

2.1 The simple solution

Before going into the actual solution, let us see why the simple solution fails. Just as we match two pixels by taking the sum of squared differences of their values, why not take some kind of sum of squared differences of the parameters of each frame? This does not work for the following reasons:
1. There might be frames at the beginning or at the end where the actual motion has not started.
2. Some part might be missing in the second motion.
In each of these cases (or a combination of them), it is non-trivial to find the correspondence between the frames.
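To make the failure concrete, here is a minimal sketch of the naive frame-by-frame score, assuming each frame is a flat vector of parameters; `frame_ssd` and `naive_score` are hypothetical names. Because frame i is always compared with frame i, a single idle frame at the start of the second recording shifts every comparison, and an identical motion no longer scores 0.

```python
from typing import List, Sequence

Frame = Sequence[float]


def frame_ssd(a: Frame, b: Frame) -> float:
    """Sum of squared differences between two frame vectors."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of parameters")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def naive_score(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Compare frame i with frame i; extra frames of the longer sequence are ignored."""
    n = min(len(seq_a), len(seq_b))
    return sum(frame_ssd(seq_a[i], seq_b[i]) for i in range(n))


# The same 1-parameter motion, and a copy delayed by one idle frame.
motion: List[List[float]] = [[0.0], [1.0], [2.0], [3.0]]
delayed = [[0.0]] + motion

print(naive_score(motion, motion))   # 0.0
print(naive_score(motion, delayed))  # 3.0, although the motion is identical
```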

2.2 Modified solution

Now the mapping. Let us map a frame to a pixel. Just as a pixel is specified by values such as r, g, b and a, each frame can be specified by values such as joint angles and joint positions. Next, we map space to time. In the stereo problem we had a sequence of pixels in space that had to be matched to another sequence; here we have a sequence of frames in time. The stereo solution already handles sequences of different lengths (which solves problem 1) and occlusion (which solves problem 2).
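Global joint positions depend on where the user stands relative to the sensor, so two recordings of the same motion can differ by a translation. One hedged option, not necessarily what the report uses, is to describe a frame by joint angles, which do not change when the whole skeleton is translated. The sketch below computes the angle at a joint from three joint positions and builds a per-frame descriptor from a list of joint triples; the joint names and the `triples` argument are hypothetical.

```python
import math
from typing import List, Mapping, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def joint_angle(parent: Vec3, joint: Vec3, child: Vec3) -> float:
    """Angle in radians at `joint` between the bones joint->parent and joint->child."""
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("degenerate bone of zero length")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def frame_descriptor(
    positions: Mapping[str, Vec3], triples: Sequence[Tuple[str, str, str]]
) -> List[float]:
    """One angle per (parent, joint, child) triple, e.g. shoulder-elbow-hand."""
    return [joint_angle(positions[a], positions[b], positions[c]) for a, b, c in triples]
```

Each frame then becomes a short vector of angles, playing the role that r, g, b, a play for a pixel.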

One of the important assumptions in the stereo problem is the ordering constraint, i.e. pixels in the first sequence have their corresponding pixels in the same order in the second. We therefore make the same assumption here: the sub-parts of the motion occur in the same order in both sequences.

After this, the problem essentially reduces to the following:
1. There might be frames at the beginning or at the end where the actual motion has not started.
2. Choosing the correct cost function for matching and skipping a frame.
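A minimal sketch of the stereo-style dynamic programming alignment follows, assuming the match cost is the sum of squared differences between frame vectors and the skip (occlusion) cost is a constant `skip_cost`; both are hypothetical choices, since the report leaves the cost functions open. `cost[i][j]` is the cheapest alignment of the first i frames of A with the first j frames of B, and every move advances through both sequences monotonically, which is exactly the ordering constraint. Idle frames at either end, or a missing part, are absorbed by skips.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_cost(a: Frame, b: Frame) -> float:
    """Match cost: sum of squared differences (an assumed choice)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of parameters")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align(
    seq_a: Sequence[Frame], seq_b: Sequence[Frame], skip_cost: float
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the total alignment cost and the matched (i, j) frame pairs."""
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing means skipping every frame in it.
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * skip_cost, "skip_a"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * skip_cost, "skip_b"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + frame_cost(seq_a[i - 1], seq_b[j - 1]), "match"),
                (cost[i - 1][j] + skip_cost, "skip_a"),  # frame i-1 of A unmatched
                (cost[i][j - 1] + skip_cost, "skip_b"),  # frame j-1 of B unmatched
            ]
            cost[i][j], move[i][j] = min(options)
    # Backtrack from the full alignment to recover the frame correspondences.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_a":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

For example, `align([[0.0], [1.0], [2.0], [3.0]], [[0.0], [0.0], [1.0], [2.0], [3.0]], 0.5)` skips one idle frame and matches the rest, giving a cost of 0.5. The raw cost grows with sequence length, so dividing it by `len(seq_a) + len(seq_b)` is one possible normalization before comparing scores, and how large `skip_cost` should be relative to typical match costs is the second open question listed above.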

3 Applications

Matching two motions has a wide range of applications. Say a dancer wants to match Hritik's moves or a cricketer wants to perfect Sachin's shot: our application can give the extent of the match between the two motions (provided we have motion-captured Sachin or Hritik), and the dancer or cricketer can keep practicing until they get an almost perfect match. A more plausible example is gesture recognition. Nowadays, technology is moving from wireless devices to gestures; we want to command machines just by performing actions. Our application can match the user's motion against a stored database of default gestures and identify the most probable one.
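Gesture recognition can be sketched as a nearest-template search: align the user's motion against every stored gesture and return the one with the lowest alignment cost. This is a minimal sketch, assuming a sum-of-squared-differences match cost, a constant skip cost, and a plain dictionary as the "database"; `alignment_cost`, `recognize` and the gesture names are hypothetical. The DP keeps only two rows of the table because only the cost, not the correspondence, is needed here.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]


def alignment_cost(seq_a: Sequence[Frame], seq_b: Sequence[Frame], skip_cost: float) -> float:
    """Order-preserving DP alignment cost, keeping only the previous row."""
    m = len(seq_b)
    prev: List[float] = [j * skip_cost for j in range(m + 1)]
    for i in range(1, len(seq_a) + 1):
        curr = [i * skip_cost] + [0.0] * m
        for j in range(1, m + 1):
            match = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            curr[j] = min(
                prev[j - 1] + match,       # match frame i-1 with frame j-1
                prev[j] + skip_cost,       # skip a frame of seq_a
                curr[j - 1] + skip_cost,   # skip a frame of seq_b
            )
        prev = curr
    return prev[m]


def recognize(
    user_motion: Sequence[Frame],
    gestures: Dict[str, Sequence[Frame]],
    skip_cost: float = 1.0,
) -> str:
    """Name of the stored gesture with the lowest alignment cost to the user's motion."""
    return min(gestures, key=lambda name: alignment_cost(user_motion, gestures[name], skip_cost))
```

With `gestures = {"raise": [[0.0], [1.0], [2.0]], "lower": [[2.0], [1.0], [0.0]]}`, a user motion `[[0.0], [0.0], [1.0], [2.0]]` (the "raise" gesture with one idle frame in front) is recognized as `"raise"`. If templates differ a lot in length, normalizing the cost by the combined length would make the comparison fairer.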
