Indexing Large Human-Motion Databases. VLDB 2004: pp 780-791

Eamonn Keogh, Themistoklis Palpanas, Victor B. Zordan, Dimitrios Gunopulos, Marc Cardle


This page presents a set of example animations comparing the performance of scale-invariant matching versus Dynamic Time Warping (DTW) matching. Invariance to both shrinking and stretching is demonstrated.


Given a query motion capture sequence, our system finds the closest matches in our motion capture database using either DTW or scale-invariant matching. Matching is carried out on the upper body[1] only, so the reader should ignore the leg motions. The upper-body motion is defined by a set of predefined bone lengths and a set of time-varying joint rotations. Each joint rotation is defined by three Euler angle curves: rotation around the X axis, the Y axis and the Z axis.


The distance between two character poses (i.e. upper-body configurations) is defined as the weighted Euclidean distance over all the joint orientations describing the upper body. A weighted sum is used because some joints have more visual impact than others. For example, the shoulder rotation has more visual impact than the hand rotation.
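The pose distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the videos: the function name pose_distance, the flat list layout (three Euler angles per joint), the placement of the weights inside the square root and the example weight values are all hypothetical, and angle wrap-around is ignored.

```python
import math

def pose_distance(pose_a, pose_b, joint_weights):
    """Weighted Euclidean distance between two upper-body poses.

    Assumed layout (not specified on this page): each pose is a flat list
    holding three Euler angles (X, Y, Z) per joint, and joint_weights holds
    one non-negative weight per joint.
    """
    if len(pose_a) != len(pose_b) or len(pose_a) != 3 * len(joint_weights):
        raise ValueError("poses must hold three angles per weighted joint")
    total = 0.0
    for j, w in enumerate(joint_weights):
        for k in range(3):  # X, Y and Z rotation of joint j
            diff = pose_a[3 * j + k] - pose_b[3 * j + k]
            total += w * diff * diff
    return math.sqrt(total)

# Two joints (say, shoulder then hand); the weights here are made up.
rest = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
moved = [3.0, 0.0, 0.0, 0.0, 4.0, 0.0]
unweighted = pose_distance(rest, moved, [1.0, 1.0])
shoulder_heavy = pose_distance(rest, moved, [4.0, 1.0])
```

With a larger shoulder weight, the same shoulder rotation contributes more to the distance than an equal hand rotation, which is the effect the weighting is meant to achieve.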


Below are two extracts from an example animation:









Note that in all pose plots, the character’s global rotation and translation (defined at the hips) are cancelled out to improve clarity, because keeping them would make visual comparisons difficult. Consequently, some motions might appear unnatural; this is no longer the case once the global parameters are reapplied.


Click on the links below to view the example videos. The videos require the free DivX video codec to decompress and display correctly.


Example 1: Arm extension (please recall leg motions should be ignored)


Example 2: Sitting (please recall leg motions should be ignored)


Example 3: Standing up (please recall leg motions should be ignored)


Notice that the top matches under scaling are spatially closer to the query motion. The matches found with DTW account for small fluctuations in the time axis. However, as our examples demonstrate, DTW cannot capture better matches that are spatially closer but slightly longer or shorter in time. In the top right plot, shorter matches have fewer animation frames than the query, and longer matches have more.
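To make the contrast concrete, here is a minimal Python sketch of the two kinds of matching on one-dimensional sequences. It is a simplified illustration, not the system used to produce the videos: the function names, the nearest-neighbour resampling, the prefix-based search over candidate lengths and the optional warping window in the DTW routine are all assumptions made for this example. Real motion data would use a pose distance in place of the squared scalar difference.

```python
import math

def rescale(seq, length):
    """Resample seq to the given length by nearest-neighbour indexing."""
    n = len(seq)
    return [seq[min(n - 1, (i * n) // length)] for i in range(length)]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def uniform_scaling_distance(query, candidate, min_len, max_len):
    """Best distance between query and any candidate prefix whose length
    lies in [min_len, max_len], after rescaling it to len(query)."""
    best = math.inf
    for p in range(min_len, min(max_len, len(candidate)) + 1):
        prefix = rescale(candidate[:p], len(query))
        best = min(best, euclidean(query, prefix))
    return best

def dtw_distance(a, b, window=None):
    """Classic dynamic-programming DTW. If window is given, cell (i, j)
    is only allowed when |i - j| <= window (a simple band constraint)."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else window
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

query = [0, 1, 2, 3]
stretched = [0, 0, 1, 1, 2, 2, 3, 3]  # the same shape at half the speed
scaled = uniform_scaling_distance(query, stretched, 4, 8)
banded = dtw_distance(query, stretched, window=1)
```

In this toy case the uniformly scaled comparison finds an exact match, while DTW restricted to a narrow band cannot align the two sequences at all. Without the band, this DTW routine would also align them, so the outcome depends on how much warping is permitted; the window used here is purely illustrative.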



All the test datasets used in this paper are also available here. Because the exact queries and candidate sequences are provided, our experiments can be reproduced exactly.




[1] The head, neck, shoulders, elbows, back and hands.