Thanks for the good questions. You previously sent me one of your videos, so I can confirm that you certainly use good lighting and proper actor clothing.
Tracking is very sensitive to how accurately the actor's appearance (shape, color, scale, etc.) is modeled. Our current actor model has a limited number of scale/shape settings and is not perfectly accurate. In particular, the head and hands are a bit out of proportion. A better model requires a serious redesign of both the actor model and the tracking algorithm. That is a big change and a large amount of work (currently in progress).
Tracking accuracy and reliability can vary significantly between actors. Some of our demos use custom/experimental models and obscure fine-tuning for a particular actor. We still need to invent a more universal approach to actor shape modeling.
In the current version, you can improve tracking reliability by experimenting with the actor scale. Try adjusting the actor scale for a good match of the arms/hands rather than of the head. Specifically, try fine-tuning the actor scale with the actor in a hands-down pose (as opposed to the T-pose). Keep in mind that a good match at the top of the shoulders is more important than a good match at the top of the head.
You should also consider upgrading to PlayStation Eye cameras (http://www.newegg.com/Product/ProductLi ... ye&x=0&y=0). The latest release of iPi Recorder fully supports PSEye cameras. PSEye cameras have low-distortion optics and allow more accurate calibration. They can shoot at up to 60 FPS at 640x480 (but you need a fast computer to capture at high frame rates). We recently discovered that Logitech cameras introduce some strange non-linear distortions (apparently caused by the camera's internal video scaling rather than by its optics). PSEye cameras have a native resolution of 640x480 and are very accurate in raw Bayer 640x480 mode. (Please note that raw Bayer videos saved to file look like grayscale videos. This is normal. The latest version of iPi Studio fully supports PSEye raw Bayer videos.)
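Why does a raw Bayer video look like grayscale? Each pixel in a raw Bayer frame stores only one color sample (red, green or blue) arranged in a repeating 2x2 mosaic, so the file is a single-channel image. Color is only reconstructed later by "demosaicing". The sketch below is purely illustrative and is not how iPi Studio does it: it uses the simplest possible method (collapsing each 2x2 cell into one RGB pixel, which halves the resolution) and ASSUMES an RGGB layout; the function name is hypothetical and the actual PSEye sensor layout is not stated here.

```python
def demosaic_rggb_binned(raw):
    """Hypothetical helper: convert a raw Bayer frame (list of rows of ints),
    ASSUMED to use an RGGB layout, into a half-resolution RGB image by
    collapsing each 2x2 mosaic cell into a single (r, g, b) pixel."""
    height, width = len(raw), len(raw[0])
    if height % 2 or width % 2:
        raise ValueError("frame dimensions must be even")
    rgb = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r = raw[y][x]             # top-left sample: red
            g1 = raw[y][x + 1]        # top-right sample: green
            g2 = raw[y + 1][x]        # bottom-left sample: green
            b = raw[y + 1][x + 1]     # bottom-right sample: blue
            row.append((r, (g1 + g2) // 2, b))  # average the two greens
        rgb.append(row)
    return rgb


# A tiny 2x4 "raw" frame: viewed directly it is just one channel of numbers.
raw = [
    [200, 100, 200, 100],
    [100, 50, 100, 50],
]
print(demosaic_rggb_binned(raw))  # [[(200, 100, 50), (200, 100, 50)]]
```

Real demosaicing algorithms (e.g. bilinear interpolation) keep the full 640x480 resolution; binning is used here only because it makes the mosaic structure easy to see.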
I should note that our system is designed to produce "good-looking" animation rather than an accurate recording of the actor's motion. Our system tries to "guess" the data when it cannot accurately "measure" it. This is usually OK for animators but can be a problem if you use it for robotics design. Naturally, you cannot expect sub-$100 cameras to produce data on par with expensive specialized industrial cameras. One of the biggest sources of inaccuracy is noise in the camera sensors. Noise is a far bigger problem than camera resolution; it is a complex problem, and we have yet to invent a way to filter it out.
One way to improve accuracy is to use more cameras. Currently, we support up to 4 cameras. With PSEye cameras, we think we will soon be able to support up to 6 cameras on a single PC. (Sony's PSEye cameras are much more bandwidth-efficient than any other existing USB camera, thanks to some clever engineering tricks.)
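To get a feel for why bandwidth matters when adding cameras, here is a back-of-the-envelope calculation of uncompressed data rates for one 640x480 camera at 60 FPS. This is only a sketch under stated assumptions: raw Bayer is taken as 1 byte per pixel (8-bit) and a typical uncompressed YUYV 4:2:2 format as 2 bytes per pixel; protocol overhead is ignored, and it says nothing about the actual tricks Sony uses.

```python
# ASSUMED bytes per pixel for two uncompressed formats (illustrative only).
BYTES_PER_PIXEL = {"raw_bayer_8bit": 1, "yuyv_422": 2}

# USB 2.0 high-speed signalling rate: 480 Mbit/s = 60 MB/s, before any overhead.
USB2_SIGNALING_MB_PER_S = 480 / 8


def stream_rate_mb_per_s(width, height, fps, pixel_format):
    """Uncompressed video data rate in megabytes per second (1 MB = 10**6 bytes)."""
    return width * height * BYTES_PER_PIXEL[pixel_format] * fps / 1e6


for fmt in BYTES_PER_PIXEL:
    rate = stream_rate_mb_per_s(640, 480, 60, fmt)
    print(f"{fmt}: {rate:.1f} MB/s per camera, "
          f"{rate / USB2_SIGNALING_MB_PER_S:.0%} of the USB 2.0 signalling rate")
# raw_bayer_8bit: 18.4 MB/s per camera, 31% of the USB 2.0 signalling rate
# yuyv_422: 36.9 MB/s per camera, 61% of the USB 2.0 signalling rate
```

Under these assumptions, sending 1 byte per pixel instead of 2 halves the per-camera data rate, which is the kind of saving that makes room for more cameras on one PC.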