Kinect - Map (x, y) pixel coordinates to "real world" coordinates using depth


Solution 1

The depth stream is correct. You should indeed take the depth value, and from that, together with the Kinect sensor's geometry, you can easily locate the point in the real world relative to the Kinect. This is done with simple trigonometry; however, keep in mind that the depth value is the distance from the Kinect "eye" to the point measured, so it is the diagonal of a cuboid.

Actually, follow this link: How to get real world coordinates (x, y, z) from a distinct object using a Kinect

There's no use rewriting it here; that answer covers what you need.
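
For illustration, here is a minimal sketch of that trigonometry in Python. The field-of-view values (57 degrees horizontal, 43 degrees vertical) and the 640x480 depth resolution are the commonly quoted Kinect v1 figures, not values taken from the question, so treat them as assumptions to verify against your own sensor and SDK.

```python
import math

# Assumed Kinect v1 parameters -- verify against your own sensor/SDK:
FOV_H_DEG = 57.0          # horizontal field of view in degrees (assumed)
FOV_V_DEG = 43.0          # vertical field of view in degrees (assumed)
RES_X, RES_Y = 640, 480   # depth image resolution in pixels (assumed)

def pixel_to_world(px, py, depth_mm):
    """Map a depth-image pixel (px, py) and its depth in millimeters to
    approximate real-world (x, y, z) millimeters relative to the sensor.
    Assumes depth_mm is the perpendicular (Z) distance to the camera plane,
    i.e. the coordinate system described in the question."""
    # Half-width and half-height of the visible area at this depth.
    half_w = depth_mm * math.tan(math.radians(FOV_H_DEG / 2.0))
    half_h = depth_mm * math.tan(math.radians(FOV_V_DEG / 2.0))

    # Normalize pixel coordinates to [-1, 1] around the image center.
    nx = (px - RES_X / 2.0) / (RES_X / 2.0)
    ny = (RES_Y / 2.0 - py) / (RES_Y / 2.0)   # flip so +y points up

    return nx * half_w, ny * half_h, depth_mm

# Example: a fingertip at pixel (400, 200) with a reported depth of 1000 mm.
print(pixel_to_world(400, 200, 1000))
```

If, as this answer suggests, the reported depth is really the diagonal distance from the sensor "eye" rather than the perpendicular Z distance, you would first need to project it onto the Z axis before applying this mapping; the linked answer and the SDK's own mapping functions handle that more accurately.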

Solution 2

A few things:

A) I know you got the 117 degree FOV from a function on the Kinect sensor, but I still don't believe that's correct. That's a giant FOV. I actually got the same number when I ran the function on my Kinect, and I still don't believe it. While 57 degrees (or 58.5 from some sources) seems low, it's definitely more reasonable. Try putting the Kinect on a flat surface, place objects just inside its view, and measure the FOV that way. Not precise, but I don't think you'll find it to be over 100 degrees.
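
As a rough sketch of that tape-measure experiment: if you know the visible width W at a measured distance d, the implied horizontal FOV is 2 * atan(W / (2 * d)). The numbers below are just the ones already mentioned on this page, not new measurements.

```python
import math

def measured_fov_deg(visible_width, distance):
    """Estimate horizontal FOV from a physical measurement:
    visible_width = width of the scene that just fills the frame,
    distance      = distance from the sensor to that plane,
    both in the same units."""
    return math.degrees(2.0 * math.atan((visible_width / 2.0) / distance))

# Using the figures from the question: roughly 1.6 m visible at 1 m distance.
print(measured_fov_deg(1.6, 1.0))   # about 77 degrees -- well under 100
```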

B) I saw an article demonstrating actual distance versus the Kinect's reported depth; the relationship is not linear. This wouldn't actually explain your 1.6 meter trig discrepancy, but it's something to keep in mind going forward.
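
If the depth values you are reading are the Kinect's raw 11-bit numbers rather than millimeters, one frequently cited approximation from the OpenKinect community converts them with a tangent fit. The constants below are that community-derived calibration, not something from this answer, and individual sensors may need their own fit.

```python
import math

def raw_depth_to_meters(raw_depth):
    """Approximate conversion of a raw 11-bit Kinect depth value to meters,
    using the tangent fit circulated by the OpenKinect project. Treat the
    constants as approximate; recalibrate for your own sensor if possible."""
    return 0.1236 * math.tan(raw_depth / 2842.5 + 1.1863)

print(raw_depth_to_meters(800))   # roughly 1.2 m with this fit
```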

C) I would strongly suggest changing your code to accept the real-world points from the Kinect. Better yet, just send more data over if that's possible: you can keep providing the current data and simply tack the real-world coordinate data onto it.
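
A hedged sketch of what that extra payload could look like; the message shape and field names here are invented for illustration and are not the format your Flash side actually expects.

```python
import json

def fingertip_message(px, py, depth_mm, wx, wy, wz):
    """Bundle the existing pixel-space point with the derived real-world
    point in one message. Field names are illustrative only."""
    return json.dumps({
        "pixel": {"x": px, "y": py, "z_mm": depth_mm},   # the data sent today
        "world": {"x_mm": wx, "y_mm": wy, "z_mm": wz},   # the tacked-on data
    })

# wx/wy/wz would come from the SDK's own real-world mapping, or from a
# conversion like the pixel_to_world sketch under Solution 1.
print(fingertip_message(400, 200, 1000, -230.5, 180.2, 1000.0))
```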


Comments

  • user1449837

    I'm working on a project that uses the Kinect and OpenCV to export fingertip coordinates to Flash, for use in games and other programs. Currently, our setup works based on color and exports fingertip points to Flash in (x, y, z) format, where x and y are in pixels and z is in millimeters.

    But we want to map those (x, y) coordinates to "real world" values, like millimeters, using that z depth value from within Flash.

    As I understand it, the Kinect's 3D depth is obtained by projecting the X-axis along the camera's horizontal, its Y-axis along the camera's vertical, and its Z-axis directly forward out of the camera's lens. Depth values are then the length of the perpendicular drawn from any given object to the XY-plane. See the picture in the link below (obtained from Microsoft's website).

    Microsoft Depth Coordinate System Example

    Also, we know that the Kinect's horizontal field of vision spans a 117 degree angle.

    Using this information, I figured I could project the depth value of any given point onto the x=0, y=0 line and draw a horizontal line, parallel to the XY-plane, at that point, intersecting the camera's field of vision. I end up with a triangle, split in half, whose height is the depth of the object in question. I can then solve for the width of the field of view with a little trigonometry. My equation is:

    W = tan(theta / 2) * h * 2

    Where:

    • W = field-of-view width
    • theta = horizontal field-of-view angle (117 degrees)
    • h = depth value

    (Sorry, I can't post a picture, I would if I could)

    Now, solving for a depth value of 1000 mm (1 meter) gives a width of about 3264 mm.

    However, when actually LOOKING at the camera image produced, I get a different value. Namely, I placed a meter stick 1 meter away from the camera and noticed that the width of the frame was at most 1.6 meters, not the estimated 3.264 meters from my calculation. (A quick numeric check of this formula is sketched after this question.)

    Is there something I'm missing here? Any help would be appreciated.
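
For reference, a quick numeric check of the formula in the question, evaluated with both FOV figures that appear on this page (117 degrees from the sensor query, 57 degrees from Solution 2). It only evaluates W = 2 * h * tan(theta / 2); it doesn't settle which figure is right.

```python
import math

def frame_width_mm(fov_deg, depth_mm):
    """Width of the visible frame at a given depth: W = 2 * h * tan(theta / 2)."""
    return 2.0 * depth_mm * math.tan(math.radians(fov_deg / 2.0))

for fov in (117.0, 57.0):
    print(f"FOV {fov:5.1f} deg -> width at 1 m: {frame_width_mm(fov, 1000):7.1f} mm")

# 117 degrees gives ~3264 mm (the figure computed in the question);
# 57 degrees gives ~1086 mm. The measured ~1600 mm lies between the two.
```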