proportions of a perspective-deformed rectangle

image-processing geometry computer-vision reverseprojection projective-geometry

17,775

Solution 1

Here is my attempt at answering my question after reading the paper

Zhengyou Zhang , Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf

I manipulated the equations for some time in SAGE, and came up with this pseudo-code in c-style:


// in case it matters: licensed under GPLv2 or later
// legend:
// sqr(x)  = x*x
// sqrt(x) = square root of x

// let m1x,m1y ... m4x,m4y be the (x,y) pixel coordinates
// of the 4 corners of the detected quadrangle
// i.e. (m1x, m1y) are the cordinates of the first corner, 
// (m2x, m2y) of the second corner and so on.
// let u0, v0 be the pixel coordinates of the principal point of the image
// for a normal camera this will be the center of the image, 
// i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
// This assumption does not hold if the image has been cropped asymmetrically

// first, transform the image so the principal point is at (0,0)
// this makes the following equations much easier
m1x = m1x - u0;
m1y = m1y - v0;
m2x = m2x - u0;
m2y = m2y - v0;
m3x = m3x - u0;
m3y = m3y - v0;
m4x = m4x - u0;
m4y = m4y - v0;


// temporary variables k2, k3
double k2 = ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) /
            ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) ;

double k3 = ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) / 
            ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) ;

// f_squared is the focal length of the camera, squared
// if k2==1 OR k3==1 then this equation is not solvable
// if the focal length is known, then this equation is not needed
// in that case assign f_squared= sqr(focal_length)
double f_squared = 
    -((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x)) / 
                      ((k3 - 1)*(k2 - 1)) ;

//The width/height ratio of the original rectangle
double whRatio = sqrt( 
    (sqr(k2 - 1) + sqr(k2*m2y - m1y)/f_squared + sqr(k2*m2x - m1x)/f_squared) /
    (sqr(k3 - 1) + sqr(k3*m3y - m1y)/f_squared + sqr(k3*m3x - m1x)/f_squared) 
) ;

// if k2==1 AND k3==1, then the focal length equation is not solvable 
// but the focal length is not needed to calculate the ratio.
// I am still trying to figure out under which circumstances k2 and k3 become 1
// but it seems to be when the rectangle is not distorted by perspective, 
// i.e. viewed straight on. Then the equation is obvious:
if (k2==1 && k3==1) whRatio = sqrt( 
    (sqr(m2y-m1y) + sqr(m2x-m1x)) / 
    (sqr(m3y-m1y) + sqr(m3x-m1x))


// After testing, I found that the above equations 
// actually give the height/width ratio of the rectangle, 
// not the width/height ratio. 
// If someone can find the error that caused this, 
// I would be most grateful.
// until then:
whRatio = 1/whRatio;

Update: here is how these equations were determined:

The following is code in SAGE. It can be accessed online at http://www.sagenb.org/home/pub/704/. (Sage is really useful in solving equations, and useable in any browser, check it out)

# CALCULATING THE ASPECT RATIO OF A RECTANGLE DISTORTED BY PERSPECTIVE

#
# BIBLIOGRAPHY:
# [zhang-single]: "Single-View Geometry of A Rectangle 
#  With Application to Whiteboard Image Rectification"
#  by Zhenggyou Zhang
#  http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf

# pixel coordinates of the 4 corners of the quadrangle (m1, m2, m3, m4)
# see [zhang-single] figure 1
m1x = var('m1x')
m1y = var('m1y')
m2x = var('m2x')
m2y = var('m2y')
m3x = var('m3x')
m3y = var('m3y')
m4x = var('m4x')
m4y = var('m4y')

# pixel coordinates of the principal point of the image
# for a normal camera this will be the center of the image, 
# i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
# This assumption does not hold if the image has been cropped asymmetrically
u0 = var('u0')
v0 = var('v0')

# pixel aspect ratio; for a normal camera pixels are square, so s=1
s = var('s')

# homogenous coordinates of the quadrangle
m1 = vector ([m1x,m1y,1])
m2 = vector ([m2x,m2y,1])
m3 = vector ([m3x,m3y,1])
m4 = vector ([m4x,m4y,1])


# the following equations are later used in calculating the the focal length 
# and the rectangle's aspect ratio.
# temporary variables: k2, k3, n2, n3

# see [zhang-single] Equation 11, 12
k2_ = m1.cross_product(m4).dot_product(m3) / m2.cross_product(m4).dot_product(m3)
k3_ = m1.cross_product(m4).dot_product(m2) / m3.cross_product(m4).dot_product(m2)
k2 = var('k2')
k3 = var('k3')

# see [zhang-single] Equation 14,16
n2 = k2 * m2 - m1
n3 = k3 * m3 - m1


# the focal length of the camera.
f = var('f')
# see [zhang-single] Equation 21
f_ = sqrt(
         -1 / (
          n2[2]*n3[2]*s^2
         ) * (
          (
           n2[0]*n3[0] - (n2[0]*n3[2]+n2[2]*n3[0])*u0 + n2[2]*n3[2]*u0^2
          )*s^2 + (
           n2[1]*n3[1] - (n2[1]*n3[2]+n2[2]*n3[1])*v0 + n2[2]*n3[2]*v0^2
          ) 
         ) 
        )


# standard pinhole camera matrix
# see [zhang-single] Equation 1
A = matrix([[f,0,u0],[0,s*f,v0],[0,0,1]])


#the width/height ratio of the original rectangle
# see [zhang-single] Equation 20
whRatio = sqrt (
               (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
               (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
              )

The simplified equations in the c-code where determined by

print "simplified equations, assuming u0=0, v0=0, s=1"
print "k2 := ", k2_
print "k3 := ", k3_
print "f  := ", f_(u0=0,v0=0,s=1)
print "whRatio := ", whRatio(u0=0,v0=0,s=1)

    simplified equations, assuming u0=0, v0=0, s=1
    k2 :=  ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y
    - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    k3 :=  ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y
    - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x)
    f  :=  sqrt(-((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x
    - m1x))/((k3 - 1)*(k2 - 1)))
    whRatio :=  sqrt(((k2 - 1)^2 + (k2*m2y - m1y)^2/f^2 + (k2*m2x -
    m1x)^2/f^2)/((k3 - 1)^2 + (k3*m3y - m1y)^2/f^2 + (k3*m3x -
    m1x)^2/f^2))

print "Everything in one equation:"
print "whRatio := ", whRatio(f=f_)(k2=k2_,k3=k3_)(u0=0,v0=0,s=1)

    Everything in one equation:
    whRatio :=  sqrt(((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) -
    1)^2)/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)^2))


# some testing:
# - choose a random rectangle, 
# - project it onto a random plane,
# - insert the corners in the above equations,
# - check if the aspect ratio is correct.

from sage.plot.plot3d.transform import rotate_arbitrary

#redundandly random rotation matrix
rand_rotMatrix = \
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5))

#random translation vector
rand_transVector = vector((uniform(-10,10),uniform(-10,10),uniform(-10,10))).transpose()

#random rectangle parameters
rand_width =uniform(0.1,10)
rand_height=uniform(0.1,10)
rand_left  =uniform(-10,10)
rand_top   =uniform(-10,10)

#random focal length and principal point
rand_f  = uniform(0.1,100)
rand_u0 = uniform(-100,100)
rand_v0 = uniform(-100,100)

# homogenous standard pinhole projection, see [zhang-single] Equation 1
hom_projection = A * rand_rotMatrix.augment(rand_transVector)

# construct a random rectangle in the plane z=0, then project it randomly 
rand_m1hom = hom_projection*vector((rand_left           ,rand_top            ,0,1)).transpose()
rand_m2hom = hom_projection*vector((rand_left           ,rand_top+rand_height,0,1)).transpose()
rand_m3hom = hom_projection*vector((rand_left+rand_width,rand_top            ,0,1)).transpose()
rand_m4hom = hom_projection*vector((rand_left+rand_width,rand_top+rand_height,0,1)).transpose()

#change type from 1x3 matrix to vector
rand_m1hom = rand_m1hom.column(0)
rand_m2hom = rand_m2hom.column(0)
rand_m3hom = rand_m3hom.column(0)
rand_m4hom = rand_m4hom.column(0)

#normalize
rand_m1hom = rand_m1hom/rand_m1hom[2]
rand_m2hom = rand_m2hom/rand_m2hom[2]
rand_m3hom = rand_m3hom/rand_m3hom[2]
rand_m4hom = rand_m4hom/rand_m4hom[2]

#substitute random values for f, u0, v0
rand_m1hom = rand_m1hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m2hom = rand_m2hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m3hom = rand_m3hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m4hom = rand_m4hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)

# printing the randomly choosen values
print "ground truth: f=", rand_f, "; ratio=", rand_width/rand_height

# substitute all the variables in the equations:
print "calculated: f= ",\
f_(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
),"; 1/ratio=", \
1/whRatio(f=f_)(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

print "k2 = ", k2_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
), "; k3 = ", k3_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

# ATTENTION: testing revealed, that the whRatio 
# is actually the height/width ratio, 
# not the width/height ratio
# This contradicts [zhang-single]
# if anyone can find the error that caused this, I'd be grateful

    ground truth: f= 72.1045134124554 ; ratio= 3.46538779959142
    calculated: f=  72.1045134125 ; 1/ratio= 3.46538779959
    k2 =  0.99114614987 ; k3 =  1.57376280159

Solution 2

Update

After reading your update, and looking at the first reference (Whiteboard scanning and image enhancement), I see where the missing point is.

The input data of the problem is a quadruple (A,B,C,D), AND the center O of the projected image. In the article, it corresponds to the assumption u0=v0=0. Adding this point, the problem becomes constrained enough to get the aspect ratio of the rectangle.

The problem is then restated as follows: Given a quadruple (A,B,C,D) in the Z=0 plane, find the eye position E(0,0,h), h>0 and a 3D plane P such that the projection of (A,B,C,D) on P is a rectangle.

Note that P is determined by E: to get a parallelogram, P must contain parallels to (EU) and (EV), where U=(AB)x(CD) and V=(AD)x(BC).

Experimentally, it seems that this problem has in general one unique solution, corresponding to a unique value of the w/h ratio of the rectangle.

alt text

Previous Post

No, you can't determine the rectangle ratio from the projection.

In the general case, a quadruple (A,B,C,D) of four non collinear points of the Z=0 plane is the projection of infinitely many rectangles, with infinitely many width/height ratios.

Consider the two vanishing points U, intersection of (AB) and (CD) and V, intersection of (AD) and (BC), and the point I, intersection of the two diagonals (AC) and (BD). To project as ABCD, a parallelogram of center I must lie on a plane containing the line parallel to (UV) through point I. On one such plane, you can find many rectangles projecting to ABCD, all with a different w/h ratio.

See these two images done with Cabri 3D. In the two cases ABCD is unchanged (on the gray Z=0 plane), and the blue plane containing the rectangle is not changed either. The green line partially hidden is the (UV) line and the visible green line is parallel to it and contains I.

alt text

Solution 3

Size isnt really needed, and neither are proportions. And knowing which side is up is kind of irrelevant considering he's using photos/scans of documents. I doubt hes going to scan the back sides of them.

"Corner intersection" is the method to correct perspective. This might be of help:

How to draw a Perspective-Correct Grid in 2D

Solution 4

On the question of why the results give h/w rather then w/h: I'm wondering if the expression of Equation 20 above is correct. Posted is:

       whRatio = sqrt (
            (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
            (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
           )

When I try to execute that with OpenCV, I get an exception. But everything works correctly when I use the following equation which to me looks more like Equation 20: But based on Equation 20, it looks like it should be:

        whRatio = sqrt (
            (n2.transpose()*A.transpose()^(-1) * A^(-1)*n2) /
            (n3.transpose()*A.transpose()^(-1) * A^(-1)*n3)
           )

Solution 5

You can determine width / height by this answer Calculating rectangle 3D coordinate with coordinate its shadow?. Assume that your rectangle rotate on intersection diagonal point calculate it width and height. But when you change distance between the assumption shadow plane to the real shadow plane proportional of rectangle is the same with calculated width / height!

View more solutions

17,775

Author by

HugoRune

Updated on June 06, 2022

Comments

HugoRune almost 2 years
Given a 2d picture of a rectangle distorted by perspective:

I know that the shape was originally a rectangle, but I do not know its original size.

If I know the pixel coordinates of the corners in this picture, how can I calculate the original proportions, i.e. the quotient ( width / height ) of the rectangle?

(background: the goal is to automatically undistort photos of rectangular documents, edge detection will probably be done with hough transform)

UPDATE:

There has been some discussion on whether it is possible at all to determine the width:height ratio with the information given. My naive thought was that it must be possible, since I can think of no way to project for example a 1:4 rectangle onto the quadrangle depicted above. The ratio appears clearly close to 1:1, so there should be a way to determine it mathematically. I have however no proof for this beyond my intuitive guess.

I have not yet fully understood the arguments presented below, but I think there must be some implicit assumption that we are missing here and that is interpreted differently.

However, after hours of searching, I have finally found some papers relevant to the problem. I am struggling to understand the math used in there, so far without success. Particularly the first paper seems to discuss exactly what I wanted to do, unfortunately without code examples and very dense math.
- Zhengyou Zhang , Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf p.11
  
  "Because of the perspective distortion, the image of a rectangle appears to be a quadrangle. However, since we know that it is a rectangle in space, we are able to estimate both the camera’s focal length and the rectangle’s aspect ratio."
- ROBERT M. HARALICK "Determining camera parameters from the perspective projection of a rectangle" http://portal.acm.org/citation.cfm?id=87146
  
  "we show how to use the 2D perspective projection of a rectangle of unknown size and position in 3D space to determine the camera look angle parameters relative to the plans of the rectangle."
Neil N almost 15 years

Partially correct. Not only do you need to know the distance, you need to know the field of view of the camera as well. i.e. a typical 35mm camera has a view angle of 54 degrees with no zoom.
Toad almost 15 years

one would probably also need to know the rotation, since it's unclear which side is up
HugoRune almost 15 years

I do not need the width, just the proportions, i.e. the quotient (width / height). The scale is of course dependent on the distance to the observer, but as far as I can tell, the proportions are not. a 1by1 square will map to different projections than a 1by2 rectangle, correct?
Neil N almost 15 years

But he knows its a rectangle. i.e. scanned documents.
fortran almost 15 years

@Neil N So what? Maybe now the rectangles are not parallelograms and I haven't noticed...
Neil N almost 15 years

because rectangles have all 90 degree corners, which takes the possible rotations down from infinity to one (well technically two if you consider he could be looking at the back side). A huge difference.
HugoRune almost 15 years

Thanks, but I am not sure if I understand this fully: Using the information given in the linked answer, I can map the quadrangle in the picture to an arbitrary rectangle, by subdividing at the intersection of the diagonals. What I would like to do is map the quadrangle to a rectangle with the correct proportions. So a picture of a square should be mapped only to a square. I am not sure how to get the ratio of sides. Googling for "corner intersection" did not work.
HugoRune almost 15 years

that's what I was wondering. As far as I can tell, a rectangle with (width=2*height) has a different set of possible projections than a rectangle with (width=3*height). So looking at a given perspective projection, there will be an infinite number of possible rectangles, but they will all have the same ratio of width to height.
fortran almost 15 years

I think I'll have to find a counter-example
Neil N almost 15 years

If you continue to intersect down until the rectangles are smaller than pixels, from there you can measure the height and width... then you would know how big to create your destination rectangle.. then just map backwards from there.
Beta almost 15 years

Excuse me, but this doesn't look right. You appear to have moved the camera between these two cases, which will change the appearance of ABCD. Projecting onto a plane like this is only approximately correct at best, and you've broken the rules.
Eric Bainville almost 15 years

Yes, the eye is at the intersection of the red lines. You're right that the position of the camera changes between the two views. What does not change is the input of the problem: the projected ABCD.
Beta almost 15 years

Excuse me, but you're wrong. You're projecting onto the wrong plane. If I construct a 2:1 rectangle, give it position and orientation, and place the camera, do you think you can find a 3:1 rectangle that looks the same to the camera?
Eric Bainville almost 15 years

In the question as I understood it, we only have the projected rectangle as input (ABCD on the gray plane). We don't know anything about the projection, so we can assume it is defined by a point and a plane. Then the question can be restated as: do all the rectangles of the 3D space projecting into ABCD have the same w/h ratio?
HugoRune almost 15 years

I think I get it! Something like this: img39.imageshack.us/img39/4273/perspectivediagramisoskh.jpg I have to think about this a bit more, but at first glance I think that is exactly what I needed, thanks a lot! (by the way, I see that you simplified your answer a bit, but I found the original comments about the origin being the point below the camera, and assuming the camera to be at a distance of 1 very useful too)
HugoRune almost 15 years

I am trying to wrap my head around this method. Is it possible to extend it for the degenerate case, when one of the vanishing points is close to infinity, i.e. when two sides of the quadrangle are parallel or almost parallel?
Eric Bainville almost 15 years

Without moving the camera, I don't think we can project a 2:1 and a 3:1 rectangle to the same ABCD in the general case. But as I said in a previous comment, this is not the original problem, where we don't know where the camera is.
Beta almost 15 years

Yes, that image captures it. This method is actually just approximate, and doesn't work well in some extreme cases. In the exact solution, the lines to the vanishing point aren't lines, they're curves (that's right, 2-point perspective is bunk), and the math is a little harder; I'll post some graphics if I can figure out how. If the figure is almost a rectangle, it's face-on and you can just do x->tan(x). If it's almost a parallelogram with non-right-angles, it's very small and you're sunk.
Beta almost 15 years

I can't follow your reasoning. The transformation in question is NOT just a rotation, so what is the justification for your "there exists" line? (And why not just give us a counter-example?)
Neil N almost 15 years

If both height and width intersection became smaller than pixel height/width on the same iteration, then yes you would have a square. If Height took twice as many iterations as width, you have a 2:1 H:W ratio... get it?
fortran almost 15 years

If it were just a rotation I would have used a 2x2 matrix and cartesian coordinates, not a 3x3 and homogeneous coordinates. It's expected to know a little bit about perspective projections to follow the reasoning, hence the "there exists" line. A demonstration is better than a counter example, but if you want one, you can fill the values that are supposed given in the equations, solve them (doing the trick to assign new variables to the non-linear products to make them linear) and then you'll have all the counterexamples you can ever wish for.
HugoRune almost 15 years

Sorry for being dense, but I do not get it at all. Using the examples shown here: freespace.virgin.net/hugo.elias/graphics/x_persp.htm If I intersect the quadrangle ABCD into smaller and smaller similar sub-quadrangles, I will eventually get sub-quadrangles smaller than a pixel. But on which iteration that happens depends: close to the CD side, the sub-quadrangles will be smaller than the ones close to the AB side of the original quadrangle. So the value I get seems arbitrary, and I do not understand how this is related to the ratio of the undistorted rectangle.
Neil N almost 15 years

What I'm saying is the subdivided quad may have it's HEIGHT smaller than a pixel, without its WIDTH being there yet. At which point you keep going until WIDTH is also smaller than a pixel. So you would have number of iterations that took to get height (lets say 10) and however many interations that got you to width (lets say 12) which means you have a 10:12 height to width ratio.
WiiMaxx almost 11 years

as a side note you can calculate the distance if you know the original high or width of one thing in the image (Person,Car,pencil,...)
WiiMaxx almost 11 years

fortran i think you mix things up based on this the can be calculated the only unknown think will be the camera distance like @Toad already suggested which is the part we need to get the original size
Octavian over 10 years

Thank you, Hugo. You should not use == operator when working with doubles. Better if you write code like this: <code> double kk = (k3 - 1)*(k2 - 1); if (abs(kk) < 0.0001) { // there is no perspective distortion... use formula 2 } else { // there is perspective distortion... use formula 1 } </code>
Mene about 9 years

Regarding the height/width / width/height problem: I don't see how you would know that. Given only an image the objects ratio could be both, couldn't it? What is widht and what is height is usually just a convention.
Mene about 9 years

This is odd, those operations shouldn't even be defined. I don't know much about SAGE, but it seems n2 and n3 are transposed in comparison with the paper. At least I cannot put your sugestion to work in SAGE, as the operations are not defined.
Mene about 9 years

And for others trying to implement this: take care about the order of the vertices, they are not counter-clockwise, but some sort of zig-zag. Take a look in the paper.
1resu over 6 years

Can anybody state what kind of error rate is to be expected? I'm getting 0-30% which is quite big...
Elte Hupkes almost 6 years

With regards to the k2==1 or k3==1 problem, according to the paper it occurs when the image transformation is a rectangle, so you'd have the aspect ratio. In my experience it suffices if two of the line segments of the projected quadrilateral are parallel - the other two need not be, as the paper concludes. For instance if (m1 - m2) and (m4 - m3) are parallel, k2 will always be 1, leading to singularity. I've yet to figure out where the error lies with either my reasoning or the reasoning in the paper.
gd1 over 3 years

+1 A word of caution to the readers. I have implemented the solution proposed here, while checking that the equations match with those in the actual paper. I am getting runaway numerical instability as soon I get close to the situation where any two input segments are parallel. This seems to be only partially addressed by this solution (the (k2==1 && k3==1) check is not enough, not even applying tolerances). I am investigating and will provide more detail if I can, but in the meantime I urge you to be especially wary of this piece of code.
user1556435 about 3 years

Could the difference in width/heigh height/width ratio be due to a difference in coordinate system interpretation? That one assumes UV coordinates (first axis horizontal), and the other assumes matrix access similar to numpy (first axis defines rows)?
user1556435 about 3 years

@1resu regarding accuracy I get pretty good results on synthetic images, but on actual photographs I get rather varying results. This leads me to believe that lens distortion plays an important role.
user0193 almost 3 years

I am actually looking for positioning the camera in 3d world to minimize the perspective effect up-to some tolerance. Is there any paper or technique that you may want to share. Any hints will be greatly appreciated question