Kalman filter in computer vision: the choice of Q and R noise covariances

R is the covariance matrix of the measurement noise, which is assumed to be Gaussian. In the context of tracking objects in video, it describes your detection error. Let's say you are using a face detector to detect faces, and then you want to track them using the Kalman filter. You run the detector, you get a bounding box for each face, and then you use the Kalman filter to track the centroid of each box. The R matrix must describe how uncertain you are about the location of the centroid. So in this case, the diagonal values of R for the x,y coordinates should correspond to an error of a few pixels; since they are variances, they are expressed in pixels squared. If your measurement also includes velocity, then you need to guess the uncertainty of the velocity measurement, and take the units into account. If your position is measured in pixels and your velocity in pixels per frame, then the diagonal entries of R must reflect that.
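As a rough sketch of the point about units, here is one way to build a diagonal R for a centroid measurement from a guessed detection error. The function name `measurement_noise_cov` and the 3-pixel standard deviation are assumptions made up for illustration, not values from any particular detector.

```python
def measurement_noise_cov(sigma_x_px, sigma_y_px):
    """Diagonal R for an (x, y) centroid measurement.

    Inputs are standard deviations in pixels; the returned entries
    are variances, so their units are pixels squared.
    """
    return [
        [sigma_x_px ** 2, 0.0],
        [0.0, sigma_y_px ** 2],
    ]


# Assumed for illustration: the detector's centroid is off by about
# 3 pixels (one standard deviation) in each direction.
R = measurement_noise_cov(3.0, 3.0)
```

Note that R here is 2x2 because only x and y are measured. Its size follows the measurement vector, not the state vector.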

Q is the covariance of the process noise. Simply put, Q specifies how much the actual motion of the object deviates from your assumed motion model. If you are tracking cars on a road, then the constant velocity model should be reasonably good, and the entries of Q should be small. If you are tracking people's faces, they are not likely to move with a constant velocity, so you need to crank up Q. Again, you need to be aware of the units in which your state variables are expressed.
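To make the car versus face comparison concrete, here is a minimal sketch of a diagonal Q for a constant velocity model, assuming a state of `[x, y, vx, vy]` with position in pixels and velocity in pixels per frame. The helper name `process_noise_cov` and all of the numbers are illustrative guesses, not recommended settings.

```python
def process_noise_cov(sigma_pos_px, sigma_vel_px_per_frame):
    """Diagonal Q for an assumed state [x, y, vx, vy].

    Position entries are variances in pixels squared; velocity entries
    are variances in (pixels per frame) squared.
    """
    variances = [
        sigma_pos_px ** 2,
        sigma_pos_px ** 2,
        sigma_vel_px_per_frame ** 2,
        sigma_vel_px_per_frame ** 2,
    ]
    n = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Cars on a road: the constant velocity model fits well, so Q is small.
Q_car = process_noise_cov(0.5, 0.2)

# Faces: motion deviates more from constant velocity, so Q is larger.
Q_face = process_noise_cov(2.0, 1.5)
```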

That is the intuition. In practice, you start with a reasonable initial guess for R and Q, and then you tune them experimentally, so setting R and Q is a bit of an art. Also, in most cases using diagonal matrices for R and Q is sufficient.
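To get a feel for what tuning changes, it can help to look at the simplest possible case. The sketch below assumes a scalar filter where the state is a single position following a random walk and is measured directly, which is much simpler than a real tracker. It iterates the covariance recursion until the Kalman gain settles. The function name `steady_state_gain` is made up for this example.

```python
def steady_state_gain(q, r, steps=200):
    """Kalman gain that a scalar filter settles to.

    Assumed model: x_k = x_{k-1} + w, z_k = x_k + v, where the process
    noise w has variance q and the measurement noise v has variance r.
    """
    p = 1.0  # initial estimate variance (arbitrary starting guess)
    k = 0.0
    for _ in range(steps):
        p = p + q          # predict: uncertainty grows by the process noise
        k = p / (p + r)    # gain: how much weight the measurement gets
        p = (1.0 - k) * p  # update: uncertainty shrinks after the measurement
    return k


gain_large_q = steady_state_gain(q=10.0, r=1.0)  # follows the detections closely
gain_small_q = steady_state_gain(q=0.01, r=1.0)  # smooths heavily, trusts the model
```

In this scalar case only the ratio of Q to R matters for the gain: increasing Q makes the filter follow the measurements more closely, while increasing R makes it lean on the motion model and smooth more.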

Here is an example that uses vision.KalmanFilter in MATLAB for tracking multiple people.

Author: cyberdyne

Updated on July 16, 2022
