Mathematical Background#
Robust Least Squares Optimization by the Gauss-Newton Method#
Consider a least squares problem of the form:
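A standard form consistent with the definitions below is:

\[
\min_{\mathbf{p}} \; \sum_j \rho\left( \| \mathbf{R}_j(\mathbf{p}) \|^2 \right)
\]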
where \(\mathbf{R}_j(\mathbf{p})\) is a residual function (\(\mathbb{R}^{n} \rightarrow \mathbb{R}\)) depending on the parameters \(\mathbf{p} \in \mathbb{R}^{n}\), and \(\rho\) is a robust cost function (\(\mathbb{R} \rightarrow \mathbb{R}\)) that reduces the influence of outliers.
At each iteration of the Gauss-Newton method, we search for an update \(\Delta p\) to the current parameters \(\mathbf{p_k}\) that minimizes the robust cost function. Expanding the residuals around the current parameters \(\mathbf{p_k}\):
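To second order, the Taylor expansion of the residuals reads:

\[
\mathbf{R}_j(\mathbf{p_k} + \Delta p) \approx \mathbf{R}_j(\mathbf{p_k}) + \mathbf{J}_j(\mathbf{p_k}) \Delta p + \frac{1}{2} \Delta p^T \mathbf{H}_j(\mathbf{p_k}) \Delta p
\]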
where \(\mathbf{J}_j(\mathbf{p_k}) = \nabla \mathbf{R}_j(\mathbf{p_k})\) is the Jacobian of the residuals with respect to the parameters, and \(\mathbf{H}_j(\mathbf{p_k})\) is the Hessian of the residuals.
Let \(Z_j = \| \mathbf{R}_j(\mathbf{p_k}) \|^2\) be the squared norm of the residuals at the current parameters. The robust cost function can be expanded around \(Z_j\):
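A second-order Taylor expansion of \(\rho\) gives:

\[
\rho(Z_j + \delta Z_j) \approx \rho(Z_j) + \rho'(Z_j)\, \delta Z_j + \frac{1}{2} \rho''(Z_j)\, \delta Z_j^2
\]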
where \(\delta Z_j\) is the change in the squared norm due to the parameter update:
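That is:

\[
\delta Z_j = \| \mathbf{R}_j(\mathbf{p_k} + \Delta p) \|^2 - \| \mathbf{R}_j(\mathbf{p_k}) \|^2
\]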
Keeping only terms up to second order in \(\Delta p\), we can approximate \(\delta Z_j\) as:
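Substituting the expansion of the residuals and dropping terms beyond second order in \(\Delta p\):

\[
\delta Z_j \approx 2\, \mathbf{R}_j^T \mathbf{J}_j \Delta p + \Delta p^T \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) \Delta p
\]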
In a similar way, the squared term \(\delta Z_j^2\) can be approximated as:
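Keeping only the leading second-order term:

\[
\delta Z_j^2 \approx 4\, \Delta p^T \mathbf{J}_j^T \mathbf{R}_j \mathbf{R}_j^T \mathbf{J}_j \Delta p
\]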
Thus, by substituting all the approximations into the robust cost function, we obtain:
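For each residual, this gives:

\[
\rho(Z_j + \delta Z_j) \approx \rho(Z_j) + 2 \rho'(Z_j)\, \mathbf{R}_j^T \mathbf{J}_j \Delta p + \Delta p^T \left[ \rho'(Z_j) \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) + 2 \rho''(Z_j)\, \mathbf{J}_j^T \mathbf{R}_j \mathbf{R}_j^T \mathbf{J}_j \right] \Delta p
\]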
With Gauss-Newton, we search for the update \(\Delta p\) that minimizes the robust cost function, which is equivalent to setting the gradient of the robust cost function with respect to the update \(\Delta p\) to zero.
The Gauss-Newton update can be obtained by solving the following linear system:
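\[
H \, \Delta p = - g
\]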
where \(g\) is the gradient of the robust cost function with respect to the parameters, and \(H\) is the Hessian approximation of the robust cost function.
By summing over all residuals, the gradient and Hessian approximation are:
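\[
g = \sum_j 2 \rho'(Z_j)\, \mathbf{J}_j^T \mathbf{R}_j
\qquad
H = \sum_j 2 \rho'(Z_j) \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) + 4 \rho''(Z_j)\, \mathbf{J}_j^T \mathbf{R}_j \mathbf{R}_j^T \mathbf{J}_j
\]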
The Gauss-Newton method then ignores the second-order term of the residuals: at convergence the residuals should be small, so the term \(\mathbf{R}_j^T \mathbf{H}_j\) is negligible compared to the first-order term \(\mathbf{J}_j^T \mathbf{J}_j\).
Finally, we can build a diagonal matrix \(W_J\) and a vector \(W_R\) that depend on the robust cost function and the squared norm of the residuals to solve the complete linear system as:
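With scalar residuals (so that \(\mathbf{R}_j \mathbf{R}_j^T = Z_j\)), and dividing the system by the common factor 2, one consistent choice is \((W_J)_{jj} = \rho'(Z_j) + 2 \rho''(Z_j) Z_j\) and \((W_R)_j = \rho'(Z_j)\), with \(\mathbf{J}\) and \(\mathbf{R}\) the stacked Jacobians and residuals:

\[
\mathbf{J}^T W_J \mathbf{J} \, \Delta p = - \mathbf{J}^T \left( W_R \odot \mathbf{R} \right)
\]

where \(\odot\) denotes the element-wise product.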
We observe that this system is equivalent to solving:
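\[
\tilde{\mathbf{J}}^T \tilde{\mathbf{J}} \, \Delta p = - \tilde{\mathbf{J}}^T \tilde{\mathbf{R}}
\]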
With a modified Jacobian \(\tilde{\mathbf{J}} = \sqrt{W_J} \mathbf{J}\) and a modified residual \(\tilde{\mathbf{R}} = \frac{W_R}{\sqrt{W_J}} \mathbf{R}\).
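The weighted update above can be sketched as a small iteratively reweighted Gauss-Newton loop. This is a minimal illustration, not a reference implementation: it uses the common IRLS simplification \(W_J = W_R = \rho'(Z_j)\) (i.e. it drops the \(\rho''\) term, which keeps the system positive definite) with a Huber cost, and all function and variable names are hypothetical.

```python
import numpy as np

def huber_weight(z, delta=1.0):
    """IRLS weight rho'(Z) for the Huber cost applied to squared residuals Z = r**2."""
    # Quadratic region: rho(Z) = Z, so rho'(Z) = 1.
    # Linear region:    rho(Z) = 2*delta*sqrt(Z) - delta**2, so rho'(Z) = delta/sqrt(Z).
    return np.where(z <= delta**2, 1.0, delta / np.sqrt(np.maximum(z, delta**2)))

def robust_gauss_newton(residual_fn, jac_fn, p0, n_iters=50, delta=1.0):
    """At each iteration, solve J^T W J dp = -J^T W R with W = diag(rho'(Z_j))."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iters):
        r = residual_fn(p)             # (m,) stacked scalar residuals R_j
        J = jac_fn(p)                  # (m, n) stacked Jacobian rows J_j
        w = huber_weight(r**2, delta)  # per-residual weight rho'(Z_j)
        JtW = J.T * w                  # J^T W (broadcast over columns)
        dp = np.linalg.solve(JtW @ J, -JtW @ r)
        p = p + dp
        if np.linalg.norm(dp) < 1e-12:
            break
    return p

# Usage: fit y = a*x + b with one gross outlier.
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[5] += 30.0                           # outlier
residual_fn = lambda p: p[0] * x + p[1] - y
jac_fn = lambda p: np.column_stack([x, np.ones_like(x)])
p = robust_gauss_newton(residual_fn, jac_fn, p0=[0.0, 0.0])
```

Because the outlier's residual falls in the linear region of the Huber cost, its weight shrinks like \(1/|r|\) and the recovered parameters stay close to the inlier line \((a, b) = (2, 1)\), whereas an unweighted least squares fit would be pulled far off.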
Adding some regularization to the Gauss-Newton update#
Sometimes, we may also have regularization terms. In this case, the problem can be written as a weighted sum of sub least squares problems:
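In a form consistent with the single-problem case above:

\[
\min_{\mathbf{p}} \; \sum_i w_i \sum_j \rho_i\left( \| \mathbf{R}_{i,j}(\mathbf{p}) \|^2 \right)
\]

where \(\mathbf{R}_{i,j}\) denotes the \(j\)-th residual of the \(i\)-th sub least squares problem.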
where \(w_i\) is a weight for each sub least squares problem, and \(\rho_i\) is a robust cost function for each sub least squares problem.
Note
By convention, we assume that the first sub least squares problem (i.e. \(i=0\)) is the main least squares problem containing the data residuals, and the other sub least squares problems (i.e. \(i \geq 1\)) are regularization terms.
In this case, the Gauss-Newton update can be written as:
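Using the modified Jacobian \(\tilde{\mathbf{J}}_i\) and modified residual \(\tilde{\mathbf{R}}_i\) of each sub least squares problem:

\[
\left( \sum_i w_i \, \tilde{\mathbf{J}}_i^T \tilde{\mathbf{J}}_i \right) \Delta p = - \sum_i w_i \, \tilde{\mathbf{J}}_i^T \tilde{\mathbf{R}}_i
\]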