Mathematical Background ======================= .. contents:: Table of Contents :local: :depth: 2 Robust Least Squares Optimization by the Gauss-Newton Method ------------------------------------------------------------- Consider a least squares problem of the form: .. math:: \min_{\mathbf{p}} \frac{1}{2} \sum_j \rho\left(\| \mathbf{R}_j(\mathbf{p}) \|^2\right) where :math:`\mathbf{R}_j(\mathbf{p})` is a residual function (:math:`\mathbb{R}^{n} \rightarrow \mathbb{R}`) depending on the parameters :math:`\mathbf{p} \in \mathbb{R}^{n}`, and :math:`\rho` is a robust cost function (:math:`\mathbb{R} \rightarrow \mathbb{R}`) that reduces the influence of outliers. At each iteration of the Gauss-Newton method, we search for an update :math:`\Delta p` to the current parameters :math:`\mathbf{p_k}` that minimizes the robust cost function. Developing the residuals around the current parameters :math:`\mathbf{p_k}`: .. math:: \mathbf{R}_j(\mathbf{p} + \Delta p) \approx \mathbf{R}_j + \mathbf{J}_j \Delta p + 0.5 \Delta p^T \mathbf{H}_j \Delta p + \ldots where :math:`\mathbf{J}_j(\mathbf{p_k}) = \nabla \mathbf{R}_j(\mathbf{p_k})` is the Jacobian of the residuals with respect to the parameters, and :math:`\mathbf{H}_j(\mathbf{p_k})` is the Hessian of the residuals. Let :math:`Z_j = \| \mathbf{R}_j(\mathbf{p_k}) \|^2` be the squared norm of the residuals at the current parameters. The robust cost function can be developed around :math:`Z_j`: .. math:: \rho(Z_j + \delta Z_j) \approx \rho(Z_j) + \rho'(Z_j)\delta Z_j + 0.5 \rho''(Z_j) \delta Z_j^2 + \ldots where :math:`\delta Z_j` is the change in the squared norm due to the parameter update: .. math:: \delta Z_j = \| \mathbf{R}_j(\mathbf{p_k} + \Delta p) \|^2 - \| \mathbf{R}_j(\mathbf{p_k}) \|^2 By only keeping up to the second order terms, we can approximate :math:`\delta Z_j` as: .. math:: \delta Z_j \approx \Delta p^T \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) \Delta p + 2 \mathbf{R}_j^T \mathbf{J}_j \Delta p + ... In a similar way, the squared term :math:`\delta Z_j^2` can be approximated as: .. math:: \delta Z_j^2 \approx 4 \left( \mathbf{R}_j^T \mathbf{J}_j \Delta p \right)^2 + ... Thus, by injecting all the approximations into the robust cost function, we optain : .. math:: \rho(Z_j + \delta Z_j) \approx \rho(Z_j) + \rho'(Z_j) \Big[ \Delta p^T \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) \Delta p + 2 \mathbf{R}_j^T \mathbf{J}_j \Delta p \Big] + 2 \rho''(Z_j) \left( \mathbf{R}_j^T \mathbf{J}_j \Delta p \right)^2 + ... With Gauss-Newton, we shearch for the update :math:`\Delta p` that minimizes the robust cost function, which is equivalent to solving the zero of the gradient of the robust cost function with respect to the parameters. .. math:: \nabla_{\Delta p} \rho(Z_j + \delta Z_j) = 0 The Gauss-Newton update can be obtained by solving the following linear system: .. math:: H \, \Delta p = - g Where :math:`g` is the gradient of the robust cost function with respect to the parameters, and :math:`H` is the Hessian approximation of the robust cost function with respect to the parameters. By summing over all residuals, the gradient and Hessian approximation are: .. math:: g \approx \sum_j 2 \rho'(Z_j) \mathbf{J}_j^T \mathbf{R}_j .. math:: H \approx \sum_j \Big[2 \rho'(Z_j) \left( \mathbf{J}_j^T \mathbf{J}_j + \mathbf{R}_j^T \mathbf{H}_j \right) + 4 \rho''(Z_j) \mathbf{J}_j^T \mathbf{R}_j \mathbf{R}_j^T \mathbf{J}_j \Big] Finally, the Gauss-Newton suggested to ignore the second order term of the residuals, as at convergence, the residuals should be small, and thus the second order term :math:`\mathbf{R}_j^T \mathbf{H}_j` should be negligible compared to the first order term :math:`\mathbf{J}_j^T \mathbf{J}_j`. .. math:: H \approx \sum_j \Big[2 \rho'(Z_j) \mathbf{J}_j^T \mathbf{J}_j + 4 \rho''(Z_j) \mathbf{J}_j^T \mathbf{R}_j \mathbf{R}_j^T \mathbf{J}_j \Big] .. math:: H \approx \sum_j 2 \mathbf{J}_j^T \Big[ \rho'(Z_j) + 2 \rho''(Z_j) Z_j \Big] \mathbf{J}_j Finally, we can build a diagonal matrix :math:`W_J` and a vector :math:`W_R` that depend on the robust cost function and the squared norm of the residuals to solve the complete linear system as: .. math:: \mathbf{J}^T \sqrt{W_J}^T \sqrt{W_J} \mathbf{J} \Delta p = - W_R \mathbf{J}^T \mathbf{R} .. math:: W_J = \text{diag}\left(\rho'(|\mathbf{R}_j|^2) + 2 \rho''(|\mathbf{R}_j|^2) |\mathbf{R}_j|^2\right) .. math:: W_R = \text{diag}\left(\rho'(|\mathbf{R}_j|^2)\right) Wo observe that the system is similar to solve: .. math:: \tilde{\mathbf{J}}^T \tilde{\mathbf{J}} \Delta p = - \tilde{\mathbf{J}}^T \tilde{\mathbf{R}} With a modified Jacobian :math:`\tilde{\mathbf{J}} = \sqrt{W_J} \mathbf{J}` and a modified residual :math:`\tilde{\mathbf{R}} = \frac{W_R}{\sqrt{W_J}} \mathbf{R}`. Adding some regularization to the Gauss-Newton update ------------------------------------------------------ Sometimes, we may also have regularization terms. In this case the problem can be written as a sum of a sub least squares problem: .. math:: \min_{\mathbf{p}} \frac{1}{2} \sum_{i} w_i \sum_j \rho_i\left(\| \mathbf{R}_{i,j}(\mathbf{p}) \|^2\right) Where :math:`w_i` is a weight for each sub least squares problem, and :math:`\rho_i` is a robust cost function for each sub least squares problem. .. note:: By nomenclature, we assume that the first sub least squares problem (i.e. :math:`i=0`) is the main least squares problem containing the data residuals, and the other sub least squares problems (i.e. :math:`i \geq 1`) are regularization terms. In this case, the Gauss-Newton update can be written as: .. math:: \sum_{i} w_i \tilde{\mathbf{J}}_i(\mathbf{p_k})^T \tilde{\mathbf{J}}_i(\mathbf{p_k}) \Delta p = -\sum_{i} w_i \tilde{\mathbf{J}}_i(\mathbf{p_k})^T \tilde{\mathbf{R}}_i(\mathbf{p_k})