
Derivation of solution

We begin with the constraint that the adjusted points $\vec{y_i}$ are required to satisfy the hyperplane equation, i.e. $f(\vec{y_i}) = 0,\
i=1,\ldots, n$ where $f$ is as defined in Equation (2). Defining residuals $V_{ik} = Y_{ik} - y_{ik}$ allows this constraint to be rewritten as
\begin{displaymath}
f(\vec{y_i}) = f(\vec{Y_i}) - \sum_{k=1}^{m-1} a_k V_{ik} + V_{im} = 0,
\quad i=1,\ldots, n
\end{displaymath} (4)
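
Although Equation (2) is not reproduced in this section, the form of (4), together with the definition of the residuals, implies that the hyperplane function is (up to notation)
\begin{displaymath}
f(\vec{y}) = \sum_{k=1}^{m-1} a_k y_k + a_m - y_m .
\end{displaymath}
This is the form assumed in the worked steps below.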

Now, given any values of the parameters $a_k$, we seek those values of $y_{ik}$ that will minimize $S$ for that choice of the $a_k$, subject to the constraint (4). Thus we require
\begin{displaymath}
\delta S = 0 = \sum_i \sum_{k=1}^m {1 \over \sigma_{ik}^2} V_{ik}
 \delta V_{ik}
\end{displaymath} (5)

From (4) we have
\begin{displaymath}
\delta f(\vec{y_i}) = 0 = - \sum_{k=1}^{m-1} a_k\, \delta V_{ik} +
\delta V_{im}, \quad i=1, \ldots, n
\end{displaymath} (6)

Multiplying each of the $n$ equations (6) by its own undetermined multiplier $\lambda_i$ and adding them all to Equation (5), we obtain
\begin{displaymath}
\sum_i \left[ \sum_{k=1}^{m-1} \left( { V_{ik} \over \sigma_{ik}^2} - \lambda_i a_k \right) \delta V_{ik}
 + \left( { V_{im} \over \sigma_{im}^2} + \lambda_i \right) \delta V_{im} \right] = 0
\end{displaymath} (7)

Since the variations $\delta V_{ik}$ can now be treated as independent, the coefficients of $\delta V_{ik}$ in this equation must individually be zero, giving
\begin{displaymath}
\begin{array}{rcl}
V_{ik} & = & \lambda_i a_k \sigma_{ik}^2, \quad k=1, \ldots, m-1 \\
V_{im} & = & -\lambda_i \sigma_{im}^2, \quad i=1, \ldots, n
\end{array}\end{displaymath} (8)

Substituting this result into Equation (4) and solving for $\lambda_i$ yields
\begin{displaymath}
\lambda_i = W_i f(\vec{Y_i})
\end{displaymath} (9)

where
\begin{displaymath}
W_i = \left[ \sum_{k=1}^{m-1} a_k^2 \sigma_{ik}^2 + \sigma_{im}^2
\right]^{-1}
\end{displaymath} (10)

This allows $S$ to be rewritten as
\begin{displaymath}
S = {\textstyle\sum_i} W_i f(\vec{Y_i})^2
\end{displaymath} (11)
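
To verify (11), note that by (8) each term of the weighted sum of squared residuals reduces to a multiple of $\lambda_i^2$ (here $S$ is taken to be the weighted sum $\sum_i \sum_k V_{ik}^2/\sigma_{ik}^2$ from the statement of the problem):
\begin{displaymath}
S = \sum_i \sum_{k=1}^{m} {V_{ik}^2 \over \sigma_{ik}^2}
  = \sum_i \lambda_i^2 \left[ \sum_{k=1}^{m-1} a_k^2 \sigma_{ik}^2 + \sigma_{im}^2 \right]
  = \sum_i {\lambda_i^2 \over W_i}
  = \sum_i W_i f(\vec{Y_i})^2 .
\end{displaymath}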

This expresses $S$ in the form of a weighted sum of the residuals as defined for a conventional regression, but with weights that properly take account of the individual uncertainties $\sigma_{ik}$. Now $S$ is to be minimized with respect to the parameters $a_k$. Setting $\partial S/ \partial a_k = 0$ leads to the following set of equations analogous to the normal equations of the conventional regression:
\begin{displaymath}
\begin{array}{l}
\sum_i W_i f(\vec{Y_i})\, y_{ik} = 0, \quad k=1, \ldots, m-1 \\[1ex]
\sum_i W_i f(\vec{Y_i}) = 0
\end{array}\end{displaymath} (12)
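
The appearance of the adjusted coordinates $y_{ik}$ in (12) comes from differentiating both factors of $W_i f(\vec{Y_i})^2$: with $\partial f(\vec{Y_i})/\partial a_k = Y_{ik}$ and $\partial W_i/\partial a_k = -2 a_k \sigma_{ik}^2 W_i^2$,
\begin{displaymath}
{\partial S \over \partial a_k}
 = 2 \sum_i W_i f(\vec{Y_i}) \left[ Y_{ik} - W_i f(\vec{Y_i})\, a_k \sigma_{ik}^2 \right]
 = 2 \sum_i W_i f(\vec{Y_i}) \left( Y_{ik} - V_{ik} \right)
 = 2 \sum_i W_i f(\vec{Y_i})\, y_{ik} ,
\end{displaymath}
where the middle step uses (8) and (9). Differentiating with respect to $a_m$ gives the last of Equations (12).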

Since the parameters $a_k$ occur in $W_i$, $f(\vec{Y_i})$, and $\vec{y_i}$, these equations are nonlinear and cannot be solved in closed form. However, recognizing that $W_i$ and $\vec{y_i}$ are only weakly dependent on the parameters of the hyperplane, we can linearize Equations (12) by treating those quantities as constants. Writing out $f$ in terms of the parameters, then, we obtain

\begin{displaymath}
\begin{array}{lclcl}
{\displaystyle \sum_{k=1}^{m-1}} \left(\sum_i W_i\, y_{ij} Y_{ik}\right) a_k
&+& \left(\sum_i W_i\, y_{ij}\right) a_m
&=& \sum_i W_i\, y_{ij} Y_{im}, \quad j=1, \ldots, m-1 \\[1ex]
{\displaystyle \sum_{k=1}^{m-1}} \left(\sum_i W_i\, Y_{ik}\right) a_k
&+& \left(\sum_i W_i\right) a_m
&=& \sum_i W_i\, Y_{im} \\
\end{array}\end{displaymath} (13)

Identifying the parenthesized terms in Equation (13) as the elements of an $m \times m$ matrix $M$ and the right-hand sides as elements of a vector $\vec{b}$ of length $m$, this set of equations is seen as a linear system of the form $M \vec{a}= \vec{b}$. Solution proceeds iteratively. Starting with an initial guess for the vector of coefficients $\vec{a}$, the matrix $M$ and vector $\vec{b}$ are evaluated and the system $M \vec{a}= \vec{b}$ is solved for the new value of $\vec{a}$. This new value is used to re-evaluate $M$ and $\vec{b}$, and the equation is solved again. The iteration is continued until convergence is obtained. In practice, it is not necessary to have a good starting guess for $\vec{a}$. From Equation (10) it can be seen that the initial choice $\vec{a}=0$ gives, as the result of the first iteration, the same parameters as would be obtained if the $\sigma_{ik}$ were zero for all $k$ except $m$. This is the same result as would be given by the conventional weighted regression of $y_m$ against the other coordinates. Incidentally, this means that weighted averages ($m=1$) are computed correctly in one iteration. Experience has shown that the convergence is rapid for data sets where the fit is justified, and only a few iterations are necessary to obtain the coefficients to accuracies that are well within their uncertainties.
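
As an illustration of the iteration just described, the following is a minimal sketch in Python/NumPy. It is not the author's code: the function and variable names (fit_hyperplane, Y, sigma, and so on) are invented here, and the hyperplane is assumed to have the form $f(\vec{y}) = \sum_k a_k y_k + a_m - y_m$ implied by Equation (4).
\begin{verbatim}
import numpy as np

def fit_hyperplane(Y, sigma, tol=1e-12, max_iter=50):
    """Iterative solution of the linearized normal equations (13).

    Y, sigma : (n, m) arrays of measured points Y_ik and uncertainties sigma_ik.
    Returns a = (a_1, ..., a_{m-1}, a_m) for the hyperplane
    sum_k a_k y_k + a_m = y_m (form assumed from Eq. (4))."""
    n, m = Y.shape
    a = np.zeros(m)   # a = 0: first pass reproduces the conventional weighted fit
    for _ in range(max_iter):
        # f(Y_i) and the weights W_i of Eq. (10)
        f = Y[:, :m-1] @ a[:m-1] + a[m-1] - Y[:, m-1]
        W = 1.0 / ((a[:m-1]**2 * sigma[:, :m-1]**2).sum(axis=1) + sigma[:, m-1]**2)
        # adjusted points y_ik via the residuals of Eqs. (8) and (9)
        lam = W * f
        V = np.empty_like(Y)
        V[:, :m-1] = lam[:, None] * a[:m-1] * sigma[:, :m-1]**2
        V[:, m-1] = -lam * sigma[:, m-1]**2
        y = Y - V
        # assemble M a = b of Eq. (13), treating W_i and y_ik as constants
        X = np.hstack([y[:, :m-1], np.ones((n, 1))])   # rows (y_i1, ..., y_i,m-1, 1)
        Z = np.hstack([Y[:, :m-1], np.ones((n, 1))])   # rows (Y_i1, ..., Y_i,m-1, 1)
        M = (W[:, None] * X).T @ Z
        b = X.T @ (W * Y[:, m-1])
        a_new = np.linalg.solve(M, b)
        if np.allclose(a_new, a, rtol=0.0, atol=tol):
            return a_new
        a = a_new
    return a
\end{verbatim}
A call such as fit_hyperplane(Y, sigma) with the (n, m) data arrays returns the coefficient vector after a few passes, in line with the convergence behaviour described above.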


