
Derivation of solution

We begin with the constraint that the adjusted points $\vec{y_i}$ are required to satisfy the hyperplane equation, i.e. $f(\vec{y_i}) = 0,\
i=1,\ldots, n$ where $f$ is as defined in Equation (2). Defining residuals $V_{ik} = Y_{ik} - y_{ik}$ allows this constraint to be rewritten as
\begin{displaymath}
f(\vec{y_i}) = f(\vec{Y_i}) - \sum_{k=1}^{m-1} a_k V_{ik} + V_{im} = 0,
\quad i=1,\ldots, n
\end{displaymath} (4)
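
Although Equation (2) is not reproduced in this section, the form of (4), together with the definition of the residuals, implies that the hyperplane function is (up to notation)
\begin{displaymath}
f(\vec{y}) = \sum_{k=1}^{m-1} a_k y_k + a_m - y_m .
\end{displaymath}
This is the form assumed in the worked steps below.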

Now, given any values of the parameters $a_k$, we seek those values of $y_{ik}$ that will minimize $S$ for that choice of the $a_k$, subject to the constraint (4). Thus we require
\begin{displaymath}
\delta S = 0 = \sum_i \sum_{k=1}^m {1 \over \sigma_{ik}^2} V_{ik}
 \delta V_{ik}
\end{displaymath} (5)

From (4) we have
\begin{displaymath}
\delta f(\vec{y_i}) = 0 = - \sum_{k=1}^{m-1} a_k\, \delta V_{ik} +
\delta V_{im}, \quad i=1, \ldots, n
\end{displaymath} (6)

Multiplying each of the $n$ equations (6) by its own undetermined multiplier $\lambda_i$ and adding them all to Equation (5), we obtain
\begin{displaymath}
\sum_i \left[ \sum_{k=1}^{m-1} \left( { V_{ik} \over \sigma_{ik}^2} - \lambda_i a_k \right) \delta V_{ik}
 + \left( { V_{im} \over \sigma_{im}^2} + \lambda_i \right) \delta V_{im} \right] = 0
\end{displaymath} (7)

Since the variations $\delta V_{ik}$ can now be treated as independent, the coefficients of $\delta V_{ik}$ in this equation must individually be zero, giving
\begin{displaymath}
\begin{array}{rcl}
V_{ik} & = & \lambda_i a_k \sigma_{ik}^2, \quad k=1, \ldots, m-1 \\
V_{im} & = & -\lambda_i \sigma_{im}^2, \quad i=1, \ldots, n
\end{array}\end{displaymath} (8)

Substituting this result into Equation (4) and solving for $\lambda_i$ yields
\begin{displaymath}
\lambda_i = W_i f(\vec{Y_i})
\end{displaymath} (9)

where
\begin{displaymath}
W_i = \left[ \sum_{k=1}^{m-1} a_k^2 \sigma_{ik}^2 + \sigma_{im}^2
\right]^{-1}
\end{displaymath} (10)

This allows $S$ to be rewritten as
\begin{displaymath}
S = {\textstyle\sum_i} W_i f(\vec{Y_i})^2
\end{displaymath} (11)
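
To verify (11), note that by (8) each term of the weighted sum of squared residuals reduces to a multiple of $\lambda_i^2$ (here $S$ is taken to be the weighted sum $\sum_i \sum_k V_{ik}^2/\sigma_{ik}^2$ from the statement of the problem):
\begin{displaymath}
S = \sum_i \sum_{k=1}^{m} {V_{ik}^2 \over \sigma_{ik}^2}
  = \sum_i \lambda_i^2 \left[ \sum_{k=1}^{m-1} a_k^2 \sigma_{ik}^2 + \sigma_{im}^2 \right]
  = \sum_i {\lambda_i^2 \over W_i}
  = \sum_i W_i f(\vec{Y_i})^2 .
\end{displaymath}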

This expresses $S$ in the form of a weighted sum of the residuals as defined for a conventional regression, but with weights that properly take account of the individual uncertainties $\sigma_{ik}$. Now $S$ is to be minimized with respect to the parameters $a_k$. Setting $\partial S/ \partial a_k = 0$ leads to the following set of equations analogous to the normal equations of the conventional regression:
\begin{displaymath}
\begin{array}{l}
\sum_i W_i f(\vec{Y_i})\, y_{ik} = 0, \quad k=1, \ldots, m-1 \\[1ex]
\sum_i W_i f(\vec{Y_i}) = 0
\end{array}\end{displaymath} (12)
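
The appearance of the adjusted coordinates $y_{ik}$ in (12) comes from differentiating both factors of $W_i f(\vec{Y_i})^2$: with $\partial f(\vec{Y_i})/\partial a_k = Y_{ik}$ and $\partial W_i/\partial a_k = -2 a_k \sigma_{ik}^2 W_i^2$,
\begin{displaymath}
{\partial S \over \partial a_k}
 = 2 \sum_i W_i f(\vec{Y_i}) \left[ Y_{ik} - W_i f(\vec{Y_i})\, a_k \sigma_{ik}^2 \right]
 = 2 \sum_i W_i f(\vec{Y_i}) \left( Y_{ik} - V_{ik} \right)
 = 2 \sum_i W_i f(\vec{Y_i})\, y_{ik} ,
\end{displaymath}
where the middle step uses (8) and (9). Differentiating with respect to $a_m$ gives the last of Equations (12).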

Since the parameters $a_k$ occur in $W_i$, $f(\vec{Y_i})$, and $\vec{y_i}$, these equations are nonlinear and cannot be solved in closed form. However, recognizing that $W_i$ and $\vec{y_i}$ are only weakly dependent on the parameters of the hyperplane, we can linearize Equations (12) by treating those quantities as constants. Writing out $f$ in terms of the parameters, then, we obtain

\begin{displaymath}
\begin{array}{lclcl}
{\displaystyle \sum_{k=1}^{m-1}} \left(\sum_i W_i\, y_{ij} Y_{ik}\right) a_k
&+& \left(\sum_i W_i\, y_{ij}\right) a_m
&=& \sum_i W_i\, y_{ij} Y_{im}, \quad j=1, \ldots, m-1 \\[1ex]
{\displaystyle \sum_{k=1}^{m-1}} \left(\sum_i W_i\, Y_{ik}\right) a_k
&+& \left(\sum_i W_i\right) a_m
&=& \sum_i W_i\, Y_{im} \\
\end{array}\end{displaymath} (13)

Identifying the parenthesized terms in Equation (13) as the elements of an $m \times m$ matrix $M$ and the right-hand sides as elements of a vector $\vec{b}$ of length $m$, this set of equations is seen as a linear system of the form $M \vec{a}= \vec{b}$. Solution proceeds iteratively. Starting with an initial guess for the vector of coefficients $\vec{a}$, the matrix $M$ and vector $\vec{b}$ are evaluated and the system $M \vec{a}= \vec{b}$ is solved for the new value of $\vec{a}$. This new value is used to re-evaluate $M$ and $\vec{b}$, and the equation is solved again. The iteration is continued until convergence is obtained. In practice, it is not necessary to have a good starting guess for $\vec{a}$. From Equation (10) it can be seen that the initial choice $\vec{a}=0$ gives, as the result of the first iteration, the same parameters as would be obtained if the $\sigma_{ik}$ were zero for all $k$ except $m$. This is the same result as would be given by the conventional weighted regression of $y_m$ against the other coordinates. Incidentally, this means that weighted averages ($m=1$) are computed correctly in one iteration. Experience has shown that the convergence is rapid for data sets where the fit is justified, and only a few iterations are necessary to obtain the coefficients to accuracies that are well within their uncertainties.
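
As an illustration of the iteration just described, the following is a minimal sketch in Python/NumPy. It is not the author's code: the function and variable names (fit_hyperplane, Y, sigma, and so on) are invented here, and the hyperplane is assumed to have the form $f(\vec{y}) = \sum_k a_k y_k + a_m - y_m$ implied by Equation (4).
\begin{verbatim}
import numpy as np

def fit_hyperplane(Y, sigma, tol=1e-12, max_iter=50):
    """Iterative solution of the linearized normal equations (13).

    Y, sigma : (n, m) arrays of measured points Y_ik and uncertainties sigma_ik.
    Returns a = (a_1, ..., a_{m-1}, a_m) for the hyperplane
    sum_k a_k y_k + a_m = y_m (form assumed from Eq. (4))."""
    n, m = Y.shape
    a = np.zeros(m)   # a = 0: first pass reproduces the conventional weighted fit
    for _ in range(max_iter):
        # f(Y_i) and the weights W_i of Eq. (10)
        f = Y[:, :m-1] @ a[:m-1] + a[m-1] - Y[:, m-1]
        W = 1.0 / ((a[:m-1]**2 * sigma[:, :m-1]**2).sum(axis=1) + sigma[:, m-1]**2)
        # adjusted points y_ik via the residuals of Eqs. (8) and (9)
        lam = W * f
        V = np.empty_like(Y)
        V[:, :m-1] = lam[:, None] * a[:m-1] * sigma[:, :m-1]**2
        V[:, m-1] = -lam * sigma[:, m-1]**2
        y = Y - V
        # assemble M a = b of Eq. (13), treating W_i and y_ik as constants
        X = np.hstack([y[:, :m-1], np.ones((n, 1))])   # rows (y_i1, ..., y_i,m-1, 1)
        Z = np.hstack([Y[:, :m-1], np.ones((n, 1))])   # rows (Y_i1, ..., Y_i,m-1, 1)
        M = (W[:, None] * X).T @ Z
        b = X.T @ (W * Y[:, m-1])
        a_new = np.linalg.solve(M, b)
        if np.allclose(a_new, a, rtol=0.0, atol=tol):
            return a_new
        a = a_new
    return a
\end{verbatim}
A call such as fit_hyperplane(Y, sigma) with the (n, m) data arrays returns the coefficient vector after a few passes, in line with the convergence behaviour described above.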


