www.eJournalnet.com

Issue 2 -  2001/02

 ISSN 1311-8978

 

AN ALGORITHM FOR KNOT DISTRIBUTION AND PLANNING
FOR SPLINE APPROXIMATION OF PLANAR CURVES

Tzvetomir I. Vassilev

Dept. of Informatics, University of Rousse, Bulgaria

Received 10.09.2002; Cited 10.11.2002

 

Abstract

Spline regression procedures are often carried out on noisy data sets when the unknown function is of disassociated nature or its shape is difficult to predict. If the data points are obtained through an experiment, one of the natural problems to solve is how to space the data, so that given an error tolerance the least number of data points is achieved. In order to find a satisfactory answer to this question, one must have some a priori information about the curve to be acquired.

This paper describes an approach to determine the spline knots and data points locations if a rough estimation of the unknown function is available. Results of applying the algorithm to several functions and comparison to other approaches are given at the end of the paper.

Keywords: Spline approximation, Planning, Spline regression

 

1. Introduction

In many computerized data acquisition systems the values of an unknown 2D curve have to be measured and then modelled by a smooth function, often a polynomial or a B-spline function. Let g(x) be the unknown function on the interval [a, b] describing the characteristic of a physical object. The goal is to experimentally obtain the curve y=g(x), so that the approximation error satisfies a tolerance constraint. In order to achieve this goal one has to solve several problems:

i)      to find an optimum plan, i.e. the  abscissas sequence {xi} (a=x1<x2<…<xn=b), for which the data will be measured. The main criterion for optimality is to minimize the error of approximation at the point of the plan where it has a maximum. In this sense the G-optimum plan is sought as defined in the classical literature [1]. The second criterion is that n should be the smallest possible integer that meets the error constraint because the experiment time is expensive;

ii)    to conduct the experiment and collect the data set {xi, yi};

iii)   to model the unknown function g(x) by an approximant f(x) applying the least-squares method and estimate the approximation error.

This paper describes a satisfactory solution of the first problem in case the unknown curve is modelled by a B-spline function. In order to find an optimum plan, it is sufficient if the function g is given only approximately. In typical practical situations this happens in one of the following ways:

·       Knowing the physical properties of the tested object and some of its parameters, the theoretical dependence y=g(x) can be calculated. Usually it defers to the experimentally obtained curve but for planning purposes it is sufficient to have the function trend not its exact values;

·       The function is given in an implicit way as a solution of a differential or functional equation;

·       Objects of the same or similar type have been tested before and their experimental curves are at disposal. Again for planning purposes this is accurate enough.

The problem for determining an optimum experiment plan for B-spline regression can be solved in two steps. First the knots of the spline function must be found and then the abscissas sequence calculated. Several solutions to the first step have been described in the literature [2-7]. Most of them [3-7] however concern knots estimation when only the noisy data set {xi, yi} is given and the function g(x) is not known in any form. A formal solution to this problem when the function g(x) or its derivatives are present is given by de Boor [2] and will be described in details later.

This paper presents an algorithm that first estimates the knots using the formula given by de Boor [2] and then computes an experiment plan, so that the error constraint is met with the least number of data points. The second section gives detailed theoretical description of the method. Section 3 presents some details about the software implementation. The results and comparison to previous work is done in section 4. The last section concludes the paper and gives some ideas of future research.

 

2. Description of the method

Let a=u1, u2, …, ul+1=b be the break point sequence of a B-spline curve of order k (k=4 for cubic splines) with l polynomial segments. The actual knot sequence t ={ti} is usually determined with the first k knots equal to a and last k knots equal to b. Carl de Boor suggested the following knot sequence that minimizes the error of approximation

j=1, …, l-1,                                                         (1)

where g(k)(x) is the k-th derivative of the function being approximated. The actual proof that the above knot sequence minimizes the error can be found in literature [2]. As noted in the introduction, some analytical representation of g(x) is available, so it is easy to evaluate the integrals in equation (1) and then the break point sequence u ={ui} . Now the data points of the plan have to be calculated. However, we will first solve the inverse problem: given the data points to calculate the knot sequence, so that the conditions of the least-squares approximation are not violated.

The original problem of least-squares approximation with B-spline curves has been solved by Carl de Boor [2]. Let Sk,t be the linear space of all spline functions of order k, defined over a specific knot sequence t. A discrete inner product for this space, which is a reasonable approximation to the continuous inner product , is given by

<g, h> = ,                                                                                           (2)

where x is the sequence of data points in the interval [a, b] and w sequence of weights. The norm induced by the inner product (2) is expressed ||g||2 = . De Boor proves that a function f* is a best approximation from the space Sk,t to an unknown function g with respect to the norm || g - f ||2 if and only if the function f* is in Sk,t and the function g-f* is orthogonal to Sk,t, i.e. for all fÎSk,t, <f, g-f*>=0. For the specific linear space of B-splines the system of normal equations is

= < Bi, g >, i = 1,...,m,                                                                          (3)

where {Bi}  is the B-spline basis and {vi}  are the B-spline control points.

However there is an additional condition that should be satisfied given by the following lemma.

Lemma 1. The norm || f ||2 = (å wi(f(xi))2)1/2 with wi>0, all i and {xi} nondecreasing, is a norm on Sk,t if and only if, for some 1£j1<…<jN£n,

ti<xji<ti+k , i = 1, …, N.

If the condition of this lemma is violated, then || f ||2 fails to be a norm on Sk,t and the normal equations (3) are singular. In other words there should be at least one data point between each knots ti and ti+k. The following algorithm for calculating the knot sequence, suggested here, satisfies the above condition:

t1 = … = tk = x1

tm+1 = … = tm+k+1 = xn

coef = (n-1)/l

ind = (i-k)*coef + 1

j = [ind]                                            , i = k+1,…, m                                            (4)

ti = xj + (ind-j)*(xj+1 - xj)

The operator [x] means truncating the integer part of real value x. If n>m, which is a necessary condition for the system of equations (3) to have a solution, it is easy to see that equations (4) guarantee that there is at least one data point in each of the intervals (ti, ti+1), i = k+1,…, m, which is even stronger than the lemma condition.

Let us now go back to the original problem: given the knot sequence to determine the data points, so that the lemma condition is not violated. To simplify the notation we will use the break points array u instead of the knots vector t. Looking at equations (4) it is obvious that there is some freedom in choosing the data points because as a rule the number of data points n is much greater than the number of polynomial segments l. The numerous experiments the author of the paper conducted showed that the following algorithm gives satisfactory results.

x1 = u1, xn = ul+1

coef = (n-1)/l, jo = 1

for i=2 to l do

begin

   ind = (i-1)*coef

   j = [ind]

   delta = (ui+1 - ui-1)/(2*coef)                                                                              (5)

   xj = ui -(ind-j)*delta

   xj+1 = xj + delta

   delta = (xj - xjo)/(j-jo)

   for p=jo+1 to j-1 do xp = xp-1 + delta

   if ind=j then jo = j else jo = j+1

end

Using algorithm (5) the required data points vector x and thus the experiment plan are computed. A positive feature of the above algorithm is that if the knots are recalculated applying equations (4) on the output vector x, the same values for the input vector u of break points will be achieved. This means that after the experiment is completed and the data set is measured, if equations (4) are built in the program for the least-squares B-spline approximation, then it is guaranteed that the best knot sequence will be achieved (or almost the best, which depends on how accurate the a priori representation of the unknown function g(x) was).

 

3. Software implementation of the method

As mentioned in the introduction an approximate representation of the examined function y=g(x) is given in some way. If the initial interval [a, b] is divided in N equal parts, then N+1 equally spaced values of the function (xi, yi)  can be calculated. Experiments conducted showed that the choice N=150 or 200 gives satisfactory results and greater values are not necessary. Once a data point sequence of g(x) is present, a sequence of its k-th derivative values can be computed. For the purposes described in this paper the divided differences give good enough approximation to the k-th derivative and there is no need to implement more accurate methods for digital differentiation.

Now we are ready to evaluate the integrals in equation (1). Several numerical methods for integration are described in [8]. The one of Gaussian quadrature gives most accurate results and if the integrated function is a polynomial, exact integrals can be evaluated. In our case, however, as g(x) is given only approximately, the classical methods for equally spaced values, e.g. trapezoidal and Simpson's rule, give satisfactory results.

After the break point sequence is calculated, applying algorithm (5) one can easily compute the data points values and thus the experiment plan has been determined. Now it is a good idea to estimate the error of the plan. This could be achieved in the following way. Having the data points and applying equations (4) on them, the knot sequence is computed. The least-squares B-spline approximation described by de Boor [2] should be implemented. The system of equations (3) can be solved using LU decomposition [8]. However, since its matrix is symmetric band and non-singular, the Cholesky factorization method [9] is more efficient in such cases. After the control points of the B-spline approximant are found, it is easy to estimate the maximum error of approximation, comparing the values of the B-spline function to those of the original function g(x).

In some cases the researchers are more used to a different optimization criterion not the maximum error but the standard deviation of approximation given by the formula:

,                                                                                         (6)

where yi and are the actual and the predicted with the model values of the function at the point xi and m is the number of coefficients in the model, m = l+k-1 for a spline-function.

In order to automate the process of finding an optimum plan, a computer program has been implemented. It reads from a text file the values of the function in a given interval. The user can specify the optimum criterion: either the maximum error or standard deviation. If the function is symmetric in regards to the interval centre, then the user can indicate whether to use only odd (or even) values for l or if this is irrelevant. Starting from the lowest values for l (i.e. the simplest model), the program computes the data points and the optimum criterion value and increases the vales of l until the constraint is met.

 

 

Fig. 1. Approximation of the Runge function (solid line) by a spline (dashed line)
with 4 polynomial segments (3 interior knots - *) and 11 data points (verticals).
The error distribution is given below.

 

4. Results

In this section some numerical results of applying the described method are given by numbers and figures. We begin with a classic function, the Runge function g(x) = 1/(1+x2) on [-5, 5]. This function is in C¥ but is nevertheless difficult to approximate. Fig. 1 shows the results of approximating the function by a cubic spline with l=4, i.e. 3 interior knots, represented by asterisks (*). The first row of Table 1 shows the same example. One can notice that the knot distribution is almost the same as that shown by Hu [2] and our approximation error (0.072) is even smaller compared to his (0.079) for the same number of knots. Besides his results are obtained for 201 equally spaced data points, while in our case algorithm (5) produced better results with only 11 data points.

 

Function

Interval

l

Number of data points

Maximum Error

Standard Deviation

1/(1+x2)

[-5, 5]

4

11

0.072

0.033

 

 

6

15

0.0065

0.0027

 

 

8

17

0.0035

0.0021

 

 

10

21

0.002

0.001

 

 

12

25

0.0008

0.0003

10x/(1+100x2)

[-2, 2]

7

15

0.032

0.007

 

 

9

19

0.006

0.002

 

 

11

23

0.004

0.001

 

 

13

27

0.0025

0.0005

 

 

17

34

0.001

0.0003

[0, 1]

3

7

0.008

0.004

 

 

5

11

0.001

0.0005

 

 

9

19

0.0005

0.0002

Table 1. Approximation error to several functions with different number of knots

 

The second example (Fig. 2 and rows 6-9 of Table 1) concerns another rational function g(x) = 10x/(1+100x2) on [-2, 2]. From Fig. 2 we can see that its curve is sharp near x = 0 and flat away. The algorithm puts more data points near zero and less in the areas where the function is flat. Again the approximation error achieved for 6 knots (0.032) is better than that shown by Hu (0.04) and this is for 15 data points, compared to 101 used by him.

The last example (Fig. 3 and rows 10-12 of Table 1) considers the irrational function g(x) =  on the interval [0, 1], which has a singularity x=0 and that is why is difficult to approximate. The approximation error shown in [2] for 2 interior knots and 61 data points is 0.0081. Our algorithm gives error almost the same error (0.008) for the same number of knots but the number of data points is much smaller (7).

 

 

Fig. 2. Approximation of the function g(x) = 10x/(1+100x2) (solid line) by a spline (dashed line)

with 7 polynomial segments (6 interior knots - *) and 15 data points (verticals). The error distribution is given below.

 

Table 1 shows the results of approximation of the three functions with different values for the number of polynomial segments l. The fourth column contains the number of data points, which is sufficient to achieve the error given in column five. Moreover, further increase of the number of data points does not affect significantly the approximation error. This allows us to draw the following conclusions that can be useful for users of the algorithm:

·       for values of l > 8, number of data points 2*l or 2*l+1 are sufficient;

·       for smaller values of l, the number of data points could depend on the function but the experiments conducted showed that 3*l is more than enough for all cases.

The above results are very important for the experiment planning, because in many cases the time of experiment is expensive and the engineer wishes to measure as few data points as possible.

 

 

Fig. 3. Approximation of the function g(x) =  (solid line) by a spline (dashed line)

with 3 polynomial segments (2 interior knots - *) and 7 data points (verticals).
The error distribution is given below.

 

We conclude this section with Table 2, which gives results of the program for automatic plan determination for an error constraint of 0.001 with two different criteria. The plans are obtained for the three functions considered in the previous figures and table. For the first function only even numbers for l were considered, for the second only odd numbers, while for the third any numbers starting from 2 were used.

 

Function

Interval

Plan

Optimum Criterion

l

Number of points

1/(1+x2)

[-5, 5]

 

 

 

 

 

10

21

Standard Deviation

 

 

12

25

Maximum Error

10x/(1+100x2)

[-2, 2]

 

 

 

 

 

11

23

Standard Deviation

 

 

17

34

Maximum Error

[0, 1]

 

 

 

 

 

4

9

Standard Deviation

 

 

5

11

Maximum Error

Table 2. Plans obtained automatically for an error constraint 0.001 with 2 different criteria

 

5. Conclusions

A computer program has been implemented that uses the algorithms described above and helps user do the following:

·       estimate the number of polynomial segments l of a spline curve that approximates a function with a pre-given error tolerance;

·       find the minimum number of data points (usually from 2*l to 3*l) that guarantees the error tolerance;

·       determine the data points distribution, i.e. the plan of experiment for a spline regression;

·       automatically determine a plan, which meets a given error constraint.

The algorithm has been developed for non-parametric functions only. With few modifications it could be extended to parametric spline functions w(u) = [x(u), y(u)], described by Farin [10]. An interesting further step would be to use the same ideas for planning the experiment in the 3D case, where spline surfaces should be modelled.

Acknowledgments

This work has been done at the University of Rousse, Bulgaria and partly at QMW, University of London, UK. The author would like to thank his supervisors and reviewers for the helpful talks and comments.

 

References

Kiefer, J. Optimum experimental designs. Journal of the Royal Statistical Society Series B – Methodological, 21, 1959, 272-319.

de Boor, C. A Practical Guide to Splines, Springer-Verlag, New York, 1978.

Hu, Y. An algorithm for data reduction using splines with free knots, IMA Journal on Numerical Analysis, 13 (3), 1993, 365-381

Mureika R. A.., D. J. Dupuis, Using Laplace transform theory to estimate knots, Communications in Statistics-Theory and Methods, 23 (10), 1994, 2775-2786

Hastie, T. Pseudo splines, Journal of the Royal Statistical Society Series B - Methodological, 58 (2), 1996,  379-396.

Schwetlick, H., T. Schuetze, Least squares approximation by splines with free knots, BIT, 35 (3), 1995, 361-384.

Stein, M. Spline smoothing with an estimated order parameter, The Annals of Statistics, 21 (3), 1993, 1522-1544.

Press, W. H., B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, Numerical recipes, The art of scientific computation, Cambridge University Press, New York, 1989, pp 121-126.

Forsythe, G., C. Moler, Computer solution of linear algebraic systems, Prentice-Hall Inc., 1967, pp 114-119.

Farin, G. Curves and Surfaces for Computer Aided Geometric Design, Academic Press, 1993

 

 

Scientific Symp