

Curve Fitting

Suppose that you want to fit a set of data points $(x_i, y_i)$, where $i = 1, 2, \ldots, N$, to a function that can't be linearized. For example, the function could be a second-order polynomial, $f(x,a,b,c) = ax^2 + bx + c$. There isn't an analytical expression for finding the best-fit parameters ($a$, $b$, and $c$ in this example) as there is for linear regression with uncertainties in one dimension. The usual approach is to optimize the parameters to minimize the sum of the squares of the differences between the data and the function. How the differences are defined varies. If there are only uncertainties in the $y$ direction, then the differences in the vertical direction (the gray lines in the figure below) are used. If there are uncertainties in both the $x$ and $y$ directions, the orthogonal (perpendicular) distances from the line (the dotted red lines in the figure below) are used.

For the case where there are only uncertainties in the $y$ direction, if the uncertainty in $y_i$ is $\sigma_i$, then the difference squared for each point is weighted by $w_i = 1/\sigma_i^2$. If there are no uncertainties, each point is given an equal weight of one and the results should be used with caution. The function to be minimized with respect to variations in the parameters is $$\chi^2 = \sum_{i=1}^N w_i \left[y_i - f\left(x_i,a,b,c\right)\right]^2.$$ For the case where there are uncertainties in both $x$ and $y$, the function to be minimized is more complicated.
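As a concrete illustration, this weighted sum takes only a few lines of NumPy. The chi_squared helper below is a sketch for illustration, not a function from "fitting.py":

import numpy as np

def chi_squared(f, x, y, sigma, params):
    """Weighted sum of squared residuals, for uncertainties in y only."""
    w = 1.0/sigma**2                         # weights w_i = 1/sigma_i^2
    return np.sum(w*(y - f(x, *params))**2)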

The general_fit function that performs this minimization is defined in the file "fitting.py", which must be in the same directory as the Python program. If there are only uncertainties in the $y$ direction, it uses the curve_fit function from the "scipy" library to find the best-fit parameters. If there are uncertainties in both $x$ and $y$, the odr package from the "scipy" library is used.
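For reference, a direct fit with scipy's odr package looks roughly like the following. This is only a sketch of the underlying machinery, not the contents of "fitting.py", and the xerr values are made up for illustration:

import numpy as np
from scipy import odr

def odr_func(beta, x):
    # odr passes all of the parameters as a single vector, before x
    a, b, c = beta
    return a*x**2 + b*x + c

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1.78, 4.09, 8.85, 17.9, 26.1, 35.2])
xerr = np.full(6, 0.1)                       # hypothetical x uncertainties
yerr = np.array([1.24, 1.46, 1.05, 1.68, 1.18, 1.56])

model = odr.Model(odr_func)
data = odr.RealData(x, y, sx=xerr, sy=yerr)  # sx, sy are standard deviations
output = odr.ODR(data, model, beta0=[1.0, 3.0, -2.0]).run()
print('parameters: ', output.beta)           # best-fit parameters
print('uncertainties: ', output.sd_beta)     # estimated standard errors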

An example of performing a fit with uncertainties in only the $y$ direction is shown below. The first command imports the general_fit function. Second, the function (fitfunc) to be fit is defined. The function could have more than 3 parameters. Initial guesses at the parameters are placed in the list p0. The name of the fitting function, arrays containing the data points ($x$ and $y$), the initial guesses at the parameters, and an array of uncertainties (yerr) are sent to the general_fit function. If it succeeds, the general_fit function returns arrays with the optimal parameters and estimates of their uncertainties (in lists called popt and punc), the reduced chi squared (rchi2), and the degrees of freedom (dof).

from fitting import *
import pylab as pl

# Define the fitting function
def fitfunc(x, a, b, c):
    return (a*x**2 + b*x + c)

x = pl.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = pl.array([-1.78, 4.09, 8.85, 17.9, 26.1, 35.2])
yerr = pl.array([1.24, 1.46, 1.05, 1.68, 1.18, 1.56])
p0 = [1.0, 3.0, -2.0]
popt, punc, rchi2, dof = general_fit(fitfunc, x, y, p0, yerr)
print('optimal parameters: ', popt)
print('uncertainties of parameters: ', punc)
results of general_fit:
   degrees of freedom =  3
   reduced chi squared =  0.3960236768747382
optimal parameters:  [ 0.59883841  4.47445632 -1.7310894 ]
uncertainties of parameters:  [0.13760793 0.69552672 0.72598714]

For this example, the optimal parameters are

$$a = 0.59883841\\ b = 4.47445632\\ c = -1.7310894$$

and their uncertainties are

$$\sigma_a = 0.13760793\\ \sigma_b = 0.69552672\\ \sigma_c = 0.72598714$$

If the initial guesses at the parameters are not reasonable, the optimization may fail. It is often helpful to plot the data first to help make good guesses at the parameters.
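For example, a quick plot of the raw data (reusing x, y, and yerr from the example above) is often enough to suggest reasonable starting values:

import pylab as pl

# Eyeball the raw data before fitting to choose sensible values for p0
pl.errorbar(x, y, yerr, ls='None', marker='o', capsize=2)
pl.xlabel('x')
pl.ylabel('y')
pl.show()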

If you're performing a fit with uncertainties in both the $x$ and $y$ directions, arrays containing the data points ($x$ and $y$) and their uncertainties (yerr and xerr) are sent to the general_fit function as follows:

popt, punc, rchi2, dof = general_fit(fitfunc, x, y, p0, yerr, xerr)

Note the order of the uncertainties! The uncertainty in $x$ is optional, so it is second. This is also consistent with the errorbar function, as the example below shows.
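For instance, reusing the example data above with made-up $x$ uncertainties (the xerr values below are purely illustrative):

xerr = pl.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.1])   # hypothetical x uncertainties
popt, punc, rchi2, dof = general_fit(fitfunc, x, y, p0, yerr, xerr)
# errorbar takes its uncertainties in the same order: yerr, then xerr
pl.errorbar(x, y, yerr, xerr, ls='None', capsize=2)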

Interpreting the Results

Plotting data with error bars and a best-fit function together gives some idea of whether or not the fit is good. If the curve passes within most of the error bars, the fit is probably reasonably good. The first line below makes a list of 100 points between the minimum and maximum values of $x$ in the data. In the second line below, all of the parameters are sent to the fitting function at once by unpacking the list (putting an asterisk in front of its name).

xf = pl.linspace(min(x), max(x), 100)
yf = fitfunc(xf, *popt)
pl.figure()
pl.scatter(x, y, s=15, label='Data')
pl.errorbar(x, y, yerr, ls='None', capsize=2)
pl.plot(xf, yf, "r-", label='Best-Fit Curve')
pl.xlabel('x')
pl.ylabel('y')
pl.legend(loc='upper left')
pl.show()
[Plot: the data points with error bars and the best-fit curve]

The reduced chi squared and the degrees of freedom can also be used to judge the goodness of the fit. If $N$ is the number of data points and $C$ is the number of parameters (or constraints) in the fit, the number of degrees of freedom is $$d = N - C.$$ In the example, $C = 3$ because there are three parameters in the function. The reduced chi squared is defined as $$\tilde{\chi}^{\, 2} = \frac{\chi^2}{d}.$$ According to Taylor (p. 271), “If we obtain a value of $\tilde{\chi}^{\, 2}$ of order one or less, then we have no reason to doubt our expected distribution; if we obtain a value of $\tilde{\chi}^{\, 2}$ much larger than one, our expected distribution is unlikely to be correct.”

For an observed value (from fitting data) of the reduced chi squared ($\tilde{\chi}^{\, 2}_o$), you can look up the probability of randomly getting a larger $\tilde{\chi}^{\, 2}$ with $d$ degrees of freedom in the table below (from Appendix D of Taylor’s book). A typical standard is to reject a fit if $$Prob_d\left(\tilde{\chi}^{\, 2} \ge \tilde{\chi}^{\, 2}_o \right) < 5\%.$$ In other words, if the reduced chi squared for a fit is unlikely to occur randomly, then the fit is not a good one. In the example above, six data points are fit and $\tilde{\chi}^{\, 2}_o = 0.39$. Since $d = 6 - 3 = 3$, the table gives $$Prob_d\left(\tilde{\chi}^{\, 2} \ge \tilde{\chi}^{\, 2}_o \right) \approx 75\%,$$ and there is no reason to reject the fit.


[Table of probabilities for the reduced chi squared, from Error Analysis by John Taylor]
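The table lookup can also be done numerically. As a sketch (reusing rchi2 and dof returned by general_fit in the example above), scipy.stats gives the same probability:

from scipy import stats

# Prob_d is the chi-squared survival function evaluated at the
# (unreduced) chi squared, which is rchi2*dof
prob = stats.chi2.sf(rchi2*dof, dof)
print('Prob =', prob)   # about 0.75 here, so there is no reason to reject the fit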

Additional Documentation