This notebook provides an introduction to using Python for data analysis including storing and loading data, visualising data and performing regression analysis. Also download the Data folder here https://cocalc.com/share/public_paths/9b2872cc1e75acba6ebb8d9c5a1bf4a226ea55ba.

Python/FitsMyData/Python_FitsMyData.ipynb

¹³² views
License: GPL3
ubuntu2004

Kernel: Python 3 (system-wide)

Cookies and Code

Python fits my data!

A key skill for any scientific problem solver is the analysis and visualisation of data. The data from investigations typically require analysis to determine a key measurement and the respective errors. There are typical two main types of software which we can use:

Spreadsheet approach: Excel or Origin Pro.
- Pros: graphical interface is attractive to beginners
- Cons: must naviagte menus and use left/right clicks to perform analysis, method is also not easily checkable
Code based approach:
- Pros: the analysis procedure is a clear list of commands that are reproducible, efficient use of memory,
- Cons: some users not confident with programming

Here in this session, we will introduce the code based approach via:

Useful modules for data analysis
Storing and loading data
Visualising data
Linear regression using the least-squares and chi-squares methods.

Before you start, if needed, increase the font size of this document using the magnifying icon above.

You are also free to write code here in this notebook or create your own notebook by clicki

Useful modules for data analysis

Modules are Python files containing blocks of code defining useful functions. Importing modules enable you to extend the functions available for use in your Python code. There are modules for all sorts of things such as data analysis, game development, and web programming. Commonly used modules relevant to data analysis include:

numpy: numerical python which contains functions for creating arrays of data and perfoming maths on them
pandas: data structures (like spread sheets) and data analysis tools
matplotlib.pyplot for visualising data
seaborn for visualising data
scipy.stats: statistical analysis tools
statsmodels: expands on SciPy and integrates with Pandas

These modules are typically written by the community of programmers. You can also package your own functions into a .py file and import them as a module.

Import modules

In Python we use the import function to load the contents of a module into memory.

When we import the contents of modules via

import SomeModule

the functions are then available to us via:

SomeModule.functionName()

Code creation: Import some modules

Use the cell below to import:

the numpy module
check that you can call simple numpy functions such as:
- numpy.sqrt(25)
- numpy.cos(2*numpy.pi)
- numpy.random()
- numpy.array([18, 11.1, 120, 5])
Try assigning variables to the output of these commands
- x = numpy.array([18, 11.1, 120, 5])
Try performing operations on these variables:
- a = np.sqrt(x)
- b = x**2
Try printing the different variables to screen to check the results

In [0]:

# Your code here:

We can also shorten the module name using an alias:

import SomeModule as mod

and now our funtions

mod.functionName()

We tend to use standard aliases for modules so we can all understand. For example:

numpy: np
matplotlib.pyplot: plt

Code creation: Shorten the module name using an alias

Let's try importing to a short alias name:

Try importing numpy to an alias np
Run some numpy commands using the np.functionName() sytax such as creating 1D arrays:
- x = np.array([ 2.3, 11.1, 21.55, 66])
- y = x**2

In [0]:

# Tinkering code:

Storing data

One of the most important features of numpy is it's arrays. Arrays are multidimensional data sets (mentioned briefly last session). An important distinction between arrays and lists or tuples is that maths can be performed on arrays. Hence, arrays form the backbone of any data analysis in Python.

Arrays can be visualised as rows and columns. Most users work with simple 1D arrays (i.e. a single row) or 2D arrays. For a 2D array of data, visualise horizontal rows and vertical columns. This array below has 3 rows and 4 columns containing 12 elements:

As you can see from this visual example, indexing of a 2D array requires both row and column numbers. Location of data within an array are defined using arrayname[row, column] syntax. For example, if we give the above array the name mydata:

mydata[1, 3] contains the data 7
- remember indexing starts at zero
mydata[2, 3] or indeed mydata[-1, -1] contains the data 9

Arrays can also be sliced like other data sets. For example, for the array shown above:

x = mydata[:, 1] would assign all of row values from the second column
Slicing the array in this way would give a new 1D array of x = [8, 11, 5]

Creating and manipulating these multidimensional arrays in Python is completed using numpy, for example:

In [4]:

import numpy as np

# Define some lists
row0 = [1, 8, 13, 12]
row1 = [14, 11, 2, 7]
row2 = [4, 5, 16, 9]

# Combine to an array
my2Darray = np.array([row0, row1, row2])
print(my2Darray)

Out[4]:

[[ 1  8 13 12]
 [14 11  2  7]
 [ 4  5 16  9]]

There are many different functions that can be used to create and manipulate arrays. Here are a few of our favorites:

linspace(start, end, n): 1D array from start to end with n elements
arange(start, end, step): 1D array with values increasing with value step
full([nrows, ncols], value): 2D array of rows and columns all filled with some value
zeros([nrows, ncols]): an array filled with zeros
random.random([nrows, ncols]): an array filled with random floating point numbers from 0-1

Time to tinker: Play with some numpy functions

Use the cell below to define some simply arrays using the functions described above.

for example, use linespace() to define a 1D array of values
Change the sizes, values
Check that you can build some initive understanding of arrays
Try printing different index ranges i.e. slicing

In [5]:

# Tinker with the code

a = np.linspace(0, 10, 20)
b = np.arange(0, 20, 1)
c = np.full([5, 5], 1)
d =np.zeros([3,4])


e = np.linspace(10,20,11)
f =np.arange(10,20,1)
g = np.full([5,5],2)
d = np.zeros([5,5])
print (e)

Out[5]:

[10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]

Loading and saving data to file

So far we have worked with data we created within the code. For big or existing data sets we store the data in files. These are simple text files with extensions of:

.txt: text
.csv: comma separated variables

The choice of extension is not so important, the contents can be the same text based format.

Our choice of using numpy to store our data provides a number of functions to load and save data:

loadtxt: Load data from a text file to an array
savetxt: Save an array to a text file
genfromtxt: Load data from a text file, with missing values handled as specified

Load data from a file

We are going to load a dataset of the world population (in billions) over the last 5 decades.

In [6]:

import numpy as np

data = np.loadtxt('Data/population.csv', delimiter=',')

Next we like to verify what data has been loaded into memory. For this you can use the print function:

In [7]:

print(data)

Out[7]:

[[1968.      3.55]
 [1978.      4.3 ]
 [1988.      5.14]
 [1998.      5.98]
 [2008.      5.78]
 [2018.      7.63]]

or simply use the %whos magic function to see which variables we have in memory:

In [8]:

%whos

Out[8]:

Variable     Type       Data/Info
---------------------------------
a            ndarray    20: 20 elems, type `float64`, 160 bytes
b            ndarray    20: 20 elems, type `int64`, 160 bytes
c            ndarray    5x5: 25 elems, type `int64`, 200 bytes
d            ndarray    5x5: 25 elems, type `float64`, 200 bytes
data         ndarray    6x2: 12 elems, type `float64`, 96 bytes
e            ndarray    11: 11 elems, type `float64`, 88 bytes
f            ndarray    10: 10 elems, type `int64`, 80 bytes
g            ndarray    5x5: 25 elems, type `int64`, 200 bytes
matplotlib   module     <module 'matplotlib' from<...>/matplotlib/__init__.py'>
my2Darray    ndarray    3x4: 12 elems, type `int64`, 96 bytes
np           module     <module 'numpy' from '/us<...>kages/numpy/__init__.py'>
numpy        module     <module 'numpy' from '/us<...>kages/numpy/__init__.py'>
row0         list       n=4
row1         list       n=4
row2         list       n=4
x            ndarray    4: 4 elems, type `float64`, 32 bytes
y            ndarray    4: 4 elems, type `float64`, 32 bytes

In memory, we can see that the data has been stored as a 6X2 numpy array containing 12 elements requiring 96 bytes of the computers RAM.

Since we wish to perform some analysis of the data, we may like to break it up to specific variables such as:

In [9]:

# Slice the data array to new variables
# : refers to all rows, and first (0) or second (1) column

year = data[:, 0]
population = data[:, 1]

Alternatively, you can do this when loading the data:

In [10]:

year, population = np.loadtxt('Data/population.csv', delimiter=',', unpack=True)

Notice here we need to add the flag 'unpack=True' to unpack columns to to the new variables. Again let's check that the data is as expected:

In [11]:

print("Values for year: ", year)
print("Values for population:", population)

Out[11]:

Values for year:  [1968. 1978. 1988. 1998. 2008. 2018.]
Values for population: [3.55 4.3  5.14 5.98 5.78 7.63]

Time to tinker:

Try loading the data from any of the other data files in the same folder as population.csv.

In [12]:

Out[12]:

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

Saving data to a file

When using numpy, we can save data with the savetxt function. Here we have specified saving the data to 5 decimal places i.e it's precision. You can find more information about the syntax of the formating fmt argument here.

In [14]:

import numpy as np

data = [2.5, 2.5, 2.6, 2.6, 2.6, 2.4, 2.6, 2.6]

np.savetxt('mydata.txt', data, fmt='%.5f')

In [0]:

Saving data as columns

It is also possible to specify a header line to explain the data or what delimiter to use. For example:

In [16]:

np.savetxt('mydata.txt', np.c_[data], fmt='%.5f')

Saving data with a header line

It is also possible to specify a header line to explain the data or what delimiter to use. For example:

In [15]:

import numpy as np

xdata = [1, 2, 3, 4, 5]
ydata = [1.5, 6.5, 9.6, 12.6, 15.6]

np.savetxt('mydata.txt', np.c_[xdata, ydata], fmt='%4.5f', header = 'Measurements of car speed\ntime [s] speed [m/s]', delimiter = ',')

Here, we have made some x, y data and stored it as 2 columns using np.c_[] with a useful header. The \n is code for "new line" when defining a string.

Once we have our filename defined, check if it exists, and if so what it contains. To do so, use a file manager to navigate to it and open the file. On CoCalc clicking on Files in the top left will take you to the file manager. As the file extention states, .txt, this should be a simple text file.

# Measurements of car speed
# time [s] speed [m/s]
1.00000,1.50000
2.00000,6.50000
3.00000,9.60000
4.00000,12.60000
5.00000,15.60000

Time to tinker: Save data sets

Tinker with the savedata function and its options. Try

different data sets, numbers of columns
different headers
different precision
different delimiters

Refer to the function desciption here if needed.

In [0]:

xdata = [1, 2, 3, 4, 5]
ydata = [1.5, 6.5, 9.6, 12.6, 15.6]

g = np.savetxt('myyydata.txt',np.c_[xdata,ydata], header = ['Coo\year money'], fmt = '%4.1f')

Block 5: Visualising data

As mentioned earlier, there are a couple of options for visualising data in Python. The standard method is to use the matplotlib library of modules.

A key module from this library is pyplot:

In [0]:

import matplotlib.pyplot as plt

Now that we have imported this module, we can avail of the numerous plotting function.

The two main plotting functions are:

plt.plot()
- plot data as points or line, specify colours, add a label
plt.errorbar( )
- plot data with errobars

There are many functions to format plots including:

plt.xlabel()
plt.ylabel()
- label axis with some string
plt.xlim()
plt.ylim():
- limit the axis range with [lower, upper] values
legend(): add a legend using the plot labels
savefig(): save the plot to a file

There are many sources online to describe the wider functions of matplotlib.pyplot, here is a good one to start with.

Code creation: Creating and visualising data

Create an array combining the following data in three rows:
- time = [0, 1, 2, 3, 4]
- distance = [0, 3, 6, 9, 12]
- speed = [3, 3, 3, 3, 3]
Visualise distance vs time data
Visualise speed vs time data
now tinker with the appearance of the plots (colours, symbols etc)

In [0]:

# Your code here

time = [0, 1, 2, 3, 4]
distance = [0, 3, 6, 9, 12]
speed = [3, 3, 3, 3, 3]

plt.plot(time,distance)
plt.plot(time,speed)
plt.xlabel('xlabel')

Block 6: Least squares analysis

In the least squares analysis method, we attempt to fit a linear function y = mx + c to our data points $x_i\,y_i$ . For this linear function we need to determine the value of the slope, $m$ , and the intercept $c$ . These are the fit parameters.

Figure: A least squares fit to arbitary data. The black data points are the

x_i\,y_i

data and the red line is the fit. The residuals are shown as vertical lines representing the difference between data points and the fit predictions.

For any fit there will be a residual difference, $r_i = y_i - y$ , between the fit value ( $y$ ) and our actual data values ( $y_i$ ). As the name suggests the least squares algorithm trys to minimise the sum of the squares of the residuals :

S = \Sigma [y_i - y]^2

here the sigma $\Sigma$ symbol refers to the sum. When this sum of the squares of the residuals, $S$ , is the lowest value possible we have the best fit.

At the best fit, the values of slope, m, and the intercept c are calculated using:

m = \large{\large{\frac{S_{x}S_{y}-NS_{xy}}{S^{2}_{x}-NS_{xx}}}}

c = \large{\large{\frac{S_{y}-mS_{x}}{N}}}

where N is the number of data points and

\large{\large{S_{x}=\sum_{n=1}^{N}x_{i} }}

\large{\large{S_{y}=\sum_{n=1}^{N}y_{i} }}

\large{\large{S_{xy}=\sum_{n=1}^{N}x_{i}y_{i} }}

\large{\large{S_{xx}=\sum_{n=1}^{N}x_{i}^{2} }}

Code creation: Write a least squares function

You are to write a function that performs a least squares analysis.

The function should take arguments

x_i\,y_i

and return values for the slope and intercept. - Test out your function by applying it to the some of the previous data.

In [0]:

def leastsquares(x, y):
    '''
    Perform a linear regression
    Arguments:
    x, y: data points
    Return:
    m: slope
    c: intercept
    '''
    # Put Code here:
    
    
    return(m, c)

Using SciPy

Rather than writing your own analysis functions, you can simply use the ready-made functions in Python. Since the analysis is statistical, many of these functions can be found in the scipy.stats module.

A function for linear regression is linregress. We typically import just this function rather than the whole module:

In [0]:

from scipy.stats import linregress

Once we have the function imported, we can do a quick test:

In [0]:

# Perform a least squares regression
x = [2.1, 4.0, 6.4]
y = [3.1, 7.8, 11.3]

slope, intercept, r_value, p_value, err = linregress(x, y)

As you can see linregress returns 5 variables. The slope and the error are:

In [0]:

print("slope:",slope,"+/-", err)

the intercept is:

In [0]:

print("intercept:",intercept)

and the $r^2$ value is:

In [0]:

# Correlation coefficient 
print("r-squared:", r_value**2)

The latter is referred to as the goodness of fit, and $r^2=1$ for a perfect linear correlation.

In [0]:

# Define non uniform y-errors as an example
temp = [   0.,   25.,   50.,   75.,  100.,  125.,  150.,  175.,  200.,  225.,  250.]
length = [ 1.1155,  1.1164,  1.117 ,  1.1172,  1.118 ,  1.119 ,  1.1199, 1.121 ,  1.1213,  1.1223,  1.1223]
error = [0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0015, 0.0012, 0.0011, 0.0011]
np.savetxt('Data/test.txt', np.c_[temp, length, error], fmt = '%.4f')

Code creation: use the `linregress` function

Perform an analysis of the population data in `population.csv` using `linregress`.

print to screen the slope with the error and units
also check the $r^2$ value
- is the data linear?

In [0]:

# Code here
a = np.loadtxt('Data/population.csv', delimiter=',', unpack=True)
b = a[:,0]
c = a[:,1]
slope, intercept, r_value, p_value, err = linregress(b,c)
print("slope",slope,"+-",err)
print("r square",r_value**2)

In [0]:

a = np.loadtxt('Data/LengthTempData.txt', unpack=True)
b = a[:,0]
c = a[:,1]
slope, intercept, r_value, p_value, err = linregress(b,c)
print("slope",slope,"+-",err)
print("r square",r_value**2)

Code creation: use the `linregress` function some more

Perform an analysis of the thermal expansion data in `LengthTempData.txt` using `linregress`.

print to screen the slope with the error and units
also check the $r^2$ value
- is the data linear?

Block 7: Chi-squared analysis

Including errorbars in the analysis

You may have noticed that the least squares method does not depend on the uncertainties (or error bars) on each point. It simply minimises the sum of the squares between the data point $y_i$ and the fit point $y$ : $\sum\limits_{i=1}^N (y_i - y)^2$ As you can see there is no information used relating to the errorbars, $\sigma_i$ , on each data point. In fact it assumes that the errors on data points are all the same and therefore play no role in the fitting procedure.

To take non-uniform error bars on points into account we must use a chi-squared ( $\chi^2$ ) approach. This is similar to least squares but minimises: $\chi^2 = \sum\limits_{i=1}^N (\frac{y_i - y}{\sigma_i} )^2$ This type of fit is weighted with the errors on each point ( $\sigma_i$ ). We note that the bigger that point's error is, the smaller is its contribution to the chi squared weighting. Hence this enables you to propagate uncertainties all the way to the fitted parameters and hence your final measurement (e.g. derived from the slope).

For a quick example, we first define different error values for some data set. Here we will use the thermal expansion of a length of metal with increaing temperature as found in Data/LengthTempData.txt:

In [0]:

# Define non uniform y-errors as an example
temp = [   0.,   25.,   50.,   75.,  100.,  125.,  150.,  175.,  200.,  225.,  250.] # [degrees Celsius]
length = [ 1.1155,  1.1164,  1.117 ,  1.1172,  1.118 ,  1.119 ,  1.1199, 1.121 ,  1.1213,  1.1223,  1.1223] # [metres]
error = [0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0015, 0.0012, 0.0011, 0.0011] # [metres]

Code creation: Visualise raw data with errorbars

Before we perform the analysis, it may be useful to visualise the data.

use plt.errorbar() function
ensure that the data is plotted just as points rather than as a line
label the axes using plt.xlabel() and plt.ylabel() functions

In [0]:

# Code here
plt.errorbar()

Using the `curve_fit` function

To fit the data using a chi-squared method, we can use the curve_fit function.This is contained in the scipy.optimise module.

Code creation: import the `curve_fit function`

The curve_fit funciton is contained in the scipy.optimize module.

import the function

In [0]:

import scipy.optimize

The syntax for using curve_fit is:


curve_fit(fitfunction, x, y, sigma=errors)

There are 4 arguments:

fitfunction: the name of some function we make that can fit the data
- this is a string
x: the independent data i.e. what we varied
- this can be a list or array of numbers
y: the dependent data i.e. what we measured
- this can be a list or array of numbers
errors: the measured uncertainty in y
- this can be a list or array of numbers

Of course, we can called these variables whatever we wish.

Now, the argument fitfunction is the name of a function we need to make. For a linear fit we define this function as $y=mx+c$

Code creation: create a fit function

Define (def) a function with some name i.e linearfit

give it inputs x, m and c
from which it returns a value mx+c.

In [0]:

# Code here
def linearfit(x,m,c):
    return mx+c

Now we have a linear function defines and we have our data, let us now perform the chi-squared analysis:


curve_fit(fitfunction, x, y, sigma=errors)

Code creation: call the `curve_fit` function

Here use the variable names we have already defined:

linearfit: name of our fit function
temp: our independent data
length: our independent data
error: uncertainties on 'length'

In [0]:

# Code here
from scipy.optimize import curve_fit
popt,pcov = curve.fit(linearfit,temp,length)

Processing the analysis results

If sucessful, calling the function will return 2 new sets of numbers:

[2.81164900e-05, 1.11547121e+00]
[ [ 1.84166422e-12, -1.47264501e-10], [-1.47264501e-10, 1.75281421e-08] ]

These are 2 arrays. The first contains the fit parameters and the second contains their variances:

slope, intercept
a 2x2 covarient matrix i.e. a measure of the uncertainty on the fit parameters (slope and intercept)

Because we will want to use these analysis results, it is best to assign new variables to the output of curve_fit.

Code creation: store the output of `curve_fit

Assign 2 new variables to the output of curve_fit.

typically names are popt and pcov

In [0]:

# Code here

The curve_fit function now outputs two variables which we have called popt and pcov. The optimised fit coefficients are stored in the popt list and their respective errors can be calculated using the values stored in the pcov array.

After calling the curve_fit function we will want to access data stored in the popt and pcov variables.

Code creation: get the analysis results

Assign variables to the optimised fit parameters stored in popt.

use the correct index to select the slope and intercept
print the results to screen with units

In [0]:

# Code here

slope = popt[?] 
intercept = popt[?] 

print(?????)

All statistical analysis results have statistical errors. If our fit is perfect, i.e. our data is very linear, these errors should be very small. The statistical measure of the error in fit values is in terms of standard deviation referred to as sigma or $\sigma$ . This represents how spread out the possible fit values are. The curve_fit function returns variances ( $=\sigma^2$ ) rather than standard deviations.

Variance measures the variation of a single random variable (like slope), whereas the covariances are a measure of how much two random variables vary together (like the slope and the intercept). These statistical variances of slope and intercept are contained in pcov. The diagonals of pcov provide the variances of the fit parameter. The other entries are the covariances.

\begin{Bmatrix} \sigma^2(x,x) & \sigma^2(x, y) \ \sigma^2(y,x) & \sigma^2(y, y) \end{Bmatrix}

We are just interested in the variances. The diagonal, $\sigma^2(x,x),\sigma^2(y, y)$ , contain the slope and intercept variances respectively:

\begin{Bmatrix} 1.84166422e-12 & -1.47264501e-10 \ -1.47264501e-10 & 1.75281421e-08 \end{Bmatrix}

We can use the variances to calculate the errors (one standard deviation i.e. 1 sigma or $1\sigma$ ) on the fit parameters. Since the variance is $=\sigma^2$ , we take the square root to determine the error sigma.

Code creation: calculate the statistical errors

Calculate the statistical errors for slope and the intercept. Use the numpy function np.sqrt() if needed.

print to screen the values of slope and the intercept together with these errors and units

In [0]:

# Code here

In the context of this data, the analysis provides the following results:

the slope is the thermal expansion coefficient:
- $\alpha = (28\pm 1)\times 10^{-6}$ m/ $^{\circ}$ C.
- so this tells us that for every degree of heating, the metal expands by $28\times10^{-6}$ m
- comparing with known values indicates that the metal may be lead!
the intercept is equivalent to the length of the metal at $0^{\circ}$ C
- which is $L_0 = 1.1155 \pm 0.0001$ m.
the relative error on $\alpha$ is $1/28$ or $4\%$

Visualising the results

Now that we have the analysis results we can finish by visualising this together with the measured data for comparison. In order to plot a fit line, we need to create the fit data.

Code creation: create the fit data

xfit: an array of x-values from $x_{min}$ to $x_{max}$
- choose a starting point $x_{min}=0$
- choose a end point just bigger than your maximum measured $x$ value
- use a numpy function such as np.linspace or np.arange to create the xfit array
yfit: and array of y-values
- calculate using linearfit function
- yfit = linearfit(xfit, slope, intercept)

In [0]:

# Code here

Now that we have the fit data, we should compare this on a graph with the measured data.

Code creation: visualise measured data and the linear fit

plot the measured data as before using plt.errorbar()
then plot the fit data using plt.plot() on the same graph
add labels with units to the axes
save the graph as a pdf or png file
open the file for viewing

In [0]:

# Code here

Consolidate code

In your own separate notebook, gather all the code you developed to perform the chi-squared or least squares analysis. Edit the code so that it is all contained in a single cell.

Test that the entire analysis programme works.

For example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
    
def linearfit(x, m, c):
    return m*x + c

# Load the data
filename = "Data/LengthTempData.txt"
[temp, length, error] = np.loadtxt(filename, unpack=True)

# Fit the x/y data  
popt,pcov = curve_fit(linearfit, temp, length, sigma=error)

# Calcule the fit errors
sigma = np.sqrt(np.diag(pcov)) 

# The results:
slope = popt[0] 
errorslope = sigma[0]  
intercept = popt[1] 
errorintercept = sigma[1]  

# Print the results:
print('Slope, Error, Intercept, Error:')
print (slope, errorslope, intercept, errorintercept)
print()
print( 'Linear thermal expansion coefficient = %.2e +/- %.2e m/deg C' %(slope, errorslope))
    
    
xfit = np.linspace(0, 300, 100) 
yfit = slope*xfit + intercept

plt.errorbar(temp, length, fmt='o',label='Data',yerr=error)
plt.xlabel('Temp [$^{\circ}$C]')
plt.ylabel('Length [m] ')
plt.plot(xfit,yfit, 'r-', label='Linear Fit')
plt.legend(loc='best')
plt.savefig('ThermalExp.pdf')

Cookies and Code

Python fits my data!

Useful modules for data analysis

Import modules

Code creation: Import some modules

Code creation: Shorten the module name using an alias

Storing data

Time to tinker: Play with some numpy functions

Loading and saving data to file

Load data from a file

Time to tinker:

Saving data to a file

Saving data as columns

Saving data with a header line

Time to tinker: Save data sets

Block 5: Visualising data

Code creation: Creating and visualising data

Block 6: Least squares analysis

Code creation: Write a least squares function

Using SciPy

Code creation: use the `linregress` function

Code creation: use the `linregress` function some more

Block 7: Chi-squared analysis

Including errorbars in the analysis

Code creation: Visualise raw data with errorbars

Using the `curve_fit` function

Code creation: import the `curve_fit function`

Code creation: create a fit function

Code creation: call the `curve_fit` function

Processing the analysis results

Code creation: store the output of `curve_fit

Code creation: get the analysis results

Code creation: calculate the statistical errors

Visualising the results

Code creation: create the fit data

Code creation: visualise measured data and the linear fit

Consolidate code

Product

Resources

Company

Cookies and Code

Python fits my data!

Useful modules for data analysis

Import modules

Code creation: Import some modules

Code creation: Shorten the module name using an alias

Storing data

Time to tinker: Play with some numpy functions

Loading and saving data to file

Load data from a file

Time to tinker:

Saving data to a file

Saving data as columns

Saving data with a header line

Time to tinker: Save data sets

Block 5: Visualising data

Code creation: Creating and visualising data

Block 6: Least squares analysis

Code creation: Write a least squares function

Using SciPy

Code creation: use the `linregress` function

Code creation: use the `linregress` function some more

Block 7: Chi-squared analysis

Including errorbars in the analysis

Code creation: Visualise raw data with errorbars

Using the curve_fit function

Code creation: import the `curve_fit function`

Code creation: create a fit function

Code creation: call the `curve_fit` function

Processing the analysis results

Code creation: store the output of `curve_fit

Code creation: get the analysis results

Code creation: calculate the statistical errors

Visualising the results

Code creation: create the fit data

Code creation: visualise measured data and the linear fit

Consolidate code

Using the `curve_fit` function