
This notebook provides an introduction to using Python for data analysis, including storing and loading data, visualising data, and performing regression analysis. You can also download the Data folder here: https://cocalc.com/share/public_paths/9b2872cc1e75acba6ebb8d9c5a1bf4a226ea55ba.

Kernel: Python 3 (system-wide)

Cookies and Code

Python fits my data!


A key skill for any scientific problem solver is the analysis and visualisation of data. The data from investigations typically require analysis to determine a key measurement and its respective errors. There are typically two main types of software we can use:

  1. Spreadsheet approach: Excel or Origin Pro.

    • Pros: graphical interface is attractive to beginners

    • Cons: you must navigate menus and use left/right clicks to perform the analysis, and the method is not easily checkable

  2. Code based approach:

    • Pros: the analysis procedure is a clear, reproducible list of commands; efficient use of memory

    • Cons: some users are not confident with programming

In this session, we will introduce the code-based approach via:

  1. Useful modules for data analysis

  2. Storing and loading data

  3. Visualising data

  4. Linear regression using the least-squares and chi-squared methods.

Before you start, if needed, increase the font size of this document using the magnifying icon above.

You are also free to write code here in this notebook or to create your own notebook.


Useful modules for data analysis


Modules are Python files containing blocks of code defining useful functions. Importing a module enables you to extend the functions available for use in your Python code. There are modules for all sorts of things such as data analysis, game development, and web programming. Commonly used modules relevant to data analysis include:

  • numpy: numerical python, which contains functions for creating arrays of data and performing maths on them

  • pandas: data structures (like spreadsheets) and data analysis tools

  • matplotlib.pyplot: visualising data

  • seaborn: visualising data

  • scipy.stats: statistical analysis tools

  • statsmodels: expands on SciPy and integrates with Pandas

These modules are typically written by the community of programmers. You can also package your own functions into a .py file and import them as a module.

Import modules

In Python we use the import statement to load the contents of a module into memory.

When we import the contents of modules via

import SomeModule

the functions are then available to us via:

SomeModule.functionName()

Code creation: Import some modules


Use the cell below to import:

  • the numpy module

  • check that you can call simple numpy functions such as:

    • numpy.sqrt(25)

    • numpy.cos(2*numpy.pi)

    • numpy.random.random()

    • numpy.array([18, 11.1, 120, 5])

  • Try assigning variables to the output of these commands

    • x = numpy.array([18, 11.1, 120, 5])

  • Try performing operations on these variables:

    • a = numpy.sqrt(x)

    • b = x**2

  • Try printing the different variables to screen to check the results


# Your code here:
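A possible solution sketch (the values in the comments are what these calls should print; note that the random number function lives in numpy's random submodule):

import numpy

print(numpy.sqrt(25))            # 5.0
print(numpy.cos(2*numpy.pi))     # 1.0
print(numpy.random.random())     # a random float between 0 and 1

x = numpy.array([18, 11.1, 120, 5])
a = numpy.sqrt(x)
b = x**2
print(x)
print(a)
print(b)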

We can also shorten the module name using an alias:

import SomeModule as mod

and now our functions are available via:

mod.functionName()

We tend to use standard aliases for modules so everyone can understand the code. For example:

  • numpy: np

  • matplotlib.pyplot: plt

Code creation: Shorten the module name using an alias


Let's try importing to a short alias name:

  • Try importing numpy to an alias np

  • Run some numpy commands using the np.functionName() syntax such as creating 1D arrays:

    • x = np.array([ 2.3, 11.1, 21.55, 66])

    • y = x**2

# Tinkering code:
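For example, one way to start:

import numpy as np

x = np.array([2.3, 11.1, 21.55, 66])
y = x**2
print(x)
print(y)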

Storing data


One of the most important features of numpy is its arrays. Arrays are multidimensional data sets (mentioned briefly last session). An important distinction between arrays and lists or tuples is that maths can be performed on arrays. Hence, arrays form the backbone of any data analysis in Python.

Arrays can be visualised as rows and columns. Most users work with simple 1D arrays (i.e. a single row) or 2D arrays. For a 2D array of data, visualise horizontal rows and vertical columns. This array below has 3 rows and 4 columns containing 12 elements:
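[[ 1  8 13 12]
 [14 11  2  7]
 [ 4  5 16  9]]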

As you can see from this visual example, indexing of a 2D array requires both row and column numbers. Locations of data within an array are defined using the arrayname[row, column] syntax. For example, if we give the above array the name mydata:

  • mydata[1, 3] contains the data 7

    • remember indexing starts at zero

  • mydata[2, 3] or indeed mydata[-1, -1] contains the data 9

Arrays can also be sliced like other data sets. For example, for the array shown above:

  • x = mydata[:, 1] would assign the values from the second column (one from each row)

  • Slicing the array in this way would give a new 1D array of x = [8, 11, 5]

Creating and manipulating these multidimensional arrays in Python is completed using numpy, for example:

import numpy as np

# Define some lists
row0 = [1, 8, 13, 12]
row1 = [14, 11, 2, 7]
row2 = [4, 5, 16, 9]

# Combine to an array
my2Darray = np.array([row0, row1, row2])
print(my2Darray)
[[ 1  8 13 12]
 [14 11  2  7]
 [ 4  5 16  9]]
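We can quickly check the indexing and slicing rules described above against this array:

print(my2Darray[1, 3])    # prints 7
print(my2Darray[-1, -1])  # prints 9, same element as my2Darray[2, 3]
x = my2Darray[:, 1]       # all rows, second column
print(x)                  # prints [ 8 11  5]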

There are many different functions that can be used to create and manipulate arrays. Here are a few of our favourites:

  • linspace(start, end, n): 1D array from start to end with n elements

  • arange(start, end, step): 1D array from start to end (exclusive) with values increasing in steps of step

  • full([nrows, ncols], value): 2D array of rows and columns all filled with some value

  • zeros([nrows, ncols]): an array filled with zeros

  • random.random([nrows, ncols]): an array filled with random floating point numbers from 0-1


Time to tinker: Play with some numpy functions


Use the cell below to define some simple arrays using the functions described above.

  • for example, use linspace() to define a 1D array of values

  • change the sizes and values

  • check that you can build some intuitive understanding of arrays

  • try printing different index ranges, i.e. slicing

# Tinker with the code
a = np.linspace(0, 10, 20)
b = np.arange(0, 20, 1)
c = np.full([5, 5], 1)
d = np.zeros([3, 4])
e = np.linspace(10, 20, 11)
f = np.arange(10, 20, 1)
g = np.full([5, 5], 2)
d = np.zeros([5, 5])
print(e)
[10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.]

Loading and saving data to file


So far we have worked with data we created within the code. For big or existing data sets we store the data in files. These are simple text files with extensions of:

  • .txt: text

  • .csv: comma separated values

The choice of extension is not so important; the contents can be in the same text-based format.

Since we use numpy to store our data, it provides a number of functions to load and save data:

  • loadtxt: Load data from a text file to an array

  • savetxt: Save an array to a text file

  • genfromtxt: Load data from a text file, with missing values handled as specified

Load data from a file

We are going to load a dataset of the world population (in billions) over the last 5 decades.

import numpy as np

data = np.loadtxt('Data/population.csv', delimiter=',')

Next we would like to verify what data has been loaded into memory. For this you can use the print function:

print(data)
[[1968.    3.55]
 [1978.    4.3 ]
 [1988.    5.14]
 [1998.    5.98]
 [2008.    5.78]
 [2018.    7.63]]

or simply use the %whos magic function to see which variables we have in memory:

%whos
Variable     Type       Data/Info
---------------------------------
a            ndarray    20: 20 elems, type `float64`, 160 bytes
b            ndarray    20: 20 elems, type `int64`, 160 bytes
c            ndarray    5x5: 25 elems, type `int64`, 200 bytes
d            ndarray    5x5: 25 elems, type `float64`, 200 bytes
data         ndarray    6x2: 12 elems, type `float64`, 96 bytes
e            ndarray    11: 11 elems, type `float64`, 88 bytes
f            ndarray    10: 10 elems, type `int64`, 80 bytes
g            ndarray    5x5: 25 elems, type `int64`, 200 bytes
matplotlib   module     <module 'matplotlib' from<...>/matplotlib/__init__.py'>
my2Darray    ndarray    3x4: 12 elems, type `int64`, 96 bytes
np           module     <module 'numpy' from '/us<...>kages/numpy/__init__.py'>
numpy        module     <module 'numpy' from '/us<...>kages/numpy/__init__.py'>
row0         list       n=4
row1         list       n=4
row2         list       n=4
x            ndarray    4: 4 elems, type `float64`, 32 bytes
y            ndarray    4: 4 elems, type `float64`, 32 bytes

In memory, we can see that the data has been stored as a 6x2 numpy array containing 12 elements, requiring 96 bytes of the computer's RAM.

Since we wish to perform some analysis of the data, we may like to break it up to specific variables such as:

# Slice the data array to new variables
# : refers to all rows, and first (0) or second (1) column
year = data[:, 0]
population = data[:, 1]

Alternatively, you can do this when loading the data:

year, population = np.loadtxt('Data/population.csv', delimiter=',', unpack=True)

Notice here we need to add the flag unpack=True to unpack the columns to the new variables. Again let's check that the data is as expected:

print("Values for year: ", year) print("Values for population:", population)
Values for year:  [1968. 1978. 1988. 1998. 2008. 2018.]
Values for population: [3.55 4.3  5.14 5.98 5.78 7.63]

Time to tinker:


Try loading the data from any of the other data files in the same folder as population.csv.


Saving data to a file

When using numpy, we can save data with the savetxt function. Here we have specified saving the data to 5 decimal places, i.e. its precision. You can find more information about the syntax of the formatting fmt argument here.

import numpy as np

data = [2.5, 2.5, 2.6, 2.6, 2.6, 2.4, 2.6, 2.6]
np.savetxt('mydata.txt', data, fmt='%.5f')

Saving data as columns

It is also possible to save the data as a column using np.c_[]. For example:

np.savetxt('mydata.txt', np.c_[data], fmt='%.5f')

Saving data with a header line

It is also possible to specify a header line to explain the data or what delimiter to use. For example:

import numpy as np

xdata = [1, 2, 3, 4, 5]
ydata = [1.5, 6.5, 9.6, 12.6, 15.6]
np.savetxt('mydata.txt', np.c_[xdata, ydata], fmt='%4.5f',
           header='Measurements of car speed\ntime [s] speed [m/s]',
           delimiter=',')

Here, we have made some x, y data and stored it as 2 columns using np.c_[] with a useful header. The \n is code for "new line" when defining a string.

Once the file has been created, check that it exists and what it contains. To do so, use a file manager to navigate to it and open the file. On CoCalc, clicking on Files in the top left will take you to the file manager. As the file extension, .txt, states, this should be a simple text file.

# Measurements of car speed
# time [s] speed [m/s]
1.00000,1.50000
2.00000,6.50000
3.00000,9.60000
4.00000,12.60000
5.00000,15.60000

Time to tinker: Save data sets


Tinker with the savetxt function and its options. Try

  • different data sets, numbers of columns

  • different headers

  • different precision

  • different delimiters

Refer to the function description here if needed.

xdata = [1, 2, 3, 4, 5]
ydata = [1.5, 6.5, 9.6, 12.6, 15.6]
# header must be a single string; \n splits it over two header lines
np.savetxt('myyydata.txt', np.c_[xdata, ydata], header='Coo\nyear money', fmt='%4.1f')

Block 5: Visualising data


As mentioned earlier, there are a couple of options for visualising data in Python. The standard method is to use the matplotlib library of modules.

A key module from this library is pyplot:

import matplotlib.pyplot as plt

Now that we have imported this module, we can avail of its numerous plotting functions.

The two main plotting functions are:

  • plt.plot()

    • plot data as points or line, specify colours, add a label

  • plt.errorbar()

    • plot data with errorbars

There are many functions to format plots including:

  • plt.xlabel()

  • plt.ylabel()

    • label axis with some string

  • plt.xlim()

  • plt.ylim()

    • limit the axis range with [lower, upper] values

  • plt.legend(): add a legend using the plot labels

  • plt.savefig(): save the plot to a file

There are many sources online describing the wider functionality of matplotlib.pyplot; here is a good one to start with.
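A minimal sketch putting a few of these functions together (the data, labels and filename here are just illustrative):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)
y = x**2

plt.plot(x, y, 'b-', label='$y = x^2$')  # blue line with a legend label
plt.xlabel('x')
plt.ylabel('y')
plt.xlim([0, 10])
plt.legend()
plt.savefig('myplot.png')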

Code creation: Creating and visualising data


  • Create an array combining the following data in three rows:

    • time = [0, 1, 2, 3, 4]

    • distance = [0, 3, 6, 9, 12]

    • speed = [3, 3, 3, 3, 3]

  • Visualise distance vs time data

  • Visualise speed vs time data

  • now tinker with the appearance of the plots (colours, symbols etc)

# Your code here
time = [0, 1, 2, 3, 4]
distance = [0, 3, 6, 9, 12]
speed = [3, 3, 3, 3, 3]

plt.plot(time, distance, label='distance [m]')
plt.plot(time, speed, label='speed [m/s]')
plt.xlabel('time [s]')
plt.legend()

Block 6: Least squares analysis


In the least squares analysis method, we attempt to fit a linear function $y = mx + c$ to our data points $x_i, y_i$. For this linear function we need to determine the value of the slope, $m$, and the intercept, $c$. These are the fit parameters.

Figure: A least squares fit to arbitrary data. The black points are the $x_i, y_i$ data and the red line is the fit. The residuals are shown as vertical lines representing the difference between the data points and the fit predictions.

For any fit there will be a residual difference, $r_i = y_i - y$, between the fit value ($y$) and our actual data values ($y_i$). As the name suggests, the least squares algorithm tries to minimise the sum of the squares of the residuals:

$$S = \Sigma \, [y_i - y]^2$$

here the sigma symbol, $\Sigma$, refers to the sum. When this sum of the squares of the residuals, $S$, is the lowest value possible, we have the best fit.

At the best fit, the values of the slope, $m$, and the intercept, $c$, are calculated using:

$$m = \frac{S_x S_y - N S_{xy}}{S_x^2 - N S_{xx}} \qquad\qquad c = \frac{S_y - m S_x}{N}$$

where $N$ is the number of data points and

$$S_x = \sum_{i=1}^{N} x_i \qquad S_y = \sum_{i=1}^{N} y_i \qquad S_{xy} = \sum_{i=1}^{N} x_i y_i \qquad S_{xx} = \sum_{i=1}^{N} x_i^2$$

Code creation: Write a least squares function


You are to write a function that performs a least squares analysis.

  • The function should take arguments $x_i, y_i$ and return values for the slope and intercept.

  • Test out your function by applying it to some of the previous data (a sample implementation is sketched below).

def leastsquares(x, y):
    '''
    Perform a linear regression
    Arguments:
        x, y: data points
    Return:
        m: slope
        c: intercept
    '''
    # Put Code here:
    return (m, c)
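For reference, here is a minimal sketch implementing the summation formulas above (assuming numpy is imported as np):

import numpy as np

def leastsquares(x, y):
    '''Least squares fit of y = m*x + c using the summation formulas above.'''
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(x)
    Sx, Sy = np.sum(x), np.sum(y)
    Sxy, Sxx = np.sum(x*y), np.sum(x*x)
    m = (Sx*Sy - N*Sxy) / (Sx**2 - N*Sxx)
    c = (Sy - m*Sx) / N
    return m, c

# Quick check against the car speed data saved earlier
print(leastsquares([1, 2, 3, 4, 5], [1.5, 6.5, 9.6, 12.6, 15.6]))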

Using SciPy

Rather than writing your own analysis functions, you can simply use the ready-made functions in Python. Since the analysis is statistical, many of these functions can be found in the scipy.stats module.

A function for linear regression is linregress. We typically import just this function rather than the whole module:

from scipy.stats import linregress

Once we have the function imported, we can do a quick test:

# Perform a least squares regression
x = [2.1, 4.0, 6.4]
y = [3.1, 7.8, 11.3]
slope, intercept, r_value, p_value, err = linregress(x, y)

As you can see, linregress returns 5 values. The slope and its error are:

print("slope:", slope, "+/-", err)

the intercept is:

print("intercept:", intercept)

and the $r^2$ value is:

# Correlation coefficient
print("r-squared:", r_value**2)

The latter is referred to as the goodness of fit, and $r^2 = 1$ for a perfect linear correlation.

# Define non uniform y-errors as an example
temp = [0., 25., 50., 75., 100., 125., 150., 175., 200., 225., 250.]
length = [1.1155, 1.1164, 1.117, 1.1172, 1.118, 1.119, 1.1199, 1.121, 1.1213, 1.1223, 1.1223]
error = [0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0015, 0.0012, 0.0011, 0.0011]
np.savetxt('Data/test.txt', np.c_[temp, length, error], fmt='%.4f')

Code creation: use the `linregress` function

Perform an analysis of the population data in `population.csv` using `linregress`.

  • print to screen the slope with the error and units

  • also check the $r^2$ value

    • is the data linear?

# Code here
year, population = np.loadtxt('Data/population.csv', delimiter=',', unpack=True)
slope, intercept, r_value, p_value, err = linregress(year, population)
print("slope", slope, "+-", err, "[billion per year]")
print("r square", r_value**2)

# The same approach for the thermal expansion data (see the next exercise)
temp, length, error = np.loadtxt('Data/LengthTempData.txt', unpack=True)
slope, intercept, r_value, p_value, err = linregress(temp, length)
print("slope", slope, "+-", err, "[m per deg C]")
print("r square", r_value**2)

Code creation: use the `linregress` function some more

Perform an analysis of the thermal expansion data in `LengthTempData.txt` using `linregress`.

  • print to screen the slope with the error and units

  • also check the $r^2$ value

    • is the data linear?


Block 7: Chi-squared analysis


Including errorbars in the analysis

You may have noticed that the least squares method does not depend on the uncertainties (or error bars) on each point. It simply minimises the sum of the squares between the data point $y_i$ and the fit point $y$:

$$\sum_{i=1}^N (y_i - y)^2$$

As you can see, there is no information used relating to the errorbars, $\sigma_i$, on each data point. In fact, it assumes that the errors on the data points are all the same and therefore play no role in the fitting procedure.

To take non-uniform error bars on points into account, we must use a chi-squared ($\chi^2$) approach. This is similar to least squares but minimises:

$$\chi^2 = \sum_{i=1}^N \left(\frac{y_i - y}{\sigma_i}\right)^2$$

This type of fit is weighted with the errors on each point ($\sigma_i$): the bigger a point's error, the smaller its contribution to the chi-squared sum. This enables you to propagate uncertainties all the way to the fitted parameters and hence to your final measurement (e.g. derived from the slope).

For a quick example, we first define different error values for some data set. Here we will use the thermal expansion of a length of metal with increasing temperature, as found in Data/LengthTempData.txt:

# Define non uniform y-errors as an example
temp = [0., 25., 50., 75., 100., 125., 150., 175., 200., 225., 250.]  # [degrees Celsius]
length = [1.1155, 1.1164, 1.117, 1.1172, 1.118, 1.119, 1.1199, 1.121, 1.1213, 1.1223, 1.1223]  # [metres]
error = [0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0003, 0.0015, 0.0012, 0.0011, 0.0011]  # [metres]
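As an illustration, the $\chi^2$ sum above can be computed directly for any candidate fit line (a sketch; yfit stands for the fit's predicted y-value at each point):

import numpy as np

def chisquared(y, yfit, sigma):
    # Weighted sum of squared residuals: points with big error bars contribute less
    y, yfit, sigma = np.asarray(y), np.asarray(yfit), np.asarray(sigma)
    return np.sum(((y - yfit) / sigma)**2)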

Code creation: Visualise raw data with errorbars

Before we perform the analysis, it may be useful to visualise the data.

  • use the plt.errorbar() function

  • ensure that the data is plotted just as points rather than as a line

  • label the axes using the plt.xlabel() and plt.ylabel() functions


# Code here
plt.errorbar()
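A possible solution, using the temp, length and error lists defined above (fmt='o' plots points only, with no connecting line):

plt.errorbar(temp, length, yerr=error, fmt='o')
plt.xlabel(r'Temp [$^{\circ}$C]')
plt.ylabel('Length [m]')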

Using the curve_fit function

To fit the data using a chi-squared method, we can use the curve_fit function. This is contained in the scipy.optimize module.


Code creation: import the `curve_fit` function

The curve_fit function is contained in the scipy.optimize module.

  • import the function

from scipy.optimize import curve_fit

The syntax for using curve_fit is:

curve_fit(fitfunction, x, y, sigma=errors)

There are 4 arguments:

1. fitfunction: the name of some function we make that can fit the data

  • this is the function itself (a callable), not a string

2. x: the independent data i.e. what we varied

  • this can be a list or array of numbers

3. y: the dependent data i.e. what we measured

  • this can be a list or array of numbers

4. errors: the measured uncertainty in y

  • this can be a list or array of numbers

Of course, we can call these variables whatever we wish.

Now, the argument fitfunction is the name of a function we need to make. For a linear fit we define this function as $y = mx + c$.


Code creation: create a fit function

Define (def) a function with some name, e.g. linearfit

  • give it inputs x, m and c

  • from which it returns the value m*x + c

# Code here
def linearfit(x, m, c):
    return m*x + c

Now that we have a linear function defined and we have our data, let us perform the chi-squared analysis:

curve_fit(fitfunction, x, y, sigma=errors)

Code creation: call the `curve_fit` function

Here use the variable names we have already defined:

  • linearfit: the name of our fit function

  • temp: our independent data

  • length: our dependent data

  • error: uncertainties on 'length'


# Code here
from scipy.optimize import curve_fit

popt, pcov = curve_fit(linearfit, temp, length, sigma=error)

Processing the analysis results

If successful, calling the function will return 2 new sets of numbers:

1. [2.81164900e-05, 1.11547121e+00]

2. [[ 1.84166422e-12, -1.47264501e-10], [-1.47264501e-10, 1.75281421e-08]]

These are 2 arrays. The first contains the fit parameters and the second contains their variances:

1. slope, intercept

2. a 2x2 covariance matrix, i.e. a measure of the uncertainty on the fit parameters (slope and intercept)

Because we will want to use these analysis results, it is best to assign new variables to the output of curve_fit.


Code creation: store the output of `curve_fit`

Assign 2 new variables to the output of curve_fit.

  • typically the names are popt and pcov


# Code here

The curve_fit function now outputs two variables which we have called popt and pcov. The optimised fit coefficients are stored in the popt list and their respective errors can be calculated using the values stored in the pcov array.

After calling the curve_fit function we will want to access the data stored in the popt and pcov variables.


Code creation: get the analysis results

Assign variables to the optimised fit parameters stored in popt.

  • use the correct index to select the slope and intercept

  • print the results to screen with units

# Code here
slope = popt[?]
intercept = popt[?]
print(?????)
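One possible completion: curve_fit returns the fit parameters in the order they appear in the fit function after x, so popt[0] is the slope m and popt[1] is the intercept c:

slope = popt[0]       # m in linearfit(x, m, c)
intercept = popt[1]   # c in linearfit(x, m, c)
print('slope =', slope, 'm/deg C')
print('intercept =', intercept, 'm')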

All statistical analysis results have statistical errors. If our fit is perfect, i.e. our data is very linear, these errors should be very small. The statistical measure of the error in fit values is in terms of the standard deviation, referred to as sigma or $\sigma$. This represents how spread out the possible fit values are. The curve_fit function returns variances ($\sigma^2$) rather than standard deviations.

Variance measures the variation of a single random variable (like the slope), whereas covariance is a measure of how much two random variables vary together (like the slope and the intercept). These statistical variances of the slope and intercept are contained in pcov. The diagonals of pcov provide the variances of the fit parameters. The other entries are the covariances.

$$\begin{Bmatrix} \sigma^2(x,x) & \sigma^2(x,y) \\ \sigma^2(y,x) & \sigma^2(y,y) \end{Bmatrix}$$

We are just interested in the variances. The diagonal entries, $\sigma^2(x,x)$ and $\sigma^2(y,y)$, contain the slope and intercept variances respectively:

$$\begin{Bmatrix} 1.84166422\times 10^{-12} & -1.47264501\times 10^{-10} \\ -1.47264501\times 10^{-10} & 1.75281421\times 10^{-8} \end{Bmatrix}$$

We can use the variances to calculate the errors (one standard deviation, i.e. $1\sigma$) on the fit parameters. Since the variance is $\sigma^2$, we take the square root to determine the error sigma.


Code creation: calculate the statistical errors

Calculate the statistical errors for the slope and the intercept. Use the numpy function np.sqrt() if needed.

  • print to screen the values of the slope and the intercept together with these errors and units

# Code here
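A possible solution, using np.diag() to pick out the diagonal variances (this matches the consolidated code at the end of the notebook):

# One standard deviation (1 sigma) errors on the fit parameters
sigma = np.sqrt(np.diag(pcov))
print('slope =', slope, '+/-', sigma[0], 'm/deg C')
print('intercept =', intercept, '+/-', sigma[1], 'm')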

In the context of this data, the analysis provides the following results:

  • the slope is the thermal expansion coefficient:

    • $\alpha = (28 \pm 1)\times 10^{-6}$ m/$^{\circ}$C

    • so this tells us that for every degree of heating, the metal expands by $28\times 10^{-6}$ m

    • comparing with known values indicates that the metal may be lead!

  • the intercept is equivalent to the length of the metal at $0^{\circ}$C

    • which is $L_0 = 1.1155 \pm 0.0001$ m

  • the relative error on $\alpha$ is $1/28$ or $4\%$

Visualising the results

Now that we have the analysis results, we can finish by visualising these together with the measured data for comparison. In order to plot a fit line, we need to create the fit data.


Code creation: create the fit data

  • xfit: an array of x-values from $x_{min}$ to $x_{max}$

    • choose a starting point $x_{min} = 0$

    • choose an end point just bigger than your maximum measured $x$ value

    • use a numpy function such as np.linspace or np.arange to create the xfit array

  • yfit: an array of y-values

    • calculate using the linearfit function

    • yfit = linearfit(xfit, slope, intercept)

# Code here
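A possible solution (an end point of 300 is just an example choice, a little above the maximum measured temperature of 250):

xfit = np.linspace(0, 300, 100)
yfit = linearfit(xfit, slope, intercept)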

Now that we have the fit data, we should compare this on a graph with the measured data.


Code creation: visualise measured data and the linear fit

  • plot the measured data as before using plt.errorbar()

  • then plot the fit data using plt.plot() on the same graph

  • add labels with units to the axes

  • save the graph as a pdf or png file

  • open the file for viewing

# Code here

Consolidate code

In your own separate notebook, gather all the code you developed to perform the chi-squared or least squares analysis. Edit the code so that it is all contained in a single cell.

  • Test that the entire analysis programme works.

For example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def linearfit(x, m, c):
    return m*x + c

# Load the data
filename = "Data/LengthTempData.txt"
[temp, length, error] = np.loadtxt(filename, unpack=True)

# Fit the x/y data
popt, pcov = curve_fit(linearfit, temp, length, sigma=error)

# Calculate the fit errors
sigma = np.sqrt(np.diag(pcov))

# The results:
slope = popt[0]
errorslope = sigma[0]
intercept = popt[1]
errorintercept = sigma[1]

# Print the results:
print('Slope, Error, Intercept, Error:')
print(slope, errorslope, intercept, errorintercept)
print()
print('Linear thermal expansion coefficient = %.2e +/- %.2e m/deg C' % (slope, errorslope))

# Visualise the data and the fit
xfit = np.linspace(0, 300, 100)
yfit = slope*xfit + intercept

plt.errorbar(temp, length, fmt='o', label='Data', yerr=error)
plt.xlabel(r'Temp [$^{\circ}$C]')
plt.ylabel('Length [m]')
plt.plot(xfit, yfit, 'r-', label='Linear Fit')
plt.legend(loc='best')
plt.savefig('ThermalExp.pdf')