
Kernel: Python 3 (Anaconda)

On the manipulation of data from a CSV

Appendix: Measuring Infiltration

- Starting with the raw Netatmo data
  • The Netatmo data will be provided as a .csv file.

  • Create a new Jupyter notebook and add the Netatmo file to the local directory.

  • Once the file is in the directory, you are ready to start using pandas.

- Reading in the file with pandas
import pandas as pd  # imported as pd for convenience; 'pd' is the common convention for pandas
# read the file into a DataFrame called Netatmo;
# parse_dates=True parses the index as dates;
# index_col=1 uses column 1 (the second column, since pandas counts from 0) as the index
Netatmo = pd.read_csv('NetAtmo_2016.csv', parse_dates=True, index_col=1)

Now the file is ready to be manipulated.
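The sketch below shows what `index_col=1` and `parse_dates=True` do, without needing the real file. It reads a three-row in-memory stand-in built from the first rows that `.head(3)` displays in this appendix.

The layout is an assumption inferred from that output: a `Timestamp` column, then the local date/time column named `Timezone : America/Los_Angeles`, then the sensor columns. The name `sample` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for NetAtmo_2016.csv (column layout assumed from the head(3) output).
raw = io.StringIO(
    "Timestamp,Timezone : America/Los_Angeles,Temperature,Humidity,CO2,Noise,Pressure\n"
    "1455917199,2016-02-19 13:26:00,18.8,76,,,1015.7\n"
    "1455917255,2016-02-19 13:27:00,19.2,75,718,,1015.7\n"
    "1455917257,2016-02-19 13:27:00,19.9,73,,,1015.7\n"
)

# index_col=1 -> the second column becomes the row index;
# parse_dates=True -> that index is converted to datetime values.
sample = pd.read_csv(raw, parse_dates=True, index_col=1)

print(sample.index.name)  # Timezone : America/Los_Angeles
print(sample)
```

Empty fields (such as the missing CO2 and Noise readings) are read in as NaN.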

- The next two cells show some useful commands
Netatmo.describe() # the .describe() command will display statistics associated with each column
Timestamp Temperature Humidity CO2 Noise Pressure
count 9.014300e+04 90143.000000 90143.000000 90137.000000 90132.000000 90143.000000
mean 1.469613e+09 22.766724 51.291637 550.204799 38.964730 1011.348933
std 7.900773e+06 1.592268 6.929620 318.321732 7.100703 4.217541
min 1.455917e+09 17.900000 27.000000 201.000000 35.000000 995.000000
25% 1.462765e+09 21.700000 49.000000 354.000000 36.000000 1008.300000
50% 1.469657e+09 22.900000 52.000000 416.000000 36.000000 1011.000000
75% 1.476459e+09 23.800000 55.000000 639.000000 38.000000 1014.100000
max 1.483257e+09 28.500000 76.000000 2777.000000 79.000000 1027.500000
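In the table above, the count row is lower for CO2 (90137) and Noise (90132) than for Temperature (90143). This is because `.describe()` counts only non-missing values.

The small sketch below shows this on a hypothetical DataFrame `readings`, whose values are copied from the first three rows of the `.head(3)` output. Selecting a list of columns first limits the summary to those columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: values copied from the first three rows of head(3).
readings = pd.DataFrame(
    {
        "Temperature": [18.8, 19.2, 19.9],
        "Humidity": [76, 75, 73],
        "CO2": [np.nan, 718.0, np.nan],  # NaN marks a missing reading
    }
)

# Summarise only the columns of interest.
stats = readings[["Temperature", "CO2"]].describe()

# 'count' ignores NaN: 3 temperature readings, but only 1 CO2 reading.
print(stats.loc["count"])
```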
Netatmo.head(3)  # .head(n) displays the first n rows of the file (5 by default), a quick way to see the column names
Timestamp Temperature Humidity CO2 Noise Pressure
Timezone : America/Los_Angeles
2016-02-19 13:26:00 1455917199 18.8 76 NaN NaN 1015.7
2016-02-19 13:27:00 1455917255 19.2 75 718.0 NaN 1015.7
2016-02-19 13:27:00 1455917257 19.9 73 NaN NaN 1015.7

From the .head() output, we can see that this file's index is named 'Timezone : America/Los_Angeles'.

It may be useful to have a shorter index name.

- How to change index names:
Netatmo.index.name = 'Time'  # rename the index
Netatmo.head(2)
Timestamp Temperature Humidity CO2 Noise Pressure
Time
2016-02-19 13:26:00 1455917199 18.8 76 NaN NaN 1015.7
2016-02-19 13:27:00 1455917255 19.2 75 718.0 NaN 1015.7
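Setting `index.name` changes the DataFrame in place. As an alternative, `DataFrame.rename_axis` returns a renamed copy and leaves the original untouched, which can be handy if you want to keep both versions.

Here is a minimal sketch on a hypothetical two-row DataFrame `df`:

```python
import pandas as pd

# Hypothetical two-row sample carrying the long index name from the raw file.
index = pd.to_datetime(["2016-02-19 13:26:00", "2016-02-19 13:27:00"])
df = pd.DataFrame({"Temperature": [18.8, 19.2]}, index=index)
df.index.name = "Timezone : America/Los_Angeles"

# rename_axis returns a new DataFrame; df keeps its old index name.
renamed = df.rename_axis("Time")

print(renamed.index.name)  # Time
print(df.index.name)       # Timezone : America/Los_Angeles
```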

If you want to, you can add units to the column names:

- Renaming columns

The syntax is: DataFrame.rename(columns={'OldName': 'NewName'})

Netatmo = Netatmo.rename(columns={'Temperature': 'Temp_C', 'CO2': 'CO2_ppm'})
# renames two columns: Temperature -> Temp_C and CO2 -> CO2_ppm
# you can rename just one column, or several at once
# note that the result is assigned back to Netatmo
Netatmo.head(1)
Timestamp Temp_C Humidity CO2_ppm Noise Pressure
Time
2016-02-19 13:26:00 1455917199 18.8 76 NaN NaN 1015.7
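`.rename()` returns a new DataFrame, so the change is lost unless the result is assigned back, as done above. By default, keys that do not match an existing column are silently ignored.

The sketch below uses a hypothetical DataFrame `df`. The misspelled key `'Pressur'` (and its target name `'Pressure_new'`) is deliberate, for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Temperature": [18.8], "CO2": [718.0], "Pressure": [1015.7]})

# Without reassignment, df keeps its original column names.
df.rename(columns={"Temperature": "Temp_C"})
print(list(df.columns))  # ['Temperature', 'CO2', 'Pressure']

# With reassignment the new names stick; the unmatched key 'Pressur' is ignored.
df = df.rename(columns={"Temperature": "Temp_C", "CO2": "CO2_ppm", "Pressur": "Pressure_new"})
print(list(df.columns))  # ['Temp_C', 'CO2_ppm', 'Pressure']
```

If you would rather be told about a typo, recent pandas versions accept `errors='raise'` in `.rename()`. This raises a `KeyError` for unmatched keys.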
- Isolating columns of interest
  • In this case, isolating temperature

Temperature = Netatmo['Temp_C']
Temperature.head()
Time
2016-02-19 13:26:00    18.8
2016-02-19 13:27:00    19.2
2016-02-19 13:27:00    19.9
2016-02-19 13:31:00    20.3
2016-02-19 13:36:00    21.2
Name: Temp_C, dtype: float64
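Single brackets return a one-dimensional `Series`, which is why the output above ends with `Name: Temp_C`. A list of column names inside double brackets returns a `DataFrame` instead.

Here is a minimal sketch with a hypothetical DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame({"Temp_C": [18.8, 19.2], "Humidity": [76, 75], "CO2_ppm": [None, 718.0]})

temp_series = df["Temp_C"]            # single brackets -> Series (one column)
subset = df[["Temp_C", "Humidity"]]   # list in double brackets -> DataFrame

print(type(temp_series).__name__)  # Series
print(type(subset).__name__)       # DataFrame
print(subset.shape)                # (2, 2)
```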
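- Creating a new .csv file with only the temperature data
Temperature.to_csv('Netatmo_2016_Temperature_Only.csv', header=False)  # header=False: write data rows only; the column names are supplied when the file is read back in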
  • Now we have a csv file of only the temperature data. This can prove helpful for repeated manipulation of the same data.

column_names = ['Time', 'Temp_C']  # define the column names
NetatmoTemp = pd.read_csv('Netatmo_2016_Temperature_Only.csv', parse_dates=True, index_col=0, names=column_names)  # read in the new temperature .csv
NetatmoTemp.head()
Temp_C
Time
2016-02-19 13:26:00 18.8
2016-02-19 13:27:00 19.2
2016-02-19 13:27:00 19.9
2016-02-19 13:31:00 20.3
2016-02-19 13:36:00 21.2
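You can check the write/read round trip without touching the disk by writing to an in-memory buffer. This sketch assumes the Series is written with no header row (`header=False`), which is what `names=column_names` on the read side expects.

Recent pandas versions write a header row for a Series by default. Without `header=False`, that row would be read back in as data. The names `temps`, `buffer` and `round_trip` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample Series shaped like Netatmo['Temp_C'].
index = pd.to_datetime(["2016-02-19 13:26:00", "2016-02-19 13:27:00", "2016-02-19 13:31:00"])
temps = pd.Series([18.8, 19.2, 20.3], index=index, name="Temp_C")
temps.index.name = "Time"

# Write data rows only (no header row) into an in-memory text buffer.
buffer = io.StringIO()
temps.to_csv(buffer, header=False)
buffer.seek(0)  # rewind so read_csv starts from the beginning

# Read it back, supplying the column names ourselves.
column_names = ["Time", "Temp_C"]
round_trip = pd.read_csv(buffer, parse_dates=True, index_col=0, names=column_names)

print(round_trip.head())
```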
- Adding columns
NetatmoTemp['New_Column'] = 'New data'  # 'New_Column' and 'New data' are placeholder examples; a single value is repeated in every row
NetatmoTemp.head()
Temp_C New_Column
Time
2016-02-19 13:26:00 18.8 New data
2016-02-19 13:27:00 19.2 New data
2016-02-19 13:27:00 19.9 New data
2016-02-19 13:31:00 20.3 New data
2016-02-19 13:36:00 21.2 New data
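Assigning a single value, as above, repeats it in every row. To give each row its own value, assign a list (or array) whose length equals the number of rows. A length mismatch raises a `ValueError` and the column is not added.

Here is a sketch with a hypothetical DataFrame `df` and hypothetical columns `Label`, `Reading_no` and `Bad`:

```python
import pandas as pd

df = pd.DataFrame({"Temp_C": [18.8, 19.2, 19.9]})

df["Label"] = "New data"      # scalar: same value in every row
df["Reading_no"] = [1, 2, 3]  # list: one value per row, in order

try:
    df["Bad"] = [1, 2]        # wrong length: 2 values for 3 rows
except ValueError as err:
    print("ValueError:", err)

print(df)
```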
- Adding a new column that is a mathematical operation on another column
NetatmoTemp['Temps-100'] = NetatmoTemp['Temp_C'] - 100  # new column: each Temp_C value minus 100
NetatmoTemp.head()
Temp_C New_Column Temps-100
Time
2016-02-19 13:26:00 18.8 New data -81.2
2016-02-19 13:27:00 19.2 New data -80.8
2016-02-19 13:27:00 19.9 New data -80.1
2016-02-19 13:31:00 20.3 New data -79.7
2016-02-19 13:36:00 21.2 New data -78.8
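Arithmetic on a column is applied element-wise to every row at once, so a unit conversion takes a single line. For example, Fahrenheit is F = C × 9/5 + 32.

The sketch below adds a hypothetical `Temp_F` column this way to a small hypothetical DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame({"Temp_C": [18.8, 19.2, 20.0]})

# Element-wise: every Temp_C value is converted in one expression, no loop needed.
df["Temp_F"] = df["Temp_C"] * 9 / 5 + 32

print(df)
```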
- Deleting a column
del NetatmoTemp['New_Column']
NetatmoTemp.head()
Temp_C Temps-100
Time
2016-02-19 13:26:00 18.8 -81.2
2016-02-19 13:27:00 19.2 -80.8
2016-02-19 13:27:00 19.9 -80.1
2016-02-19 13:31:00 20.3 -79.7
2016-02-19 13:36:00 21.2 -78.8
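`del` removes the column from the DataFrame in place. `DataFrame.drop(columns=...)` does the same job but returns a new DataFrame. This makes it easy to remove several columns at once while keeping the original.

Here is a minimal sketch with a hypothetical DataFrame `df`:

```python
import pandas as pd

df = pd.DataFrame(
    {"Temp_C": [18.8, 19.2], "New_Column": ["New data"] * 2, "Temps-100": [-81.2, -80.8]}
)

# drop returns a new DataFrame; df itself is unchanged.
trimmed = df.drop(columns=["New_Column", "Temps-100"])

print(list(trimmed.columns))  # ['Temp_C']
print(list(df.columns))       # ['Temp_C', 'New_Column', 'Temps-100']
```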
- Making simple graphs using the %matplotlib 'magic'
%matplotlib inline
NetatmoTemp.plot()
[Figure: line plot of Temp_C and Temps-100 against Time]
%matplotlib inline
NetatmoTemp['Temp_C'].plot()
[Figure: line plot of Temp_C against Time]
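`%matplotlib inline` only works inside Jupyter/IPython. In a plain Python script, a sketch like the one below can draw the same kind of plot and save it to a file instead.

It assumes matplotlib is installed. The sample values, the axis label and the output filename `temp_c.png` are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample shaped like NetatmoTemp.
index = pd.to_datetime(
    ["2016-02-19 13:26:00", "2016-02-19 13:27:00", "2016-02-19 13:31:00", "2016-02-19 13:36:00"]
)
temps = pd.DataFrame({"Temp_C": [18.8, 19.2, 20.3, 21.2]}, index=index)
temps.index.name = "Time"

ax = temps["Temp_C"].plot()          # Series.plot returns a matplotlib Axes
ax.set_ylabel("Temperature (°C)")
ax.figure.savefig("temp_c.png")      # write the figure to disk
plt.close(ax.figure)                 # free the figure once saved
```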