Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

GEP475GROUPINEEDANAP

Views: 1461
Kernel: Python 3 (Anaconda)

Creating New "CO2 only" csv files, from orginal Netatmo csvs

  1. making sure both files have same index and same column name for CO2

import numpy as np import pandas as pd
year1 = pd.read_csv('../DataFiles/Raw/NetAtmo_2016.csv', parse_dates=True, index_col=1) year1.head()
Timestamp Temperature Humidity CO2 Noise Pressure
Timezone : America/Los_Angeles
2016-02-19 13:26:00 1455917199 18.8 76 NaN NaN 1015.7
2016-02-19 13:27:00 1455917255 19.2 75 718.0 NaN 1015.7
2016-02-19 13:27:00 1455917257 19.9 73 NaN NaN 1015.7
2016-02-19 13:31:00 1455917513 20.3 73 337.0 44.0 1015.8
2016-02-19 13:36:00 1455917814 21.2 70 332.0 47.0 1015.7
year1.describe()
Timestamp Temperature Humidity CO2 Noise Pressure
count 9.014300e+04 90143.000000 90143.000000 90137.000000 90132.000000 90143.000000
mean 1.469613e+09 22.766724 51.291637 550.204799 38.964730 1011.348933
std 7.900773e+06 1.592268 6.929620 318.321732 7.100703 4.217541
min 1.455917e+09 17.900000 27.000000 201.000000 35.000000 995.000000
25% 1.462765e+09 21.700000 49.000000 354.000000 36.000000 1008.300000
50% 1.469657e+09 22.900000 52.000000 416.000000 36.000000 1011.000000
75% 1.476459e+09 23.800000 55.000000 639.000000 38.000000 1014.100000
max 1.483257e+09 28.500000 76.000000 2777.000000 79.000000 1027.500000
year2 = pd.read_csv('NetAtmo_2017.csv', parse_dates=True, index_col=1) year2.head()
Timestamp Temperature Humidity CO2 Noise Pressure
Time
2017-01-01 00:00:00 1483257658 21.8 34 482 39 1009.1
2017-01-01 00:05:00 1483257959 21.8 34 491 41 1009.2
2017-01-01 00:11:00 1483258260 21.9 34 480 39 1009.2
2017-01-01 00:16:00 1483258562 21.9 34 486 39 1009.2
2017-01-01 00:21:00 1483258864 21.9 34 490 37 1009.3

above you can see, the 2016 csv (year1) did not have the same index as the 2017 file.

How to change:
year1.index.names = ['Time'] year1.head()
Timestamp Temperature Humidity CO2 Noise Pressure
Time
2016-02-19 13:26:00 1455917199 18.8 76 NaN NaN 1015.7
2016-02-19 13:27:00 1455917255 19.2 75 718.0 NaN 1015.7
2016-02-19 13:27:00 1455917257 19.9 73 NaN NaN 1015.7
2016-02-19 13:31:00 1455917513 20.3 73 337.0 44.0 1015.8
2016-02-19 13:36:00 1455917814 21.2 70 332.0 47.0 1015.7

Now that we have the same index, we'll go ahead and create the new files.

  • isolating the CO2 data

firstyear = year1['CO2'] #firstyear.head()
secondyear = year2['CO2'] #secondyear.head()

using to_csv

firstyear.to_csv('Netatmo2016CO2ONLY.csv')
column_names = ['Time','ppm'] year1CO2 = pd.read_csv('Netatmo2016CO2ONLY.csv', parse_dates=True, index_col=0, names=column_names) year1CO2.head()
ppm
Time
2016-02-19 13:26:00 NaN
2016-02-19 13:27:00 718.0
2016-02-19 13:27:00 NaN
2016-02-19 13:31:00 337.0
2016-02-19 13:36:00 332.0
#year1CO2.describe()
secondyear.to_csv('Netatmo2017CO2ONLY.csv')
column_names = ['Time','ppm'] year2CO2= pd.read_csv('Netatmo2017CO2ONLY.csv', parse_dates=True, index_col=0, names=column_names) year2CO2.head()
ppm
Time
2017-01-01 00:00:00 482
2017-01-01 00:05:00 491
2017-01-01 00:11:00 480
2017-01-01 00:16:00 486
2017-01-01 00:21:00 490

Next, I want to create another csv file that has the column names already inside. (I dont know a better way)*

year1CO2.to_csv('2016CO2ONLY.csv') year1 = pd.read_csv('2016CO2ONLY.csv', parse_dates=True, index_col=0,) #year1.head()
year2CO2.to_csv('2017CO2ONLY.csv') year2 = pd.read_csv('2017CO2ONLY.csv', parse_dates=True, index_col=0,) year2.head()
ppm
Time
2017-01-01 00:00:00 482
2017-01-01 00:05:00 491
2017-01-01 00:11:00 480
2017-01-01 00:16:00 486
2017-01-01 00:21:00 490

Now we are left with the last two files, 2016 and 2017 CO2ONLY, and need to combine them

  • Using Concatenate

df1 = pd.read_csv('2016CO2ONLY.csv', parse_dates=True) #df1.head() df2 = pd.read_csv('2017CO2ONLY.csv', parse_dates=True) #df2.head()
combined = pd.concat([df1,df2])

These cells are just to convince myself the data was combined correctly

combined.describe()
ppm
count 100985.000000
mean 547.646878
std 304.511307
min 201.000000
25% 362.000000
50% 438.000000
75% 615.000000
max 2777.000000
combined.head()
Time ppm
0 2016-02-19 13:26:00 NaN
1 2016-02-19 13:27:00 718.0
2 2016-02-19 13:27:00 NaN
3 2016-02-19 13:31:00 337.0
4 2016-02-19 13:36:00 332.0
#df1.describe()
#df2.describe()
#df1.head()
#combined.head()
#df2.tail()
#combined.tail()

Now, finally, we make the last csv file. It will include all the CO2 data from years 2016 and 2017

combined.to_csv('Combinedwithcolumnnanes.csv')
netatmodata = pd.read_csv('Combinedwithcolumnnanes.csv', parse_dates=True, index_col=1) netatmodata.head()
Unnamed: 0 ppm
Time
2016-02-19 13:26:00 0 NaN
2016-02-19 13:27:00 1 718.0
2016-02-19 13:27:00 2 NaN
2016-02-19 13:31:00 3 337.0
2016-02-19 13:36:00 4 332.0

here, I'd like to make another csv without the extra columb.. HELP SOTO

both = netatmodata['ppm']
both.head()
Time 2016-02-19 13:26:00 NaN 2016-02-19 13:27:00 718.0 2016-02-19 13:27:00 NaN 2016-02-19 13:31:00 337.0 2016-02-19 13:36:00 332.0 Name: ppm, dtype: float64
both.to_csv('Netatmo2016_2017CO2ppm.csv')
netatmodata = pd.read_csv('Netatmo2016_2017CO2ppm.csv', parse_dates=True, index_col=0, ) netatmodata.head()
Unnamed: 1
2016-02-19 13:26:00
2016-02-19 13:27:00 718.0
2016-02-19 13:27:00 NaN
2016-02-19 13:31:00 337.0
2016-02-19 13:36:00 332.0
2016-02-19 13:41:00 328.0