CoCalc -- IsolateCO2&Concatenating.ipynb

GEP475GROUPINEEDANAP

Project: Justin Hoijer - GEP 475 SP2018 Course

Path: InfiltrationHUB / Methods / 1.RawDataFixing_Concat_CO2.isolation / IsolateCO2&Concatenating.ipynb

Views: ¹⁴⁶¹

Kernel: Python 3 (Anaconda)

Creating New "CO2 only" csv files, from orginal Netatmo csvs

making sure both files have same index and same column name for CO2

In [13]:

import numpy as np
import pandas as pd

In [2]:

year1 = pd.read_csv('../DataFiles/Raw/NetAtmo_2016.csv', parse_dates=True, index_col=1)
year1.head()

	Timestamp	Temperature	Humidity	CO2	Noise	Pressure
Timezone : America/Los_Angeles
2016-02-19 13:26:00	1455917199	18.8	76	NaN	NaN	1015.7
2016-02-19 13:27:00	1455917255	19.2	75	718.0	NaN	1015.7
2016-02-19 13:27:00	1455917257	19.9	73	NaN	NaN	1015.7
2016-02-19 13:31:00	1455917513	20.3	73	337.0	44.0	1015.8
2016-02-19 13:36:00	1455917814	21.2	70	332.0	47.0	1015.7

In [3]:

year1.describe()

	Timestamp	Temperature	Humidity	CO2	Noise	Pressure
count	9.014300e+04	90143.000000	90143.000000	90137.000000	90132.000000	90143.000000
mean	1.469613e+09	22.766724	51.291637	550.204799	38.964730	1011.348933
std	7.900773e+06	1.592268	6.929620	318.321732	7.100703	4.217541
min	1.455917e+09	17.900000	27.000000	201.000000	35.000000	995.000000
25%	1.462765e+09	21.700000	49.000000	354.000000	36.000000	1008.300000
50%	1.469657e+09	22.900000	52.000000	416.000000	36.000000	1011.000000
75%	1.476459e+09	23.800000	55.000000	639.000000	38.000000	1014.100000
max	1.483257e+09	28.500000	76.000000	2777.000000	79.000000	1027.500000

In [0]:

In [4]:

year2 = pd.read_csv('NetAtmo_2017.csv', parse_dates=True, index_col=1)
year2.head()

	Timestamp	Temperature	Humidity	CO2	Noise	Pressure
Time
2017-01-01 00:00:00	1483257658	21.8	34	482	39	1009.1
2017-01-01 00:05:00	1483257959	21.8	34	491	41	1009.2
2017-01-01 00:11:00	1483258260	21.9	34	480	39	1009.2
2017-01-01 00:16:00	1483258562	21.9	34	486	39	1009.2
2017-01-01 00:21:00	1483258864	21.9	34	490	37	1009.3

above you can see, the 2016 csv (year1) did not have the same index as the 2017 file.

How to change:

In [5]:

year1.index.names = ['Time']
year1.head()

	Timestamp	Temperature	Humidity	CO2	Noise	Pressure
Time
2016-02-19 13:26:00	1455917199	18.8	76	NaN	NaN	1015.7
2016-02-19 13:27:00	1455917255	19.2	75	718.0	NaN	1015.7
2016-02-19 13:27:00	1455917257	19.9	73	NaN	NaN	1015.7
2016-02-19 13:31:00	1455917513	20.3	73	337.0	44.0	1015.8
2016-02-19 13:36:00	1455917814	21.2	70	332.0	47.0	1015.7

Now that we have the same index, we'll go ahead and create the new files.

isolating the CO2 data

In [6]:

firstyear = year1['CO2']
#firstyear.head()

In [7]:

secondyear = year2['CO2']
#secondyear.head()

using to_csv

In [8]:

firstyear.to_csv('Netatmo2016CO2ONLY.csv')

In [9]:

column_names = ['Time','ppm']
year1CO2 = pd.read_csv('Netatmo2016CO2ONLY.csv', parse_dates=True, index_col=0, names=column_names)
year1CO2.head()

	ppm
Time
2016-02-19 13:26:00	NaN
2016-02-19 13:27:00	718.0
2016-02-19 13:27:00	NaN
2016-02-19 13:31:00	337.0
2016-02-19 13:36:00	332.0

In [10]:

#year1CO2.describe()

In [11]:

secondyear.to_csv('Netatmo2017CO2ONLY.csv')

In [12]:

column_names = ['Time','ppm']
year2CO2= pd.read_csv('Netatmo2017CO2ONLY.csv', parse_dates=True, index_col=0, names=column_names)
year2CO2.head()

	ppm
Time
2017-01-01 00:00:00	482
2017-01-01 00:05:00	491
2017-01-01 00:11:00	480
2017-01-01 00:16:00	486
2017-01-01 00:21:00	490

Next, I want to create another csv file that has the column names already inside. (I dont know a better way)*

In [14]:

year1CO2.to_csv('2016CO2ONLY.csv')
year1 =  pd.read_csv('2016CO2ONLY.csv', parse_dates=True, index_col=0,)
#year1.head()

In [15]:

year2CO2.to_csv('2017CO2ONLY.csv')
year2 =  pd.read_csv('2017CO2ONLY.csv', parse_dates=True, index_col=0,)
year2.head()

	ppm
Time
2017-01-01 00:00:00	482
2017-01-01 00:05:00	491
2017-01-01 00:11:00	480
2017-01-01 00:16:00	486
2017-01-01 00:21:00	490

Now we are left with the last two files, 2016 and 2017 CO2ONLY, and need to combine them

Using Concatenate

In [16]:

df1 = pd.read_csv('2016CO2ONLY.csv', parse_dates=True)
#df1.head()
df2 = pd.read_csv('2017CO2ONLY.csv', parse_dates=True)
#df2.head()

In [17]:

combined = pd.concat([df1,df2])

These cells are just to convince myself the data was combined correctly

In [18]:

combined.describe()

	ppm
count	100985.000000
mean	547.646878
std	304.511307
min	201.000000
25%	362.000000
50%	438.000000
75%	615.000000
max	2777.000000

In [19]:

combined.head()

	Time	ppm
0	2016-02-19 13:26:00	NaN
1	2016-02-19 13:27:00	718.0
2	2016-02-19 13:27:00	NaN
3	2016-02-19 13:31:00	337.0
4	2016-02-19 13:36:00	332.0

In [20]:

#df1.describe()

In [21]:

#df2.describe()

In [22]:

#df1.head()

In [23]:

#combined.head()

In [24]:

#df2.tail()

In [25]:

#combined.tail()

Now, finally, we make the last csv file. It will include all the CO2 data from years 2016 and 2017

In [26]:

combined.to_csv('Combinedwithcolumnnanes.csv')

In [27]:

netatmodata = pd.read_csv('Combinedwithcolumnnanes.csv', parse_dates=True, index_col=1)

netatmodata.head()

	Unnamed: 0	ppm
Time
2016-02-19 13:26:00	0	NaN
2016-02-19 13:27:00	1	718.0
2016-02-19 13:27:00	2	NaN
2016-02-19 13:31:00	3	337.0
2016-02-19 13:36:00	4	332.0

here, I'd like to make another csv without the extra columb.. HELP SOTO

In [28]:

both = netatmodata['ppm']

In [29]:

both.head()

Time
2016-02-19 13:26:00      NaN
2016-02-19 13:27:00    718.0
2016-02-19 13:27:00      NaN
2016-02-19 13:31:00    337.0
2016-02-19 13:36:00    332.0
Name: ppm, dtype: float64

In [30]:

both.to_csv('Netatmo2016_2017CO2ppm.csv')

In [31]:

netatmodata = pd.read_csv('Netatmo2016_2017CO2ppm.csv', parse_dates=True, index_col=0, )

netatmodata.head()

	Unnamed: 1
2016-02-19 13:26:00
2016-02-19 13:27:00	718.0
2016-02-19 13:27:00	NaN
2016-02-19 13:31:00	337.0
2016-02-19 13:36:00	332.0
2016-02-19 13:41:00	328.0

In [0]:

Creating New "CO2 only" csv files, from orginal Netatmo csvs

above you can see, the 2016 csv (year1) did not have the same index as the 2017 file.

How to change:

Now that we have the same index, we'll go ahead and create the new files.

using to_csv

Next, I want to create another csv file that has the column names already inside. (I dont know a better way)*

Now we are left with the last two files, 2016 and 2017 CO2ONLY, and need to combine them

These cells are just to convince myself the data was combined correctly

Now, finally, we make the last csv file. It will include all the CO2 data from years 2016 and 2017

here, I'd like to make another csv without the extra columb.. HELP SOTO

Product

Resources

Company