CoCalc -- 439LInfiltrationMethods-checkpoint.ipynb

Justin Hoijer - GEP 475 SP2018 Course

Kernel: Python 3

Future question for Dr. Soto

Should I concatenate the dataframes on integer indices or Date-Time indices? I don't think it will make a difference, but I'm not sure which one is easier.
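A small sketch of the two options in the question above, using made-up stand-in frames rather than the NetAtmo files (the frames `a` and `b` and their values are assumptions for illustration only). It suggests that the rows end up the same either way; what changes is how they can be selected afterwards.

```python
import pandas as pd

# Two small stand-in frames (made-up values, not the NetAtmo data).
a = pd.DataFrame({
    "Time": pd.to_datetime(["2016-02-19 10:00", "2016-02-19 10:05"]),
    "CO2": [410.0, 415.0],
})
b = pd.DataFrame({
    "Time": pd.to_datetime(["2017-01-01 00:00", "2017-01-01 00:05"]),
    "CO2": [420.0, 425.0],
})

# Option 1: integer index. ignore_index=True renumbers the rows 0..n-1,
# so no hand-built range is needed.
by_int = pd.concat([a, b], ignore_index=True)

# Option 2: Date-Time index. Rows keep their timestamps as labels,
# which allows selecting by date string.
by_time = pd.concat([a.set_index("Time"), b.set_index("Time")]).sort_index()

print(by_int.index.tolist())       # [0, 1, 2, 3]
print(by_time.loc["2016-02-19"])   # all rows from that day
```

With the integer index, date filtering goes through a boolean mask on the Time column; with the Date-Time index it goes through `.loc` directly. Which is easier probably depends on how much date slicing comes later.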

In [777]:

import pandas as pd
import numpy as np

In [778]:

df1 = pd.read_csv('NetAtmo_2016.csv', parse_dates=True)
df1.describe()

Out[778]:

In [779]:

new_index1 = pd.Series(range(1,90144))

In [780]:

df1['Numbered_index'] = new_index1

In [781]:

df1.set_index('Numbered_index', inplace = True)
df1.head()

Out[781]:

In [782]:

df1.drop(df1.columns[[0,2,3,5,6]], axis =1, inplace = True)

In [783]:

df1.head(1)

Out[783]:
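Dropping columns by position works, but it breaks silently if the column order in the file ever changes. A possible alternative is to read only the needed columns by name. This is a sketch on made-up CSV text: only 'Timezone : America/Los_Angeles' and 'CO2' are column names seen in this notebook; the other column names and all the values are hypothetical.

```python
import io
import pandas as pd

# Made-up CSV text. Only 'Timezone : America/Los_Angeles' and 'CO2' are
# column names taken from the notebook; the rest are hypothetical.
csv_text = (
    "Timestamp,Timezone : America/Los_Angeles,Temperature,Humidity,CO2\n"
    "1455904800,2016/02/19 10:00:00,21.5,40,410\n"
    "1455905100,2016/02/19 10:05:00,21.6,41,415\n"
)

# Keep only the two columns of interest, selected by name.
keep = ["Timezone : America/Los_Angeles", "CO2"]
df = pd.read_csv(io.StringIO(csv_text), usecols=keep)

# Same rename as used for df1 in this notebook.
df = df.rename(columns={"Timezone : America/Los_Angeles": "Time"})
print(df.columns.tolist())   # ['Time', 'CO2']
```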

In [784]:

df2 = pd.read_csv('NetAtmo_2017.csv', parse_dates = True)

In [785]:

new_index2 = pd.Series(range(90144, 100992))

In [786]:

df2['numbered_index'] = new_index2

In [787]:

df2.set_index('numbered_index', inplace = True)

In [788]:

df2.drop(df2.columns[[0,2,3,5,6]], axis =1, inplace = True)

In [789]:

df2.head()

Out[789]:

In [790]:

df1.head()

Out[790]:

In [791]:

df1 = df1.rename(columns = {'Timezone : America/Los_Angeles':'Time'})
df1.head()

Out[791]:

In [792]:

df2.tail()

Out[792]:

In [793]:

df3 = pd.concat([df1,df2])
df3.head()

Out[793]:

In [794]:

df3.tail()

Out[794]:

In [795]:

df3.plot()

Out[795]:

<matplotlib.axes._subplots.AxesSubplot at 0x7fc30f81eb00>

In [796]:

#df3.set_index('Time', inplace = True)
#df3.head()

In [797]:

#df3.plot()

In [798]:

#df3.plot.hist()

In [799]:

df3.dtypes

Out[799]:

Time     object
CO2     float64
dtype: object

In [800]:

df3.head()

Out[800]:

In [801]:

df3.isnull().head()

Out[801]:

In [802]:

df3['Time'] = pd.to_datetime(df3.Time)

In [803]:

df3.head()

Out[803]:
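A quick way to check whether a column was actually converted, sketched on a made-up frame (the values, including the deliberately bad string, are assumptions). Passing `errors="coerce"` turns unparseable entries into NaT instead of raising, so they can be counted afterwards.

```python
import pandas as pd

# Made-up frame: one valid timestamp string and one bad entry.
df = pd.DataFrame({
    "Time": ["2017/01/01 00:00:00", "not a date"],
    "CO2": [420.0, 425.0],
})

# Convert the text column; bad entries become NaT rather than an error.
df["Time"] = pd.to_datetime(df["Time"], errors="coerce")

# True once the column really holds datetimes.
print(pd.api.types.is_datetime64_any_dtype(df["Time"]))
# Number of entries that could not be parsed.
print(df["Time"].isna().sum())
```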

https://pandas.pydata.org/pandas-docs/stable/api.html#datetimelike-properties

In [804]:

# weekday_name was removed in pandas 1.0; newer versions use .dt.day_name()
df3.Time.dt.weekday_name.head()

Out[804]:

  Friday
  Friday
  Friday
  Friday
  Friday
Name: Time, dtype: object

In [805]:

# isolating everything up to the end of the second day (2/20/2016)
Firstday = pd.to_datetime('2/20/2016 23:59:59')

In [806]:

df3.loc[df3.Time <= Firstday, :].tail()

Out[806]:

In [807]:

#almost a full year of data!
(df3.Time.max() - df3.Time.min())

Out[807]:

Timedelta('359 days 05:21:00')

In [808]:

# on pandas >= 1.0, use df3.Time.dt.day_name() instead of weekday_name
df3['Day'] = df3.Time.dt.weekday_name
df3.head()

Out[808]:

In [809]:

# so many questions
df3.Day.value_counts()

Out[809]:

Saturday     14598
Tuesday      14589
Sunday       14539
Monday       14488
Wednesday    14472
Friday       14438
Thursday     13867
Name: Day, dtype: int64

In [810]:

df3.Day.value_counts().plot()

Out[810]:

<matplotlib.axes._subplots.AxesSubplot at 0x7fc30f0ed1d0>
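`value_counts()` sorts by count, so the plot above puts the weekdays in count order rather than calendar order. One possible fix is to reindex the counts before plotting. This sketch uses a made-up ten-day series starting on Friday 2016-02-19 and `dt.day_name()`, which needs pandas 0.23 or later; the `order` list is an assumption about which ordering is wanted.

```python
import pandas as pd

# Made-up daily timestamps: ten days starting Friday 2016-02-19.
times = pd.Series(pd.date_range("2016-02-19", periods=10, freq="D"))
day_names = times.dt.day_name()

# Calendar order for the x-axis.
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]

# Count readings per weekday, then put the counts in calendar order.
counts = day_names.value_counts().reindex(order)
print(counts)

# counts.plot.bar() would then draw one bar per weekday, Monday first.
```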

Switching to df2 because it is still not recognized by datetime

In [811]:

df2.head()

Out[811]:

In [812]:

#df3['Time2'].head()

In [813]:

df3.head()

Out[813]:

In [814]:

df3['Time2'] = df3.Time.shift(-1)

In [815]:

df3.head()

Out[815]:

In [816]:

df3['TimeDel'] = df3.Time2 - df3.Time
df3.head()

Out[816]:

In [817]:

df3.TimeDel.dt.seconds.head()

Out[817]:

   60.0
    0.0
  240.0
  300.0
  300.0
Name: TimeDel, dtype: float64
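One thing to watch with `.dt.seconds`: it returns only the seconds component of each timedelta and drops whole days, so a gap longer than a day would look short. `.dt.total_seconds()` gives the full duration. A sketch on three made-up timestamps (the two-day gap is an assumption added to show the difference):

```python
import pandas as pd

t = pd.Series(pd.to_datetime([
    "2016-02-19 10:00:00",
    "2016-02-19 10:05:00",
    "2016-02-21 10:05:00",   # a two-day gap after the second reading
]))

# Same quantity as Time.shift(-1) - Time: time until the next reading.
delta = t.shift(-1) - t

# Equivalent form: diff(-1) computes t - t.shift(-1), so negate it.
delta_alt = -t.diff(-1)

print(delta.dt.seconds.tolist())          # seconds component only, days dropped
print(delta.dt.total_seconds().tolist())  # full duration in seconds
```

The last row has no following reading, so its delta is missing in both forms.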

In [818]:

df3.dtypes

Out[818]:

Time        datetime64[ns]
CO2                float64
Day                 object
Time2       datetime64[ns]
TimeDel    timedelta64[ns]
dtype: object

In [819]:

df3['TimeDel'] = df3.TimeDel /  np.timedelta64(1, 's')

In [820]:

df3.dtypes

Out[820]:

Time       datetime64[ns]
CO2               float64
Day                object
Time2      datetime64[ns]
TimeDel           float64
dtype: object

In [821]:

df3['CO2_over_TimeDiff'] = (df3.CO2 / df3.TimeDel)

In [822]:

# number of "not a number" in each column
df3.isnull().sum()

Out[822]:

Time                 0
CO2                  6
Day                  0
Time2                1
TimeDel              1
CO2_over_TimeDiff    7
dtype: int64

In [823]:

df3[df3.CO2.isnull()]

Out[823]:

In [824]:

df3.shape

Out[824]:

(100991, 6)

In [825]:

# dropping rows that have "any" missing values
df3.dropna(how='any', inplace = True)

In [826]:

df3.shape

Out[826]:

(100984, 6)

In [827]:

df3.head()

Out[827]:

In [828]:

df3.describe()

Out[828]:

In [829]:

df3.CO2_over_TimeDiff.head()

Out[829]:

       inf
  1.123333
  1.106667
  1.093333
  1.023333
Name: CO2_over_TimeDiff, dtype: float64
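The `inf` above comes from a row where TimeDel is 0, and `dropna()` does not remove it because `inf` is not counted as missing. One possible cleanup, sketched on a made-up frame (values are assumptions), is to turn infinities into NaN first and then drop:

```python
import numpy as np
import pandas as pd

# Made-up frame: one zero time step and one missing CO2 reading.
df = pd.DataFrame({
    "CO2": [400.0, 337.0, np.nan],
    "TimeDel": [0.0, 300.0, 300.0],
})
df["CO2_over_TimeDiff"] = df["CO2"] / df["TimeDel"]

# dropna() alone keeps the inf row.
print(len(df.dropna()))   # 2

# Replace +/-inf with NaN, then drop rows with any missing value.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()
print(len(cleaned))       # 1
```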

Future question for Dr. Soto
