IMPORTING LIBRARIES
In [4]:
READING CSV FILE
In [5]:
In [6]:
Out[6]:
In [7]:
Out[7]:
(545, 13)
In [8]:
Out[8]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   price             545 non-null    int64
 1   area              545 non-null    int64
 2   bedrooms          545 non-null    int64
 3   bathrooms         545 non-null    int64
 4   stories           545 non-null    int64
 5   mainroad          545 non-null    object
 6   guestroom         545 non-null    object
 7   basement          545 non-null    object
 8   hotwaterheating   545 non-null    object
 9   airconditioning   545 non-null    object
 10  parking           545 non-null    int64
 11  prefarea          545 non-null    object
 12  furnishingstatus  545 non-null    object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB
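The loading and inspection steps above can be sketched as follows. The code for these cells is not part of this export, so this is only an illustration: the in-memory CSV and its values are made up, and only a few of the 13 column names are reused.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for the real CSV file (assumption);
# the column names are a subset of those listed in the info() output.
sample_csv = io.StringIO(
    "price,area,bedrooms,mainroad,furnishingstatus\n"
    "4200000,3600,3,yes,furnished\n"
    "3500000,4000,2,no,semi-furnished\n"
    "5250000,5400,4,yes,unfurnished\n"
)

df = pd.read_csv(sample_csv)

print(df.shape)  # (rows, columns)
df.info()        # per-column non-null counts and dtypes
```

With the real file, `pd.read_csv` would take the file path instead of the `StringIO` object.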
In [9]:
Out[9]:
In [16]:
Out[16]:
Missing Values by Column
------------------------------
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
------------------------------
TOTAL MISSING VALUES: 0
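A report in this layout can be produced with `isnull().sum()`. The sketch below is an assumption about how the cell might have looked, run on a made-up frame that contains one missing value so the counts are not all zero.

```python
import numpy as np
import pandas as pd

# Made-up frame with one deliberately missing value (assumption).
df = pd.DataFrame({
    "price": [4200000, 3500000, 5250000],
    "area": [3600.0, np.nan, 5400.0],
})

missing = df.isnull().sum()          # missing values per column
total_missing = int(missing.sum())   # missing values in the whole frame

print("Missing Values by Column")
print("-" * 30)
print(missing)
print("-" * 30)
print("TOTAL MISSING VALUES:", total_missing)
```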
In [35]:
In [36]:
Out[36]:
In [37]:
Out[37]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   price      545 non-null    int64
 1   area       545 non-null    int64
 2   bedrooms   545 non-null    int64
 3   bathrooms  545 non-null    int64
 4   stories    545 non-null    int64
 5   parking    545 non-null    int64
 6   SalePrice  545 non-null    int64
dtypes: int64(7)
memory usage: 29.9 KB
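The frame now holds only integer columns. How the categorical columns were dropped, and how `SalePrice` was created, is not shown in this export. One common way to keep only numeric columns is `select_dtypes`, sketched here on a made-up frame:

```python
import pandas as pd

# Made-up frame mixing numeric and text columns (assumption).
df = pd.DataFrame({
    "price": [4200000, 3500000],
    "area": [3600, 4000],
    "mainroad": ["yes", "no"],
})

# Keep only the columns with a numeric dtype.
numeric_df = df.select_dtypes(include="number")
print(numeric_df.columns.tolist())
```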
Checking for null values
In [38]:
Out[38]:
price 0
area 0
bedrooms 0
bathrooms 0
stories 0
parking 0
SalePrice 0
dtype: int64
General correlation analysis
In [40]:
Out[40]:
<AxesSubplot: >
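The plot above is presumably a heatmap of the pairwise correlations. As a sketch, the underlying matrix can be computed with `DataFrame.corr()`; the values below are made up for illustration.

```python
import pandas as pd

# Made-up numeric frame (assumption); column names follow the notebook.
df = pd.DataFrame({
    "price": [3000000, 4000000, 5000000, 6000000],
    "area": [3000, 4100, 4900, 6200],
    "bedrooms": [2, 3, 3, 4],
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))

# With seaborn installed, sns.heatmap(corr, annot=True) would draw this matrix.
```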
Analysis of the number of bedrooms feature
In [41]:
Out[41]:
<AxesSubplot: xlabel='bedrooms', ylabel='price'>
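Alongside the plot, a per-group summary makes the relationship between bedrooms and price easy to read as numbers. This is a sketch on made-up data, not the notebook's own cell:

```python
import pandas as pd

# Made-up rows (assumption); column names follow the notebook.
df = pd.DataFrame({
    "bedrooms": [2, 2, 3, 3, 4],
    "price": [3000000, 3200000, 4000000, 4400000, 6000000],
})

# Count, mean and median price for each number of bedrooms.
price_by_bedrooms = df.groupby("bedrooms")["price"].agg(["count", "mean", "median"])
print(price_by_bedrooms)
```

The same pattern works for `bathrooms` or any other discrete feature.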
In [42]:
Out[42]:
In [44]:
Out[44]:
(545, 7)
Analysis of the number of bathrooms feature
In [46]:
Out[46]:
<AxesSubplot: xlabel='price', ylabel='bathrooms'>
In [47]:
Out[47]:
<AxesSubplot: xlabel='price', ylabel='Density'>
In [48]:
Out[48]:
min 1750000
max 13300000
Name: price, dtype: int64
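An output of this form (a Series indexed by `min` and `max`) is what `Series.agg` returns for a list of aggregations. A minimal sketch, using the two extremes reported above plus one made-up value in between:

```python
import pandas as pd

# 1750000 and 13300000 are the extremes from the output above;
# the middle value is made up (assumption).
df = pd.DataFrame({"price": [4200000, 1750000, 13300000]})

# Smallest and largest price in one call.
price_range = df["price"].agg(["min", "max"])
print(price_range)
```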
In [55]:
Out[55]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   price      545 non-null    int64
 1   area       545 non-null    int64
 2   bedrooms   545 non-null    int64
 3   bathrooms  545 non-null    int64
 4   stories    545 non-null    int64
 5   parking    545 non-null    int64
 6   SalePrice  545 non-null    int64
dtypes: int64(7)
memory usage: 29.9 KB
In [50]:
Out[50]:
<AxesSubplot: xlabel='bathrooms', ylabel='price'>
Analysis of all instances whose price is 0
In [51]:
Out[51]:
(0, 7)
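A shape of `(0, 7)` means no row has a price of 0. The check can be sketched with a boolean mask; the frame below is made up and has two columns rather than seven.

```python
import pandas as pd

# Made-up frame without any zero prices (assumption).
df = pd.DataFrame({
    "price": [4200000, 3500000, 5250000],
    "area": [3600, 4000, 5400],
})

# Rows whose price equals 0; an empty result means there are none.
zero_price = df[df["price"] == 0]
print(zero_price.shape)
```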
In [53]:
Out[53]:
In [56]:
Out[56]:
Splitting into train, validation, and test sets
In [57]:
Out[57]:
(545, 6)
In [58]:
In [59]:
Out[59]:
0.8990825688073395
In [60]:
Out[60]:
0.8990825688073395
In [61]:
Out[61]:
0.509090909090909
In [62]:
Out[62]:
490
27
27
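One way to obtain a three-way split with these proportions is to call scikit-learn's `train_test_split` twice. The sketch below is an assumption about the procedure: the `test_size` and `random_state` values, the variable names, and the random data are all illustrative. With 545 rows it yields ratios of 490/545 ≈ 0.899 and 28/55 ≈ 0.509, the two ratios printed above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in data with the same number of rows as the notebook (assumption).
rng = np.random.default_rng(0)
n_rows = 545
X = pd.DataFrame({
    "area": rng.integers(1500, 16000, n_rows),
    "bedrooms": rng.integers(1, 7, n_rows),
})
y = pd.Series(rng.integers(1_750_000, 13_300_000, n_rows), name="price")

# First split: hold out 10% of the rows.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.1, random_state=42
)
# Second split: divide the held-out rows between validation and test.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42
)

print(len(X_train) / len(X))      # share of rows used for training
print(len(X_test) / len(X_rest))  # share of held-out rows used for testing
print(len(X_train), len(X_val), len(X_test))
```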
Linear regression
In [63]:
In [64]:
Out[64]:
In [65]:
Out[65]:
2.537905438995759e-09
In [66]:
Out[66]:
431 3290000
2 12250000
497 2660000
316 4060000
473 3003000
210 4900000
512 2520000
158 5495000
77 6650000
163 5425000
Name: price, dtype: int64
In [67]:
Out[67]:
array([ 3290000. , 12250000.00000001, 2660000. ,
4060000. , 3003000. , 4900000. ,
2520000. , 5495000. , 6650000. ,
5425000. , 3710000. , 8400000. ,
2380000. , 4200000. , 5250000. ,
3150000. , 10150000. , 1890000. ,
2940000. , 3234000. , 6720000. ,
4543000. , 6650000. , 2275000. ,
9800000. , 2450000. , 3500000. ])
In [68]:
Out[68]:
1.797584403878829e-09
In [69]:
Out[69]:
1.0
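Fitting and scoring a linear model can be sketched as below. Which metrics the cells above computed is not shown in this export; the sketch assumes root mean squared error and R², and uses synthetic data with noise, so its scores will not match the ones above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: price grows linearly with area, plus noise (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(1500, 16000, size=(100, 1))
y = 500.0 * X[:, 0] + rng.normal(0, 200000, size=100)

# Simple positional split into training and validation parts.
X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

model = LinearRegression().fit(X_train, y_train)
val_pred = model.predict(X_val)

rmse = np.sqrt(mean_squared_error(y_val, val_pred))  # error in price units
r2 = r2_score(y_val, val_pred)                       # share of variance explained
print(rmse, r2)
```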
In [70]:
Out[70]:
398 3500000
209 4900000
79 6650000
424 3360000
486 2870000
540 1820000
367 3675000
463 3080000
199 4907000
422 3360000
284 4270000
90 6440000
483 2940000
429 3325000
516 2450000
55 7350000
176 5250000
493 2800000
137 5740000
184 5110000
83 6580000
255 4480000
324 4007500
499 2660000
426 3353000
498 2660000
304 4193000
70 6790000
Name: price, dtype: int64
In [71]:
Out[71]:
array([3500000., 4900000., 6650000., 3360000., 2870000., 1820000.,
3675000., 3080000., 4907000., 3360000., 4270000., 6440000.,
2940000., 3325000., 2450000., 7350000., 5250000., 2800000.,
5740000., 5110000., 6580000., 4480000., 4007500., 2660000.,
3353000., 2660000., 4193000., 6790000.])
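Predictions that reproduce the targets to within about 1e-9 are what one would see if a feature column carries the same values as the target. Whether that is the case here, for example between `price` and `SalePrice`, cannot be confirmed from this export. A quick check, sketched on a made-up frame in which `SalePrice` is deliberately a copy of `price`:

```python
import pandas as pd

# Made-up frame; SalePrice is constructed as a copy of price (assumption).
df = pd.DataFrame({
    "price": [4200000, 3500000, 5250000],
    "area": [3600, 4000, 5400],
})
df["SalePrice"] = df["price"]

target = "price"

# Feature columns whose values are identical to the target in every row.
duplicates_of_target = [
    col for col in df.columns
    if col != target and (df[col] == df[target]).all()
]
print(duplicates_of_target)
```

Any column reported here would normally be removed from the features before fitting.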
Decision tree regression
In [72]:
In [73]:
Out[73]:
In [74]:
Out[74]:
0.9998553122209912
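A decision tree regressor can be fitted and scored in the same way. The sketch assumes the score above comes from the estimator's `score` method, which returns R²; the data is synthetic and `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: price grows linearly with area, plus noise (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(1500, 16000, size=(100, 1))
y = 500.0 * X[:, 0] + rng.normal(0, 200000, size=100)

X_train, X_val = X[:80], X[80:]
y_train, y_val = y[:80], y[80:]

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_r2 = tree.score(X_train, y_train)  # an unpruned tree fits its training data closely
val_r2 = tree.score(X_val, y_val)        # the score on unseen rows is the one to compare
print(train_r2, val_r2)
```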
In [75]:
Out[75]:
431 3290000
2 12250000
497 2660000
316 4060000
473 3003000
210 4900000
512 2520000
158 5495000
77 6650000
163 5425000
Name: price, dtype: int64