NAME: AmesHousing.txt
TYPE: Population
SIZE: 2930 observations, 82 variables
ARTICLE TITLE: Ames Iowa: Alternative to the Boston Housing Data Set

DESCRIPTIVE ABSTRACT: Data set contains information from the Ames Assessor's Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010.

SOURCES:
Ames, Iowa Assessor's Office

VARIABLE DESCRIPTIONS:
Tab characters are used to separate variables in the data file. The data has 82 columns which include 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables (and 2 additional observation identifiers).
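
As a minimal sketch of reading a tab-separated file like this one, the following uses Python's standard csv module. The sample rows are made up for illustration and show only a handful of the 82 columns; with the real file one would open AmesHousing.txt instead of the in-memory string.

```python
import csv
import io

# Made-up excerpt with a few of the 82 columns (values are illustrative only).
SAMPLE = (
    "Order\tPID\tLot Area\tStreet\tSalePrice\n"
    "1\t0526301100\t31770\tPave\t215000\n"
    "2\t0526350040\t11622\tPave\t105000\n"
)

def read_rows(handle):
    """Return one dict per observation, keyed by the header names."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_rows(io.StringIO(SAMPLE))
print(len(rows), "observations;", len(rows[0]), "columns")
print(rows[0]["Lot Area"])
```

Every field comes back as a string, so numeric columns such as Lot Area still need an explicit conversion before any arithmetic.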

Order (Discrete): Observation number

PID (Nominal): Parcel identification number - can be used with city web site for parcel review.

MS SubClass (Nominal): Identifies the type of dwelling involved in the sale.

    020      1-STORY 1946 & NEWER ALL STYLES
    030      1-STORY 1945 & OLDER
    040      1-STORY W/FINISHED ATTIC ALL AGES
    045      1-1/2 STORY - UNFINISHED ALL AGES
    050      1-1/2 STORY FINISHED ALL AGES
    060      2-STORY 1946 & NEWER
    070      2-STORY 1945 & OLDER
    075      2-1/2 STORY ALL AGES
    080      SPLIT OR MULTI-LEVEL
    085      SPLIT FOYER
    090      DUPLEX - ALL STYLES AND AGES
    120      1-STORY PUD (Planned Unit Development) - 1946 & NEWER
    150      1-1/2 STORY PUD - ALL AGES
    160      2-STORY PUD - 1946 & NEWER
    180      PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
    190      2 FAMILY CONVERSION - ALL STYLES AND AGES

MS Zoning (Nominal): Identifies the general zoning classification of the sale.

    A        Agriculture
    C        Commercial
    FV       Floating Village Residential
    I        Industrial
    RH       Residential High Density
    RL       Residential Low Density
    RP       Residential Low Density Park
    RM       Residential Medium Density

Lot Frontage (Continuous): Linear feet of street connected to property

Lot Area (Continuous): Lot size in square feet

Street (Nominal): Type of road access to property

    Grvl     Gravel
    Pave     Paved

Alley (Nominal): Type of alley access to property

    Grvl     Gravel
    Pave     Paved
    NA       No alley access
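
Note that NA in this codebook is a category label (here, no alley access), not a missing value. Some readers convert the literal string NA to a missing-value marker by default (pandas' read_csv is one), so the distinction can be lost on import. The sketch below uses the standard csv module, which leaves fields as plain strings. Treating an empty field as truly missing is an assumption made for this example, and the rows are made up.

```python
import csv
import io

# Made-up rows: "NA" is the documented code for "No alley access";
# the empty field in the last row stands in for a value that is truly missing.
SAMPLE = "PID\tAlley\n0000000001\tGrvl\n0000000002\tNA\n0000000003\t\n"

ALLEY_LABELS = {"Grvl": "Gravel", "Pave": "Paved", "NA": "No alley access"}

def alley_label(code):
    """Map an Alley code to its description; None means the value is missing."""
    if code == "":
        return None
    return ALLEY_LABELS[code]

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
labels = [alley_label(row["Alley"]) for row in reader]
print(labels)
```

The same caution applies to the other variables in this file that use NA as a code (basement, fireplace, garage, pool, fence, and miscellaneous feature).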

Lot Shape (Ordinal): General shape of property

    Reg      Regular
    IR1      Slightly irregular
    IR2      Moderately Irregular
    IR3      Irregular

Land Contour (Nominal): Flatness of the property

    Lvl      Near Flat/Level
    Bnk      Banked - Quick and significant rise from street grade to building
    HLS      Hillside - Significant slope from side to side
    Low      Depression

Utilities (Ordinal): Type of utilities available

    AllPub   All public Utilities (E,G,W,& S)
    NoSewr   Electricity, Gas, and Water (Septic Tank)
    NoSeWa   Electricity and Gas Only
    ELO      Electricity only

Lot Config (Nominal): Lot configuration

    Inside   Inside lot
    Corner   Corner lot
    CulDSac  Cul-de-sac
    FR2      Frontage on 2 sides of property
    FR3      Frontage on 3 sides of property

Land Slope (Ordinal): Slope of property

    Gtl      Gentle slope
    Mod      Moderate Slope
    Sev      Severe Slope

Neighborhood (Nominal): Physical locations within Ames city limits (map available)

    Blmngtn  Bloomington Heights
    Blueste  Bluestem
    BrDale   Briardale
    BrkSide  Brookside
    ClearCr  Clear Creek
    CollgCr  College Creek
    Crawfor  Crawford
    Edwards  Edwards
    Gilbert  Gilbert
    Greens   Greens
    GrnHill  Green Hills
    IDOTRR   Iowa DOT and Rail Road
    Landmrk  Landmark
    MeadowV  Meadow Village
    Mitchel  Mitchell
    Names    North Ames
    NoRidge  Northridge
    NPkVill  Northpark Villa
    NridgHt  Northridge Heights
    NWAmes   Northwest Ames
    OldTown  Old Town
    SWISU    South & West of Iowa State University
    Sawyer   Sawyer
    SawyerW  Sawyer West
    Somerst  Somerset
    StoneBr  Stone Brook
    Timber   Timberland
    Veenker  Veenker

Condition 1 (Nominal): Proximity to various conditions

    Artery   Adjacent to arterial street
    Feedr    Adjacent to feeder street
    Norm     Normal
    RRNn     Within 200' of North-South Railroad
    RRAn     Adjacent to North-South Railroad
    PosN     Near positive off-site feature--park, greenbelt, etc.
    PosA     Adjacent to positive off-site feature
    RRNe     Within 200' of East-West Railroad
    RRAe     Adjacent to East-West Railroad

Condition 2 (Nominal): Proximity to various conditions (if more than one is present)

    Artery   Adjacent to arterial street
    Feedr    Adjacent to feeder street
    Norm     Normal
    RRNn     Within 200' of North-South Railroad
    RRAn     Adjacent to North-South Railroad
    PosN     Near positive off-site feature--park, greenbelt, etc.
    PosA     Adjacent to positive off-site feature
    RRNe     Within 200' of East-West Railroad
    RRAe     Adjacent to East-West Railroad

Bldg Type (Nominal): Type of dwelling

    1Fam     Single-family Detached
    2FmCon   Two-family Conversion; originally built as one-family dwelling
    Duplx    Duplex
    TwnhsE   Townhouse End Unit
    TwnhsI   Townhouse Inside Unit

House Style (Nominal): Style of dwelling

    1Story   One story
    1.5Fin   One and one-half story: 2nd level finished
    1.5Unf   One and one-half story: 2nd level unfinished
    2Story   Two story
    2.5Fin   Two and one-half story: 2nd level finished
    2.5Unf   Two and one-half story: 2nd level unfinished
    SFoyer   Split Foyer
    SLvl     Split Level

Overall Qual (Ordinal): Rates the overall material and finish of the house

    10       Very Excellent
    9        Excellent
    8        Very Good
    7        Good
    6        Above Average
    5        Average
    4        Below Average
    3        Fair
    2        Poor
    1        Very Poor

Overall Cond (Ordinal): Rates the overall condition of the house

    10       Very Excellent
    9        Excellent
    8        Very Good
    7        Good
    6        Above Average
    5        Average
    4        Below Average
    3        Fair
    2        Poor
    1        Very Poor

Year Built (Discrete): Original construction date

Year Remod/Add (Discrete): Remodel date (same as construction date if no remodeling or additions)

Roof Style (Nominal): Type of roof

    Flat     Flat
    Gable    Gable
    Gambrel  Gambrel (Barn)
    Hip      Hip
    Mansard  Mansard
    Shed     Shed

Roof Matl (Nominal): Roof material

    ClyTile  Clay or Tile
    CompShg  Standard (Composite) Shingle
    Membran  Membrane
    Metal    Metal
    Roll     Roll
    Tar&Grv  Gravel & Tar
    WdShake  Wood Shakes
    WdShngl  Wood Shingles

Exterior 1 (Nominal): Exterior covering on house

    AsbShng  Asbestos Shingles
    AsphShn  Asphalt Shingles
    BrkComm  Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    CemntBd  Cement Board
    HdBoard  Hard Board
    ImStucc  Imitation Stucco
    MetalSd  Metal Siding
    Other    Other
    Plywood  Plywood
    PreCast  PreCast
    Stone    Stone
    Stucco   Stucco
    VinylSd  Vinyl Siding
    Wd Sdng  Wood Siding
    WdShing  Wood Shingles

Exterior 2 (Nominal): Exterior covering on house (if more than one material)

    AsbShng  Asbestos Shingles
    AsphShn  Asphalt Shingles
    BrkComm  Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    CemntBd  Cement Board
    HdBoard  Hard Board
    ImStucc  Imitation Stucco
    MetalSd  Metal Siding
    Other    Other
    Plywood  Plywood
    PreCast  PreCast
    Stone    Stone
    Stucco   Stucco
    VinylSd  Vinyl Siding
    Wd Sdng  Wood Siding
    WdShing  Wood Shingles

Mas Vnr Type (Nominal): Masonry veneer type

    BrkCmn   Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    None     None
    Stone    Stone

Mas Vnr Area (Continuous): Masonry veneer area in square feet

Exter Qual (Ordinal): Evaluates the quality of the material on the exterior

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor

Exter Cond (Ordinal): Evaluates the present condition of the material on the exterior

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor
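
Because the Ex/Gd/TA/Fa/Po scale used by Exter Qual and Exter Cond is ordered, one common option is to map the codes to integers before modeling. The sketch below assumes equal spacing between levels (1 to 5); that spacing is a modeling choice and not something this documentation prescribes.

```python
# Ordered from worst to best; the integer values are an assumed equal-step scale.
QUALITY_SCALE = {"Po": 1, "Fa": 2, "TA": 3, "Gd": 4, "Ex": 5}

def encode_quality(code):
    """Return the integer rank for an Exter Qual / Exter Cond code."""
    return QUALITY_SCALE[code]

codes = ["TA", "Gd", "Ex", "Fa"]
ranks = [encode_quality(code) for code in codes]
print(ranks)  # [3, 4, 5, 2]
```

An unknown code raises a KeyError here, which is usually preferable to silently assigning it a rank.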

Foundation (Nominal): Type of foundation

    BrkTil   Brick & Tile
    CBlock   Cinder Block
    PConc    Poured Concrete
    Slab     Slab
    Stone    Stone
    Wood     Wood

Bsmt Qual (Ordinal): Evaluates the height of the basement

    Ex       Excellent (100+ inches)
    Gd       Good (90-99 inches)
    TA       Typical (80-89 inches)
    Fa       Fair (70-79 inches)
    Po       Poor (<70 inches)
    NA       No Basement

Bsmt Cond (Ordinal): Evaluates the general condition of the basement

    Ex       Excellent
    Gd       Good
    TA       Typical - slight dampness allowed
    Fa       Fair - dampness or some cracking or settling
    Po       Poor - Severe cracking, settling, or wetness
    NA       No Basement

Bsmt Exposure (Ordinal): Refers to walkout or garden level walls

    Gd       Good Exposure
    Av       Average Exposure (split levels or foyers typically score average or above)
    Mn       Minimum Exposure
    No       No Exposure
    NA       No Basement

BsmtFin Type 1 (Ordinal): Rating of basement finished area

    GLQ      Good Living Quarters
    ALQ      Average Living Quarters
    BLQ      Below Average Living Quarters
    Rec      Average Rec Room
    LwQ      Low Quality
    Unf      Unfinished
    NA       No Basement

BsmtFin SF 1 (Continuous): Type 1 finished square feet

BsmtFin Type 2 (Ordinal): Rating of basement finished area (if multiple types)

    GLQ      Good Living Quarters
    ALQ      Average Living Quarters
    BLQ      Below Average Living Quarters
    Rec      Average Rec Room
    LwQ      Low Quality
    Unf      Unfinished
    NA       No Basement

BsmtFin SF 2 (Continuous): Type 2 finished square feet

Bsmt Unf SF (Continuous): Unfinished square feet of basement area

Total Bsmt SF (Continuous): Total square feet of basement area

Heating (Nominal): Type of heating

    Floor    Floor Furnace
    GasA     Gas forced warm air furnace
    GasW     Gas hot water or steam heat
    Grav     Gravity furnace
    OthW     Hot water or steam heat other than gas
    Wall     Wall furnace

HeatingQC (Ordinal): Heating quality and condition

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor

Central Air (Nominal): Central air conditioning

    N        No
    Y        Yes

Electrical (Ordinal): Electrical system

    SBrkr    Standard Circuit Breakers & Romex
    FuseA    Fuse Box over 60 AMP and all Romex wiring (Average)
    FuseF    60 AMP Fuse Box and mostly Romex wiring (Fair)
    FuseP    60 AMP Fuse Box and mostly knob & tube wiring (poor)
    Mix      Mixed

1st Flr SF (Continuous): First Floor square feet

2nd Flr SF (Continuous): Second floor square feet

Low Qual Fin SF (Continuous): Low quality finished square feet (all floors)

Gr Liv Area (Continuous): Above grade (ground) living area square feet

Bsmt Full Bath (Discrete): Basement full bathrooms

Bsmt Half Bath (Discrete): Basement half bathrooms

Full Bath (Discrete): Full bathrooms above grade

Half Bath (Discrete): Half baths above grade

Bedroom (Discrete): Bedrooms above grade (does NOT include basement bedrooms)

Kitchen (Discrete): Kitchens above grade

KitchenQual (Ordinal): Kitchen quality

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor

TotRmsAbvGrd (Discrete): Total rooms above grade (does not include bathrooms)

Functional (Ordinal): Home functionality (Assume typical unless deductions are warranted)

    Typ      Typical Functionality
    Min1     Minor Deductions 1
    Min2     Minor Deductions 2
    Mod      Moderate Deductions
    Maj1     Major Deductions 1
    Maj2     Major Deductions 2
    Sev      Severely Damaged
    Sal      Salvage only

Fireplaces (Discrete): Number of fireplaces

FireplaceQu (Ordinal): Fireplace quality

    Ex       Excellent - Exceptional Masonry Fireplace
    Gd       Good - Masonry Fireplace in main level
    TA       Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
    Fa       Fair - Prefabricated Fireplace in basement
    Po       Poor - Ben Franklin Stove
    NA       No Fireplace

Garage Type (Nominal): Garage location

    2Types   More than one type of garage
    Attchd   Attached to home
    Basment  Basement Garage
    BuiltIn  Built-In (Garage part of house - typically has room above garage)
    CarPort  Car Port
    Detchd   Detached from home
    NA       No Garage

Garage Yr Blt (Discrete): Year garage was built

Garage Finish (Ordinal): Interior finish of the garage

    Fin      Finished
    RFn      Rough Finished
    Unf      Unfinished
    NA       No Garage

Garage Cars (Discrete): Size of garage in car capacity

Garage Area (Continuous): Size of garage in square feet

Garage Qual (Ordinal): Garage quality

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor
    NA       No Garage

Garage Cond (Ordinal): Garage condition

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor
    NA       No Garage

Paved Drive (Ordinal): Paved driveway

    Y        Paved
    P        Partial Pavement
    N        Dirt/Gravel

Wood Deck SF (Continuous): Wood deck area in square feet

Open Porch SF (Continuous): Open porch area in square feet

Enclosed Porch (Continuous): Enclosed porch area in square feet

3-Ssn Porch (Continuous): Three season porch area in square feet

Screen Porch (Continuous): Screen porch area in square feet

Pool Area (Continuous): Pool area in square feet

Pool QC (Ordinal): Pool quality

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    NA       No Pool

Fence (Ordinal): Fence quality

    GdPrv    Good Privacy
    MnPrv    Minimum Privacy
    GdWo     Good Wood
    MnWw     Minimum Wood/Wire
    NA       No Fence

Misc Feature (Nominal): Miscellaneous feature not covered in other categories

    Elev     Elevator
    Gar2     2nd Garage (if not described in garage section)
    Othr     Other
    Shed     Shed (over 100 SF)
    TenC     Tennis Court
    NA       None

Misc Val (Continuous): Dollar value of miscellaneous feature

Mo Sold (Discrete): Month Sold (MM)

Yr Sold (Discrete): Year Sold (YYYY)

Sale Type (Nominal): Type of sale

    WD       Warranty Deed - Conventional
    CWD      Warranty Deed - Cash
    VWD      Warranty Deed - VA Loan
    New      Home just constructed and sold
    COD      Court Officer Deed/Estate
    Con      Contract 15% Down payment regular terms
    ConLw    Contract Low Down payment and low interest
    ConLI    Contract Low Interest
    ConLD    Contract Low Down
    Oth      Other

Sale Condition (Nominal): Condition of sale

    Normal   Normal Sale
    Abnorml  Abnormal Sale - trade, foreclosure, short sale
    AdjLand  Adjoining Land Purchase
    Alloca   Allocation - two linked properties with separate deeds, typically condo with a garage unit
    Family   Sale between family members
    Partial  Home was not completed when last assessed (associated with New Homes)

SalePrice (Continuous): Sale price in dollars

SPECIAL NOTES:
There are 5 observations that an instructor may wish to remove from the data set before giving it to students (a plot of SALE PRICE versus GR LIV AREA will indicate them quickly). Three of them are true outliers (Partial Sales that likely don't represent actual market values) and two of them are simply unusual sales (very large houses priced relatively appropriately). I would recommend removing any houses with more than 4000 square feet from the data set (which eliminates these 5 unusual observations) before assigning it to students.
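
The recommended filter can be sketched in a few lines. The rows below are made up and show only the two columns involved; applying the 4000 square foot limit to Gr Liv Area follows the plot suggested in the note, and the helper name is hypothetical.

```python
# Made-up rows; only the two columns needed for the filter are shown.
homes = [
    {"Gr Liv Area": 1656, "SalePrice": 215000},
    {"Gr Liv Area": 4676, "SalePrice": 184750},
    {"Gr Liv Area": 4000, "SalePrice": 300000},
]

MAX_LIVING_AREA = 4000  # square feet, the limit recommended in the note

def drop_large_homes(rows, limit=MAX_LIVING_AREA):
    """Keep rows whose above-grade living area does not exceed `limit`."""
    return [row for row in rows if row["Gr Liv Area"] <= limit]

kept = drop_large_homes(homes)
print(len(kept))  # 2
```

A house of exactly 4000 square feet is kept, since the note removes only houses with more than 4000.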

STORY BEHIND THE DATA:
This data set was constructed for the purpose of an end of semester project for an undergraduate regression course. The original data (obtained directly from the Ames Assessor's Office) is used for tax assessment purposes but lends itself directly to the prediction of home selling prices. The type of information contained in the data is similar to what a typical home buyer would want to know before making a purchase and students should find most variables straightforward and understandable.

PEDAGOGICAL NOTES:
Instructors unfamiliar with multiple regression may wish to use this data set in conjunction with an earlier JSE paper that reviews most of the major issues found in regression modeling:

Kuiper, S. (2008), "Introduction to Multiple Regression: How Much Is Your Car Worth?", Journal of Statistics Education, Volume 16, Number 3.

Outside of the general issues associated with multiple regression discussed in this article, this particular data set offers several opportunities to discuss how the purpose of a model might affect the type of modeling done. Users of this data may also want to review another JSE article related directly to real estate pricing:

Pardoe, I. (2008), "Modeling home prices using realtor data", Journal of Statistics Education, Volume 16, Number 2.

One issue concerns homoscedasticity and assumption violations. The graph included in the article appears to indicate heteroscedasticity, with variation increasing with sale price, and this problem is evident in many simple home pricing models that focus only on house and lot sizes. Though this violation can be alleviated by transforming the response variable (sale price), the resulting equation yields fitted values that are difficult to interpret (selling price in log or square root dollars). This situation gives the instructor the opportunity to talk about the costs (biased estimators, incorrect statistical tests, etc.) and benefits (ease of use) of not correcting this assumption violation. If the purpose in building the model is simply to allow a typical buyer or real estate agent to sit down and estimate the selling price of a house, such transformations may be unnecessary or inappropriate for the task at hand. This issue could also open into a discussion on the contrasts and comparisons between data mining, predictive models, and formal statistical inference.
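
To make the interpretability cost concrete, here is a minimal sketch that fits a straight line to the log of sale price and then converts a fitted value back to dollars. The four data points are made up, and the closed-form simple least-squares fit is only a stand-in for whatever model an instructor actually uses.

```python
import math

# Made-up (living area in sq ft, sale price in dollars) pairs.
areas = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [120000.0, 165000.0, 230000.0, 310000.0]

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Fit on the log scale: log(price) = intercept + slope * area.
log_prices = [math.log(p) for p in prices]
intercept, slope = fit_line(areas, log_prices)

fitted_log = intercept + slope * 1800.0  # fitted value in "log dollars"
fitted_dollars = math.exp(fitted_log)    # back-transformed to dollars
print(round(fitted_log, 3), round(fitted_dollars))
```

The log-scale fitted value is not something a buyer can read directly, which is the trade-off described above; the extra exponentiation step is what restores a dollar figure.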

A second issue, closely related to the intended use of the model, is the handling of outliers and unusual observations. In general, I instruct my students to never throw away data points simply because they do not match a priori expectations (or other data points). I make this point strongly when data are being analyzed for research purposes that will be shared with a larger audience. Alternatively, if the purpose is once again to create a common-use model to estimate a "typical" sale, it is in the modeler's best interest to remove any observations that do not seem typical (such as foreclosures or family sales).
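
For the common-use case, restricting the data to typical sales could look like the sketch below. The rows are made up, and keeping only the Normal code of Sale Condition is one possible reading of "typical", not a rule stated in this documentation.

```python
# Made-up rows; codes follow the Sale Condition list in this document.
sales = [
    {"PID": "0000000001", "Sale Condition": "Normal"},
    {"PID": "0000000002", "Sale Condition": "Abnorml"},
    {"PID": "0000000003", "Sale Condition": "Family"},
    {"PID": "0000000004", "Sale Condition": "Normal"},
]

# Which conditions count as "typical" is a modeling decision; here only Normal is kept.
TYPICAL_CONDITIONS = {"Normal"}

def typical_sales(rows, allowed=TYPICAL_CONDITIONS):
    """Return only the rows whose Sale Condition is in `allowed`."""
    return [row for row in rows if row["Sale Condition"] in allowed]

typical = typical_sales(sales)
print([row["PID"] for row in typical])
```

Passing a different `allowed` set makes the choice explicit and easy to revisit when the purpose of the model changes.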

REFERENCES:
Individual homes within the data set can be referenced directly from the Ames City Assessor webpage via the Parcel ID (PID) found in the data set. Note these are nominal values (non-numeric), so preceding 0's must be included in the data entry field on the website. Access to the database can be gained from the Ames site (http://www.cityofames.org/assessor/) by clicking on "property search" or by accessing the Beacon (http://beacon.schneidercorp.com/Default.aspx) website and inputting Iowa and Ames in the appropriate fields. A city map showing the location of all the neighborhoods is also available on the Ames site and can be accessed by clicking on "Maps" and then "Residential Assessment Neighborhoods (City of Ames Only)".
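
Keeping the leading zeros means treating PID as text rather than as a number. The short sketch below shows what is lost otherwise; the parcel IDs are made-up values, not real parcels.

```python
import csv
import io

# Made-up parcel IDs with leading zeros (illustrative values only).
SAMPLE = "PID\tSalePrice\n0123456789\t150000\n0098765432\t200000\n"

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

pid_as_text = rows[0]["PID"]      # csv keeps every field as a string
pid_as_number = int(pid_as_text)  # numeric conversion drops the leading zero

print(pid_as_text)    # 0123456789
print(pid_as_number)  # 123456789
```

Any tool that infers column types from the data may make the numeric conversion automatically, so it is worth checking how PID was read before using it for a lookup.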

SUBMITTED BY:
Dean De Cock
Truman State University
100 E. Normal St., Kirksville, MO, 63501
[email protected]