NAME: AmesHousing.txt
TYPE: Population
SIZE: 2930 observations, 82 variables
ARTICLE TITLE: Ames Iowa: Alternative to the Boston Housing Data Set

DESCRIPTIVE ABSTRACT: Data set contains information from the Ames Assessor's Office used in computing assessed values for individual residential properties sold in Ames, IA from 2006 to 2010.

SOURCES:
Ames, Iowa Assessor's Office

VARIABLE DESCRIPTIONS:
Tab characters are used to separate variables in the data file. The data file has 82 columns, which include 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables (and 2 additional observation identifiers).

Order (Discrete): Observation number

PID (Nominal): Parcel identification number - can be used with city web site for parcel review.

MS SubClass (Nominal): Identifies the type of dwelling involved in the sale.

    020      1-STORY 1946 & NEWER ALL STYLES
    030      1-STORY 1945 & OLDER
    040      1-STORY W/FINISHED ATTIC ALL AGES
    045      1-1/2 STORY - UNFINISHED ALL AGES
    050      1-1/2 STORY FINISHED ALL AGES
    060      2-STORY 1946 & NEWER
    070      2-STORY 1945 & OLDER
    075      2-1/2 STORY ALL AGES
    080      SPLIT OR MULTI-LEVEL
    085      SPLIT FOYER
    090      DUPLEX - ALL STYLES AND AGES
    120      1-STORY PUD (Planned Unit Development) - 1946 & NEWER
    150      1-1/2 STORY PUD - ALL AGES
    160      2-STORY PUD - 1946 & NEWER
    180      PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
    190      2 FAMILY CONVERSION - ALL STYLES AND AGES

MS Zoning (Nominal): Identifies the general zoning classification of the sale.

    A        Agriculture
    C        Commercial
    FV       Floating Village Residential
    I        Industrial
    RH       Residential High Density
    RL       Residential Low Density
    RP       Residential Low Density Park
    RM       Residential Medium Density

Lot Frontage (Continuous): Linear feet of street connected to property

Lot Area (Continuous): Lot size in square feet

Street (Nominal): Type of road access to property

    Grvl     Gravel
    Pave     Paved

Alley (Nominal): Type of alley access to property

    Grvl     Gravel
    Pave     Paved
    NA       No alley access

Lot Shape (Ordinal): General shape of property

    Reg      Regular
    IR1      Slightly irregular
    IR2      Moderately Irregular
    IR3      Irregular

Land Contour (Nominal): Flatness of the property

    Lvl      Near Flat/Level
    Bnk      Banked - Quick and significant rise from street grade to building
    HLS      Hillside - Significant slope from side to side
    Low      Depression

Utilities (Ordinal): Type of utilities available

    AllPub   All public Utilities (E,G,W,& S)
    NoSewr   Electricity, Gas, and Water (Septic Tank)
    NoSeWa   Electricity and Gas Only
    ELO      Electricity only

Lot Config (Nominal): Lot configuration

    Inside   Inside lot
    Corner   Corner lot
    CulDSac  Cul-de-sac
    FR2      Frontage on 2 sides of property
    FR3      Frontage on 3 sides of property

Land Slope (Ordinal): Slope of property

    Gtl      Gentle slope
    Mod      Moderate Slope
    Sev      Severe Slope

Neighborhood (Nominal): Physical locations within Ames city limits (map available)

    Blmngtn  Bloomington Heights
    Blueste  Bluestem
    BrDale   Briardale
    BrkSide  Brookside
    ClearCr  Clear Creek
    CollgCr  College Creek
    Crawfor  Crawford
    Edwards  Edwards
    Gilbert  Gilbert
    Greens   Greens
    GrnHill  Green Hills
    IDOTRR   Iowa DOT and Rail Road
    Landmrk  Landmark
    MeadowV  Meadow Village
    Mitchel  Mitchell
    Names    North Ames
    NoRidge  Northridge
    NPkVill  Northpark Villa
    NridgHt  Northridge Heights
    NWAmes   Northwest Ames
    OldTown  Old Town
    SWISU    South & West of Iowa State University
    Sawyer   Sawyer
    SawyerW  Sawyer West
    Somerst  Somerset
    StoneBr  Stone Brook
    Timber   Timberland
    Veenker  Veenker

Condition 1 (Nominal): Proximity to various conditions

    Artery   Adjacent to arterial street
    Feedr    Adjacent to feeder street
    Norm     Normal
    RRNn     Within 200' of North-South Railroad
    RRAn     Adjacent to North-South Railroad
    PosN     Near positive off-site feature--park, greenbelt, etc.
    PosA     Adjacent to positive off-site feature
    RRNe     Within 200' of East-West Railroad
    RRAe     Adjacent to East-West Railroad

Condition 2 (Nominal): Proximity to various conditions (if more than one is present)

    Artery   Adjacent to arterial street
    Feedr    Adjacent to feeder street
    Norm     Normal
    RRNn     Within 200' of North-South Railroad
    RRAn     Adjacent to North-South Railroad
    PosN     Near positive off-site feature--park, greenbelt, etc.
    PosA     Adjacent to positive off-site feature
    RRNe     Within 200' of East-West Railroad
    RRAe     Adjacent to East-West Railroad

Bldg Type (Nominal): Type of dwelling

    1Fam     Single-family Detached
    2FmCon   Two-family Conversion; originally built as one-family dwelling
    Duplx    Duplex
    TwnhsE   Townhouse End Unit
    TwnhsI   Townhouse Inside Unit

House Style (Nominal): Style of dwelling

    1Story   One story
    1.5Fin   One and one-half story: 2nd level finished
    1.5Unf   One and one-half story: 2nd level unfinished
    2Story   Two story
    2.5Fin   Two and one-half story: 2nd level finished
    2.5Unf   Two and one-half story: 2nd level unfinished
    SFoyer   Split Foyer
    SLvl     Split Level

Overall Qual (Ordinal): Rates the overall material and finish of the house

    10       Very Excellent
    9        Excellent
    8        Very Good
    7        Good
    6        Above Average
    5        Average
    4        Below Average
    3        Fair
    2        Poor
    1        Very Poor

Overall Cond (Ordinal): Rates the overall condition of the house

    10       Very Excellent
    9        Excellent
    8        Very Good
    7        Good
    6        Above Average
    5        Average
    4        Below Average
    3        Fair
    2        Poor
    1        Very Poor

Year Built (Discrete): Original construction date

Year Remod/Add (Discrete): Remodel date (same as construction date if no remodeling or additions)

Roof Style (Nominal): Type of roof

    Flat     Flat
    Gable    Gable
    Gambrel  Gambrel (Barn)
    Hip      Hip
    Mansard  Mansard
    Shed     Shed

Roof Matl (Nominal): Roof material

    ClyTile  Clay or Tile
    CompShg  Standard (Composite) Shingle
    Membran  Membrane
    Metal    Metal
    Roll     Roll
    Tar&Grv  Gravel & Tar
    WdShake  Wood Shakes
    WdShngl  Wood Shingles

Exterior 1 (Nominal): Exterior covering on house

    AsbShng  Asbestos Shingles
    AsphShn  Asphalt Shingles
    BrkComm  Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    CemntBd  Cement Board
    HdBoard  Hard Board
    ImStucc  Imitation Stucco
    MetalSd  Metal Siding
    Other    Other
    Plywood  Plywood
    PreCast  PreCast
    Stone    Stone
    Stucco   Stucco
    VinylSd  Vinyl Siding
    Wd Sdng  Wood Siding
    WdShing  Wood Shingles

Exterior 2 (Nominal): Exterior covering on house (if more than one material)

    AsbShng  Asbestos Shingles
    AsphShn  Asphalt Shingles
    BrkComm  Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    CemntBd  Cement Board
    HdBoard  Hard Board
    ImStucc  Imitation Stucco
    MetalSd  Metal Siding
    Other    Other
    Plywood  Plywood
    PreCast  PreCast
    Stone    Stone
    Stucco   Stucco
    VinylSd  Vinyl Siding
    Wd Sdng  Wood Siding
    WdShing  Wood Shingles

Mas Vnr Type (Nominal): Masonry veneer type

    BrkCmn   Brick Common
    BrkFace  Brick Face
    CBlock   Cinder Block
    None     None
    Stone    Stone

Mas Vnr Area (Continuous): Masonry veneer area in square feet

Exter Qual (Ordinal): Evaluates the quality of the material on the exterior

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor

Exter Cond (Ordinal): Evaluates the present condition of the material on the exterior

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor

Foundation (Nominal): Type of foundation

    BrkTil   Brick & Tile
    CBlock   Cinder Block
    PConc    Poured Concrete
    Slab     Slab
    Stone    Stone
    Wood     Wood

Bsmt Qual (Ordinal): Evaluates the height of the basement

    Ex       Excellent (100+ inches)
    Gd       Good (90-99 inches)
    TA       Typical (80-89 inches)
    Fa       Fair (70-79 inches)
    Po       Poor (<70 inches)
    NA       No Basement

Bsmt Cond (Ordinal): Evaluates the general condition of the basement

    Ex       Excellent
    Gd       Good
    TA       Typical - slight dampness allowed
    Fa       Fair - dampness or some cracking or settling
    Po       Poor - Severe cracking, settling, or wetness
    NA       No Basement

Bsmt Exposure (Ordinal): Refers to walkout or garden level walls

    Gd       Good Exposure
    Av       Average Exposure (split levels or foyers typically score average or above)
    Mn       Minimum Exposure
    No       No Exposure
    NA       No Basement

BsmtFin Type 1 (Ordinal): Rating of basement finished area

    GLQ      Good Living Quarters
    ALQ      Average Living Quarters
    BLQ      Below Average Living Quarters
    Rec      Average Rec Room
    LwQ      Low Quality
    Unf      Unfinished
    NA       No Basement

BsmtFin SF 1 (Continuous): Type 1 finished square feet

BsmtFinType 2 (Ordinal): Rating of basement finished area (if multiple types)

    GLQ      Good Living Quarters
    ALQ      Average Living Quarters
    BLQ      Below Average Living Quarters
    Rec      Average Rec Room
    LwQ      Low Quality
    Unf      Unfinished
    NA       No Basement

BsmtFin SF 2 (Continuous): Type 2 finished square feet

Bsmt Unf SF (Continuous): Unfinished square feet of basement area

Total Bsmt SF (Continuous): Total square feet of basement area

Heating (Nominal): Type of heating

    Floor    Floor Furnace
    GasA     Gas forced warm air furnace
    GasW     Gas hot water or steam heat
    Grav     Gravity furnace
    OthW     Hot water or steam heat other than gas
    Wall     Wall furnace

HeatingQC (Ordinal): Heating quality and condition

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    Po       Poor

Central Air (Nominal): Central air conditioning

    N        No
    Y        Yes

Electrical (Ordinal): Electrical system

    SBrkr    Standard Circuit Breakers & Romex
    FuseA    Fuse Box over 60 AMP and all Romex wiring (Average)
    FuseF    60 AMP Fuse Box and mostly Romex wiring (Fair)
    FuseP    60 AMP Fuse Box and mostly knob & tube wiring (Poor)
    Mix      Mixed

1st Flr SF (Continuous): First floor square feet

2nd Flr SF (Continuous): Second floor square feet

Low Qual Fin SF (Continuous): Low quality finished square feet (all floors)

Gr Liv Area (Continuous): Above grade (ground) living area square feet

Bsmt Full Bath (Discrete): Basement full bathrooms

Bsmt Half Bath (Discrete): Basement half bathrooms

Full Bath (Discrete): Full bathrooms above grade

Half Bath (Discrete): Half baths above grade

Bedroom (Discrete): Bedrooms above grade (does NOT include basement bedrooms)

Kitchen (Discrete): Kitchens above grade

KitchenQual (Ordinal): Kitchen quality

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor

TotRmsAbvGrd (Discrete): Total rooms above grade (does not include bathrooms)

Functional (Ordinal): Home functionality (Assume typical unless deductions are warranted)

    Typ      Typical Functionality
    Min1     Minor Deductions 1
    Min2     Minor Deductions 2
    Mod      Moderate Deductions
    Maj1     Major Deductions 1
    Maj2     Major Deductions 2
    Sev      Severely Damaged
    Sal      Salvage only

Fireplaces (Discrete): Number of fireplaces

FireplaceQu (Ordinal): Fireplace quality

    Ex       Excellent - Exceptional Masonry Fireplace
    Gd       Good - Masonry Fireplace in main level
    TA       Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
    Fa       Fair - Prefabricated Fireplace in basement
    Po       Poor - Ben Franklin Stove
    NA       No Fireplace

Garage Type (Nominal): Garage location

    2Types   More than one type of garage
    Attchd   Attached to home
    Basment  Basement Garage
    BuiltIn  Built-In (Garage part of house - typically has room above garage)
    CarPort  Car Port
    Detchd   Detached from home
    NA       No Garage

Garage Yr Blt (Discrete): Year garage was built

Garage Finish (Ordinal): Interior finish of the garage

    Fin      Finished
    RFn      Rough Finished
    Unf      Unfinished
    NA       No Garage

Garage Cars (Discrete): Size of garage in car capacity

Garage Area (Continuous): Size of garage in square feet

Garage Qual (Ordinal): Garage quality

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor
    NA       No Garage

Garage Cond (Ordinal): Garage condition

    Ex       Excellent
    Gd       Good
    TA       Typical/Average
    Fa       Fair
    Po       Poor
    NA       No Garage

Paved Drive (Ordinal): Paved driveway

    Y        Paved
    P        Partial Pavement
    N        Dirt/Gravel

Wood Deck SF (Continuous): Wood deck area in square feet

Open Porch SF (Continuous): Open porch area in square feet

Enclosed Porch (Continuous): Enclosed porch area in square feet

3-Ssn Porch (Continuous): Three season porch area in square feet

Screen Porch (Continuous): Screen porch area in square feet

Pool Area (Continuous): Pool area in square feet

Pool QC (Ordinal): Pool quality

    Ex       Excellent
    Gd       Good
    TA       Average/Typical
    Fa       Fair
    NA       No Pool

Fence (Ordinal): Fence quality

    GdPrv    Good Privacy
    MnPrv    Minimum Privacy
    GdWo     Good Wood
    MnWw     Minimum Wood/Wire
    NA       No Fence

Misc Feature (Nominal): Miscellaneous feature not covered in other categories

    Elev     Elevator
    Gar2     2nd Garage (if not described in garage section)
    Othr     Other
    Shed     Shed (over 100 SF)
    TenC     Tennis Court
    NA       None

Misc Val (Continuous): Dollar value of miscellaneous feature

Mo Sold (Discrete): Month Sold (MM)

Yr Sold (Discrete): Year Sold (YYYY)

Sale Type (Nominal): Type of sale

    WD       Warranty Deed - Conventional
    CWD      Warranty Deed - Cash
    VWD      Warranty Deed - VA Loan
    New      Home just constructed and sold
    COD      Court Officer Deed/Estate
    Con      Contract 15% Down payment regular terms
    ConLw    Contract Low Down payment and low interest
    ConLI    Contract Low Interest
    ConLD    Contract Low Down
    Oth      Other

Sale Condition (Nominal): Condition of sale

    Normal   Normal Sale
    Abnorml  Abnormal Sale - trade, foreclosure, short sale
    AdjLand  Adjoining Land Purchase
    Alloca   Allocation - two linked properties with separate deeds, typically condo with a garage unit
    Family   Sale between family members
    Partial  Home was not completed when last assessed (associated with New Homes)

SalePrice (Continuous): Sale price in dollars

SPECIAL NOTES:
There are 5 observations that an instructor may wish to remove from the data set before giving it to students (a plot of SALE PRICE versus GR LIV AREA will indicate them quickly). Three of them are true outliers (Partial Sales that likely don't represent actual market values) and two of them are simply unusual sales (very large houses priced relatively appropriately). I would recommend removing any houses with more than 4000 square feet from the data set (which eliminates these 5 unusual observations) before assigning it to students.

STORY BEHIND THE DATA:
This data set was constructed for the purpose of an end-of-semester project for an undergraduate regression course. The original data (obtained directly from the Ames Assessor's Office) is used for tax assessment purposes but lends itself directly to the prediction of home selling prices. The type of information contained in the data is similar to what a typical home buyer would want to know before making a purchase, and students should find most variables straightforward and understandable.

PEDAGOGICAL NOTES:
Instructors unfamiliar with multiple regression may wish to use this data set in conjunction with an earlier JSE paper that reviews most of the major issues found in regression modeling:

Kuiper, S. (2008), "Introduction to Multiple Regression: How Much Is Your Car Worth?", Journal of Statistics Education Volume 16, Number 3 (2008).

Outside of the general issues associated with multiple regression discussed in this article, this particular data set offers several opportunities to discuss how the purpose of a model might affect the type of modeling done. Users of this data may also want to review another JSE article related directly to real estate pricing:

Pardoe, I. (2008), "Modeling home prices using realtor data", Journal of Statistics Education Volume 16, Number 2 (2008).

One issue concerns homoscedasticity and assumption violations. The graph included in the article appears to indicate heteroscedasticity, with variation increasing with sale price, and this problem is evident in many simple home pricing models that focus only on house and lot sizes. Though this violation can be alleviated by transforming the response variable (sale price), the resulting equation yields difficult-to-interpret fitted values (selling price in log or square root dollars). This situation gives the instructor the opportunity to talk about the costs (biased estimators, incorrect statistical tests, etc.) and benefits (ease of use) of not correcting this assumption violation. If the purpose in building the model is simply to allow a typical buyer or real estate agent to sit down and estimate the selling price of a house, such transformations may be unnecessary or inappropriate for the task at hand. This issue could also open into a discussion on the contrasts and comparisons between data mining, predictive models, and formal statistical inference.

A second issue, closely related to the intended use of the model, is the handling of outliers and unusual observations. In general, I instruct my students to never throw away data points simply because they do not match a priori expectations (or other data points). I make this point most strongly when data are being analyzed for research purposes that will be shared with a larger audience. Alternatively, if the purpose is once again to create a common-use model to estimate a "typical" sale, it is in the modeler's best interest to remove any observations that do not seem typical (such as foreclosures or family sales).

REFERENCES:
Individual homes within the data set can be referenced directly from the Ames City Assessor webpage via the Parcel ID (PID) found in the data set. Note that these are nominal (non-numeric) values, so preceding 0's must be included in the data entry field on the website.
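
Because the PID values are identifiers rather than numbers, it helps to read them as text. The following is a minimal Python sketch, assuming the tab-separated layout described under VARIABLE DESCRIPTIONS; the two sample rows and their PID values are made up for illustration and are not records from the data set. It also shows that a code such as NA for Alley is a documented level (no alley access) rather than a missing value.

```python
import csv
import io

# Made-up, tab-separated sample in the layout described above (not real records).
SAMPLE = (
    "Order\tPID\tAlley\tSalePrice\n"
    "1\t0123456789\tNA\t215000\n"
    "2\t0987654321\tGrvl\t105000\n"
)


def read_rows(text):
    """Parse tab-separated text; csv keeps every field as a string."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = read_rows(SAMPLE)
for row in rows:
    # PID is an identifier: keep it as text so the preceding zero survives.
    pid = row["PID"]
    # Converting to a number silently drops the preceding zero.
    as_number = int(pid)
    # "NA" for Alley is a documented level, not a missing value.
    alley = "No alley access" if row["Alley"] == "NA" else row["Alley"]
    print(pid, as_number, alley, int(row["SalePrice"]))
```

Tools that infer column types may convert PID to an integer or treat NA as missing by default, so their import options are worth checking before analysis.
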
Access to the database can be gained from the Ames site (http://www.cityofames.org/assessor/) by clicking on "property search" or by accessing the Beacon (http://beacon.schneidercorp.com/Default.aspx) website and inputting Iowa and Ames in the appropriate fields. A city map showing the location of all the neighborhoods is also available on the Ames site and can be accessed by clicking on "Maps" and then "Residential Assessment Neighborhoods (City of Ames Only)".

SUBMITTED BY:
Dean De Cock
Truman State University
100 E. Normal St., Kirksville, MO, 63501
[email protected]
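
As an illustration of two points from the notes in this documentation, the SPECIAL NOTES recommendation to remove houses with more than 4000 square feet and the transformation of sale price discussed under PEDAGOGICAL NOTES, here is a minimal Python sketch. The records are made up, the field name LogSalePrice and both helper functions are hypothetical, and Gr Liv Area is assumed to be the square footage the recommendation refers to.

```python
import math

# Made-up records using column names from this documentation (not real sales).
houses = [
    {"Gr Liv Area": 1656, "SalePrice": 215000},
    {"Gr Liv Area": 896, "SalePrice": 105000},
    {"Gr Liv Area": 4500, "SalePrice": 180000},
]


def drop_large_houses(records, max_area=4000):
    """Keep only houses with at most max_area square feet of living area."""
    return [r for r in records if r["Gr Liv Area"] <= max_area]


def add_log_price(records):
    """Return copies of the records with a natural-log sale price added."""
    return [{**r, "LogSalePrice": math.log(r["SalePrice"])} for r in records]


kept = add_log_price(drop_large_houses(houses))
for r in kept:
    # exp() maps a value on the log scale back to dollars.
    back_in_dollars = math.exp(r["LogSalePrice"])
    print(r["Gr Liv Area"], round(r["LogSalePrice"], 3), round(back_in_dollars))
```

A model fitted to LogSalePrice produces fitted values in log dollars, which is the interpretability cost described above; applying exp() to a fitted value returns it to the dollar scale.
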