Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Path: blob/master/Exploring the NYC Airbnb Market/notebook.ipynb
1. Importing the Data
Welcome to New York City (NYC), one of the most-visited cities in the world.
As a result, there are many Airbnb listings to meet the high demand for temporary lodging, for anywhere from a few nights to many months.
In this notebook, we will take a look at the NYC Airbnb market by combining data from multiple file types like .csv, .tsv, and .xlsx.
We will be working with three datasets:
- "datasets/airbnb_price.csv"
- "datasets/airbnb_room_type.xlsx"
- "datasets/airbnb_last_review.tsv"
Our goals are to convert untidy data into appropriate formats to analyze, and answer key questions including:
- What is the average price, per night, of an Airbnb listing in NYC?
- How does the average price of an Airbnb listing, per month, compare to the private rental market?
- How many adverts are for private rooms?
- How do Airbnb listing prices compare across the five NYC boroughs?
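The loading step for the three files listed above might look like the following sketch. It assumes each file has a header row, that the Excel data sits on the first sheet, and that an Excel engine such as openpyxl is installed. The helper name load_airbnb and the sample column names listing_id and host_name are illustrative and not taken from the notebook.

```python
import io
from pathlib import Path

import pandas as pd


def load_airbnb(data_dir="datasets"):
    """Load the three source files into DataFrames (hypothetical helper)."""
    data_dir = Path(data_dir)
    prices = pd.read_csv(data_dir / "airbnb_price.csv")
    # Assumes the data is on the first sheet of the workbook.
    room_types = pd.read_excel(data_dir / "airbnb_room_type.xlsx")
    # A .tsv file is a delimited text file that uses tabs as separators.
    reviews = pd.read_csv(data_dir / "airbnb_last_review.tsv", sep="\t")
    return prices, room_types, reviews


# In-memory demonstration of the tab-separated case; the row is made up.
sample_tsv = "listing_id\thost_name\tlast_review\n2595\tJennifer\tMay 21 2019\n"
demo_reviews = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(demo_reviews)
```

With the real files in place, `prices, room_types, reviews = load_airbnb()` would return the three DataFrames used in the rest of the notebook.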
2. Cleaning the price column
Now that the DataFrames have been loaded, the first step is to calculate the average price per listing by room_type.
You may have noticed that the price column in the prices DataFrame currently stores each value as a string with the currency (dollars) appended, e.g.,
price
225 dollars
89 dollars
200 dollars
We will need to clean the column in order to calculate the average price.
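One way to do the cleaning is to strip the " dollars" suffix and convert what remains to a number. This is a minimal sketch on the three sample values shown above, assuming every entry follows the same "<number> dollars" pattern.

```python
import pandas as pd

# Sample rows copied from the excerpt above.
prices = pd.DataFrame({"price": ["225 dollars", "89 dollars", "200 dollars"]})

# Remove the literal suffix, then convert the remaining digits to numbers.
prices["price"] = pd.to_numeric(
    prices["price"].str.replace(" dollars", "", regex=False)
)

print(prices["price"].tolist())
```

If some entries did not match the pattern, `pd.to_numeric` would raise an error, which is a useful signal that the column needs further inspection.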
3. Calculating average price
We can see three quarters of listings cost $175 per night or less.
However, there are some outliers including a maximum price of $7,500 per night!
Some listings are actually showing as free. Let's remove these from the DataFrame and calculate the average price.
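A sketch of this step on made-up data: summary statistics reveal the zero-priced rows, a boolean filter drops them, and the mean is taken over what remains. The five prices below are illustrative only.

```python
import pandas as pd

# Illustrative prices, including two listings shown as free.
prices = pd.DataFrame({"price": [225, 89, 0, 200, 0]})

# Summary statistics (min, quartiles, max) expose outliers and zeros.
print(prices["price"].describe())

# Keep only listings with a price above zero.
paid = prices[prices["price"] > 0]

# Average nightly price of the remaining listings, rounded to cents.
avg_price = round(paid["price"].mean(), 2)
print(avg_price)
```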
4. Comparing costs to the private rental market
Now we know how much a listing costs, on average, per night, but it would be useful to have a benchmark for comparison. According to Zumper, a 1 bedroom apartment in New York City costs, on average, $3,100 per month. Let's convert the per night prices of our listings into monthly costs, so we can compare to the private market.
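The conversion can be sketched as follows. It assumes a listing is booked every night of the year, so a monthly cost is the nightly price times 365 / 12; that factor is an assumption of this sketch, and the nightly prices are illustrative. The $3,100 benchmark is the Zumper figure quoted above.

```python
import pandas as pd

# Illustrative nightly prices.
prices = pd.DataFrame({"price": [225, 89, 200]})

# Assumed conversion: 365 nights per year spread over 12 months.
prices["price_per_month"] = prices["price"] * 365 / 12

avg_price_per_month = round(prices["price_per_month"].mean(), 2)

# Benchmark for a one-bedroom apartment quoted in the text.
private_rental_per_month = 3100
difference = avg_price_per_month - private_rental_per_month

print(avg_price_per_month, round(difference, 2))
```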
5. Cleaning the room type column
Unsurprisingly, using Airbnb appears to be substantially more expensive than the private rental market. We should, however, consider that these Airbnb listings include single private rooms or even rooms to share, as well as entire homes/apartments.
Let's dive deeper into the room_type column to find out the breakdown of listings by type of room. The room_type column has several variations for private room listings, specifically:
- "Private room"
- "private room"
- "PRIVATE ROOM"
We can solve this by converting all string characters to lower case (upper case would also work just fine).
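A minimal sketch of the normalization, using the three private room spellings listed above plus a few made-up values for the other room types:

```python
import pandas as pd

# The first three values come from the text; the rest are illustrative.
room_types = pd.DataFrame({
    "room_type": [
        "Private room", "private room", "PRIVATE ROOM",
        "Entire home/apt", "entire home/apt", "Shared room",
    ]
})

# Lower-casing collapses the spelling variants into one category each.
room_types["room_type"] = room_types["room_type"].str.lower()

# Count how many listings fall under each room type.
room_frequencies = room_types["room_type"].value_counts()
print(room_frequencies)
```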
6. What timeframe are we working with?
It seems there is a fairly similar sized market opportunity for both private rooms (45% of listings) and entire homes/apartments (52%) on the Airbnb platform in NYC.
Now let's turn our attention to the reviews DataFrame. The last_review column contains the date of the last review in the format "Month Day Year", e.g., May 21 2019. We've been asked to find out the earliest and latest review dates in the DataFrame, and ensure the format allows this analysis to be easily conducted going forward.
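Parsing the strings into datetimes makes the minimum and maximum meaningful, since plain strings would sort alphabetically. The sketch below assumes English month names and the "Month Day Year" layout described above; apart from May 21 2019, the dates are made up.

```python
import pandas as pd

# "May 21 2019" is from the text; the other dates are illustrative.
reviews = pd.DataFrame({
    "last_review": ["May 21 2019", "July 05 2019", "January 01 2019"]
})

# %B = full month name, %d = day of month, %Y = four-digit year.
reviews["last_review"] = pd.to_datetime(
    reviews["last_review"], format="%B %d %Y"
)

first_reviewed = reviews["last_review"].min()
last_reviewed = reviews["last_review"].max()
print(first_reviewed.date(), last_reviewed.date())
```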
7. Joining the DataFrames
Now we've extracted the information needed, we will merge the three DataFrames to make any future analysis easier to conduct. Once we have joined the data, we will remove any observations with missing values and check for duplicates.
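The join can be sketched with two chained merges. The shared key listing_id is an assumption of this sketch (the text does not name the join column), and the rows are made up, with one missing room type to show the effect of dropping incomplete observations.

```python
import pandas as pd

# Illustrative frames; "listing_id" is an assumed common key.
prices = pd.DataFrame({"listing_id": [1, 2, 3], "price": [225, 89, 200]})
room_types = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "room_type": ["entire home/apt", "private room", None],
})
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "last_review": pd.to_datetime(["2019-05-21", "2019-07-05", "2019-06-23"]),
})

# Outer joins keep every listing, so gaps show up as missing values.
airbnb = prices.merge(room_types, on="listing_id", how="outer")
airbnb = airbnb.merge(reviews, on="listing_id", how="outer")

# Drop rows with any missing value, then count fully duplicated rows.
airbnb = airbnb.dropna()
n_duplicates = airbnb.duplicated().sum()
print(len(airbnb), n_duplicates)
```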
8. Analyzing listing prices by NYC borough
Now we have combined all data into a single DataFrame, we will turn our attention to understanding the difference in listing prices between New York City boroughs.
We can currently see boroughs listed as the first part of a string within the nbhood_full column, e.g.,
Manhattan, Midtown
Brooklyn, Clinton Hill
Manhattan, Murray Hill
Manhattan, Hell's Kitchen
Manhattan, Chinatown
We will therefore need to extract this information from the string and store it in a new column, borough, for analysis.
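One possible approach is to split each string at the first comma and keep the part before it, then group by the new column. The nbhood_full values below are the five examples shown above; the prices and the choice of summary statistics are illustrative.

```python
import pandas as pd

# Neighbourhood strings from the text, paired with made-up prices.
airbnb = pd.DataFrame({
    "nbhood_full": [
        "Manhattan, Midtown",
        "Brooklyn, Clinton Hill",
        "Manhattan, Murray Hill",
        "Manhattan, Hell's Kitchen",
        "Manhattan, Chinatown",
    ],
    "price": [225, 89, 200, 79, 150],
})

# partition splits at the first comma; column 0 holds the borough.
airbnb["borough"] = airbnb["nbhood_full"].str.partition(",")[0].str.strip()

# Summarize prices per borough.
boroughs = (
    airbnb.groupby("borough")["price"]
    .agg(["sum", "mean", "median", "count"])
    .round(2)
)
print(boroughs)
```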
9. Price range by borough
The above output gives us a summary of prices for listings across the 5 boroughs. In this final task we would like to categorize listings based on whether they fall into specific price ranges, and view this by borough.
We can do this using percentiles and labels to create a new column, price_range, in the DataFrame.
Once we have created the labels, we can then group the data and count frequencies for listings in each price range by borough.
We will assign the following categories and price ranges:
label | price |
---|---|
Budget | \$0-69 |
Average | \$70-175 |
Expensive | \$176-350 |
Extravagant | > \$350 |
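The categories in the table can be assigned with pd.cut, as in the sketch below. It assumes whole-dollar prices and that free listings have already been removed, because the default bins exclude their left edge and a price of 0 would be left unlabeled. The boroughs and prices are made up.

```python
import pandas as pd

# Illustrative listings.
airbnb = pd.DataFrame({
    "borough": ["Manhattan", "Brooklyn", "Manhattan", "Manhattan",
                "Manhattan", "Queens", "Manhattan"],
    "price": [225, 89, 200, 79, 150, 60, 400],
})

# Bin edges matching the table: (0, 69], (69, 175], (175, 350], (350, inf].
bins = [0, 69, 175, 350, float("inf")]
labels = ["Budget", "Average", "Expensive", "Extravagant"]

airbnb["price_range"] = pd.cut(airbnb["price"], bins=bins, labels=labels)

# Count listings per borough and price range, skipping empty combinations.
prices_by_borough = airbnb.groupby(
    ["borough", "price_range"], observed=True
).size()
print(prices_by_borough)
```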