Path: blob/master/Generative AI for Intelligent Data Handling/Lab 1 Data Manipulation and Visualization using Python.ipynb
Analyze hotel data using Python
Preparation:
Load the hotel booking dataset into a Pandas DataFrame. What is the shape of the dataset?
Print the first few rows of the dataset to understand its structure. What are the column names and data types?
Check for missing values in the dataset. How many missing values are there in each column?
Explore some basic statistics of numerical features in the dataset. What are the mean, median, minimum, and maximum values of the 'lead_time' feature?
Explore some basic statistics of categorical features in the dataset. What are the unique values of the 'hotel' feature?
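One possible way to approach the preparation steps is sketched below. It is not the lab's reference solution. The small DataFrame is a made-up stand-in so the snippet runs on its own; the 'country' column and the file name hotel_bookings.csv mentioned in the comment are assumptions, and only 'hotel', 'lead_time' and 'is_canceled' come from the task text.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data. With the actual file you would load it
# with something like pd.read_csv("hotel_bookings.csv") (file name assumed).
df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10.0, 200.0, np.nan, 45.0, 5.0],
    "is_canceled": [0, 1, 1, 0, 0],
    "country": ["PRT", None, "GBR", "PRT", "FRA"],  # hypothetical extra column
})

# Shape of the dataset as (rows, columns).
shape = df.shape
print(shape)

# First few rows, then column names with their data types.
print(df.head())
print(df.dtypes)

# Number of missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Mean, median, minimum and maximum of 'lead_time' (NaN values are skipped).
lead_time_stats = df["lead_time"].agg(["mean", "median", "min", "max"])
print(lead_time_stats)

# Unique values of the categorical 'hotel' feature.
hotel_values = df["hotel"].unique()
print(hotel_values)
```

On the real dataset, df.describe() gives a wider summary of all numerical columns, and df.info() combines data types with non-null counts.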
Processing:
Impute missing values in numerical features using the mean strategy. How many missing values were imputed for each numerical feature?
Impute missing values in categorical features using the most frequent strategy. Which values were imputed for each categorical feature?
Convert categorical variables into dummy/indicator variables. How many new columns were added to the dataset after one-hot encoding?
Standardize numerical features using StandardScaler. What are the mean and standard deviation of the 'lead_time' feature after standardization?
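The processing steps could be sketched roughly as follows, again on a made-up stand-in DataFrame rather than the real data. The 'adr' and 'country' columns are assumptions added for illustration. Numerical imputation uses scikit-learn's SimpleImputer; the categorical step uses the pandas mode, which is a swapped-in equivalent of SimpleImputer(strategy="most_frequent") chosen here because it makes the imputed values easy to report.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Made-up stand-in data; 'adr' and 'country' are hypothetical columns.
df = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"],
    "lead_time": [10.0, 200.0, np.nan, 45.0, 5.0],
    "adr": [80.0, 120.0, 95.0, np.nan, 60.0],
    "country": ["PRT", np.nan, "GBR", "PRT", "FRA"],
})

num_cols = list(df.select_dtypes(include="number").columns)
cat_cols = list(df.select_dtypes(exclude="number").columns)

# Count missing values first, so the number of imputed cells can be reported.
imputed_counts = df[num_cols].isnull().sum()

# Mean imputation for numerical features.
num_imputer = SimpleImputer(strategy="mean")
df[num_cols] = num_imputer.fit_transform(df[num_cols])

# Most-frequent imputation for categorical features (mode ignores NaN).
fill_values = {col: df[col].mode()[0] for col in cat_cols}
df = df.fillna(fill_values)
print(fill_values)

# One-hot encoding; report the net change in the number of columns.
n_before = df.shape[1]
encoded = pd.get_dummies(df, columns=cat_cols)
n_added = encoded.shape[1] - n_before
print(n_added)

# Standardize numerical features to zero mean and unit variance.
scaler = StandardScaler()
encoded[num_cols] = scaler.fit_transform(encoded[num_cols])

# StandardScaler divides by the population standard deviation (ddof=0),
# while pandas defaults to ddof=1, so ddof=0 is passed explicitly here.
lead_mean = encoded["lead_time"].mean()
lead_std = encoded["lead_time"].std(ddof=0)
print(lead_mean, lead_std)
```

Note that n_added is a net figure: get_dummies removes each original categorical column and replaces it with one indicator column per category.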
Analysis:
Visualize the distribution of numerical features in the dataset. What insights can you gain from the visualizations?
How many unique hotels are there in the data?
Create visualizations to show reservation status.
Explore the relationship between the 'lead_time' and 'is_canceled' features using a scatter plot. Is there any apparent correlation between these two features?
Calculate the cancellation rate for each hotel. Which hotel has the highest cancellation rate?
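A minimal sketch of the analysis steps, using matplotlib and a made-up stand-in DataFrame. The 'reservation_status' column name and its values are assumptions; the numbers produced here say nothing about the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data; 'reservation_status' is an assumed column name.
df = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "lead_time": [10, 200, 150, 45, 5, 300],
    "is_canceled": [0, 1, 1, 0, 0, 1],
    "reservation_status": ["Check-Out", "Canceled", "No-Show",
                           "Check-Out", "Check-Out", "Canceled"],
})

# Number of unique hotels.
n_hotels = df["hotel"].nunique()
print(n_hotels)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Distribution of a numerical feature.
axes[0].hist(df["lead_time"], bins=10)
axes[0].set_title("Distribution of lead_time")
axes[0].set_xlabel("lead_time")

# Reservation status counts as a bar chart.
status_counts = df["reservation_status"].value_counts()
axes[1].bar(list(status_counts.index), list(status_counts.values))
axes[1].set_title("Reservation status")

# lead_time against is_canceled; because is_canceled is binary, the points
# fall on two horizontal lines.
axes[2].scatter(df["lead_time"], df["is_canceled"], alpha=0.5)
axes[2].set_title("lead_time vs is_canceled")
axes[2].set_xlabel("lead_time")
axes[2].set_ylabel("is_canceled")

fig.tight_layout()

# Pearson correlation between lead_time and the binary cancellation flag.
correlation = df["lead_time"].corr(df["is_canceled"])
print(correlation)

# Cancellation rate per hotel: the mean of a 0/1 column is the share of 1s.
cancel_rate = df.groupby("hotel")["is_canceled"].mean()
highest = cancel_rate.idxmax()
print(cancel_rate)
print(highest)
```

Since a scatter plot of a binary target can be hard to read, a numeric correlation or a comparison of lead_time grouped by is_canceled may be a useful complement to it.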