GitHub Repository: suyashi29/python-su
Path: blob/master/Generative AI for Intelligent Data Handling/Lab 1 Data Manipulation and Visualization using Python.ipynb
Kernel: Python 3 (ipykernel)

Analyze hotel data using Python

Preparation:

  • Load the hotel booking dataset into a Pandas DataFrame. What is the shape of the dataset?

  • Print the first few rows of the dataset to understand its structure. What are the column names and data types?

  • Check for missing values in the dataset. How many missing values are there in each column?

  • Explore some basic statistics of numerical features in the dataset. What are the mean, median, minimum, and maximum values of the 'lead_time' feature?

  • Explore some basic statistics of categorical features in the dataset. What are the unique values of the 'hotel' feature?
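
The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the lab's reference solution: the small in-memory DataFrame is a hypothetical stand-in for the hotel booking dataset, the file name in the comment is an assumption, and the 'country' and 'reservation_status' columns and all values are invented for the example. Only 'hotel', 'lead_time', and 'is_canceled' are named in the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lab's dataset. With the real file you would
# load it instead, e.g. df = pd.read_csv("hotel_bookings.csv")  # assumed name
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [10.0, 45.0, np.nan, 120.0, 5.0, 60.0],
    "is_canceled": [0, 1, 1, 0, 0, 1],
    "country": ["PRT", "GBR", None, "PRT", "FRA", "PRT"],  # assumed column
    "reservation_status": ["Check-Out", "Canceled", "No-Show",
                           "Check-Out", "Check-Out", "Canceled"],  # assumed column
})

# Shape as (rows, columns)
print(df.shape)

# First rows, plus column names and data types
print(df.head())
print(df.dtypes)

# Missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Mean, median, minimum, and maximum of 'lead_time' (NaN is skipped by default)
lead_time_stats = df["lead_time"].agg(["mean", "median", "min", "max"])
print(lead_time_stats)

# Unique values of the categorical 'hotel' feature
hotel_values = df["hotel"].unique()
print(hotel_values)
```

On the real dataset, `df.describe()` and `df.info()` give a broader overview of the same information in one call each.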

Processing:

  • Impute missing values in numerical features using the mean strategy. How many missing values were imputed for each numerical feature?

  • Impute missing values in categorical features using the most frequent strategy. Which values were imputed for each categorical feature?

  • Convert categorical variables into dummy/indicator variables. How many new columns were added to the dataset after one-hot encoding?

  • Standardize numerical features using StandardScaler. What are the mean and standard deviation of the 'lead_time' feature after standardization?
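
One possible way to work through the processing steps is sketched below. It again uses a hypothetical toy DataFrame in place of the real dataset, and the 'country' column is an assumption. Imputation is done here with pandas `fillna` as a stand-in for the mean and most-frequent strategies; scikit-learn's `SimpleImputer` would be an alternative. Standardization uses `StandardScaler`, as the exercise asks.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the hotel booking dataset
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [10.0, 45.0, np.nan, 120.0, 5.0, 60.0],
    "is_canceled": [0, 1, 1, 0, 0, 1],
    "country": ["PRT", "GBR", None, "PRT", "FRA", "PRT"],  # assumed column
})

num_cols = ["lead_time"]         # numerical features to impute and scale
cat_cols = ["hotel", "country"]  # categorical features to impute and encode

# Mean imputation: count the gaps first, then fill them
imputed_num = df[num_cols].isnull().sum()
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Most-frequent imputation: record which value is used for each column
cat_fill_values = {col: df[col].mode().iloc[0] for col in cat_cols}
df = df.fillna(cat_fill_values)

# One-hot encoding: the dummy columns replace the original categorical ones,
# so the net number of added columns is the difference in width
n_before = df.shape[1]
encoded = pd.get_dummies(df, columns=cat_cols)
n_added = encoded.shape[1] - n_before

# Standardize to zero mean and unit variance
scaler = StandardScaler()
encoded[num_cols] = scaler.fit_transform(encoded[num_cols])

lead_mean = encoded["lead_time"].mean()
lead_std = encoded["lead_time"].std(ddof=0)  # population std, as StandardScaler uses
print(imputed_num, cat_fill_values, n_added, lead_mean, lead_std)
```

Note that `Series.std()` defaults to the sample standard deviation (`ddof=1`), so without `ddof=0` the reported value after scaling comes out slightly above 1.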

Analysis:

  • Visualize the distribution of numerical features in the dataset. What insights can you gain from the visualizations?

  • How many unique hotels are there in the data?

  • Create visualizations to show the reservation status.

  • Explore the relationship between the 'lead_time' and 'is_canceled' features using a scatter plot. Is there any apparent correlation between these two features?

  • Calculate the cancellation rate for each hotel. Which hotel has the highest cancellation rate?
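
The analysis tasks above might be approached along these lines. This is a sketch under stated assumptions: the toy DataFrame is hypothetical, 'reservation_status' is an assumed column name for the reservation status, and the chart types are one reasonable choice among several.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the hotel booking dataset
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "City Hotel", "City Hotel"],
    "lead_time": [10, 45, 30, 120, 5, 60],
    "is_canceled": [0, 1, 1, 0, 0, 1],
    "reservation_status": ["Check-Out", "Canceled", "No-Show",
                           "Check-Out", "Check-Out", "Canceled"],  # assumed column
})

# Number of unique hotels and counts per reservation status
n_hotels = df["hotel"].nunique()
status_counts = df["reservation_status"].value_counts()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Distribution of a numerical feature
axes[0].hist(df["lead_time"], bins=10)
axes[0].set_title("Distribution of lead_time")

# Reservation status as a bar chart
status_counts.plot(kind="bar", ax=axes[1], title="Reservation status")

# lead_time against is_canceled
axes[2].scatter(df["lead_time"], df["is_canceled"], alpha=0.5)
axes[2].set_xlabel("lead_time")
axes[2].set_ylabel("is_canceled")
fig.tight_layout()

# Pearson correlation between lead_time and the binary cancellation flag
lead_cancel_corr = df["lead_time"].corr(df["is_canceled"])

# Cancellation rate per hotel: the mean of a 0/1 flag is the share of 1s
cancel_rate = df.groupby("hotel")["is_canceled"].mean()
highest = cancel_rate.idxmax()
print(n_hotels, lead_cancel_corr, cancel_rate, highest)
```

Because 'is_canceled' is binary, the scatter plot shows two horizontal bands rather than a cloud of points. A box plot of 'lead_time' grouped by 'is_canceled', or the correlation coefficient computed above, is often easier to read.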