Day 3 Pandas Lab (Data Science using Python)
Generate employee data with 1000 rows and 7 columns:
columns=['Name', 'Gender', 'Salary', 'Work Location', 'Age', 'Rating', 'Job Role']
Define lists for job roles, locations, and ratings
job_roles = ['Software Engineer', 'Data Analyst', 'Project Manager', 'Marketing Specialist', 'HR Manager', 'Financial Analyst', 'Sales Executive', 'Customer Support', 'Graphic Designer', 'Product Manager']
locations = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose']
ratings = [1, 2, 3, 4, 5]
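One way to build the 1000-row DataFrame is to sample each column at random with NumPy. This is a sketch under stated assumptions: the placeholder names (`Employee_1`, ...), the two gender values, the salary range (30,000 to 150,000) and the age range (22 to 60) are all hypothetical choices, since the lab only fixes the column names and the three lists above.

```python
import numpy as np
import pandas as pd

job_roles = ['Software Engineer', 'Data Analyst', 'Project Manager', 'Marketing Specialist',
             'HR Manager', 'Financial Analyst', 'Sales Executive', 'Customer Support',
             'Graphic Designer', 'Product Manager']
locations = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix',
             'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose']
ratings = [1, 2, 3, 4, 5]

rng = np.random.default_rng(seed=42)  # fixed seed so the data is reproducible
n_rows = 1000

emp_df = pd.DataFrame({
    'Name': [f'Employee_{i}' for i in range(1, n_rows + 1)],  # hypothetical placeholder names
    'Gender': rng.choice(['Male', 'Female'], size=n_rows),     # assumed gender values
    'Salary': rng.integers(30_000, 150_001, size=n_rows),      # assumed range; upper bound is exclusive
    'Work Location': rng.choice(locations, size=n_rows),
    'Age': rng.integers(22, 61, size=n_rows),                  # assumed range: 22 to 60 inclusive
    'Rating': rng.choice(ratings, size=n_rows),
    'Job Role': rng.choice(job_roles, size=n_rows),
})
print(emp_df.head())
```

To feed the CSV-loading task, the frame could then be written out with `emp_df.to_csv('emp_data.csv', index=False)`, where the file name is hypothetical.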
Apply the following pandas operations to the generated data:
Load a CSV file into a pandas DataFrame and display the first 5 rows.
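A minimal sketch of loading and previewing the data. `pd.read_csv` accepts either a file path or a file-like object, so an in-memory buffer with six made-up rows stands in here for a real file such as the hypothetical `emp_data.csv`.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the generated CSV file
csv_text = """Name,Gender,Salary,Work Location,Age,Rating,Job Role
Employee_1,Female,72000,Chicago,29,4,Data Analyst
Employee_2,Male,58000,Dallas,41,3,Sales Executive
Employee_3,Female,91000,New York,35,5,Software Engineer
Employee_4,Male,64000,Phoenix,30,2,Customer Support
Employee_5,Female,83000,San Jose,38,4,Product Manager
Employee_6,Male,55000,Houston,26,3,HR Manager
"""

# With a real file this would be pd.read_csv('emp_data.csv')
emp_df = pd.read_csv(io.StringIO(csv_text))
print(emp_df.head())  # head() returns the first 5 rows by default
```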
Get the shape of the DataFrame (number of rows and columns).
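`DataFrame.shape` is a `(rows, columns)` tuple, so it can be unpacked directly. The small frame below is a made-up stand-in; on the full generated data the same call would report 1000 rows and 7 columns.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3'],
    'Age': [29, 41, 35],
})

n_rows, n_cols = emp_df.shape  # shape is an attribute, not a method
print(f'{n_rows} rows, {n_cols} columns')
```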
Check for missing values in the DataFrame and handle them appropriately.
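The randomly generated data contains no gaps, so this sketch injects a few missing values into a small made-up frame. Filling numeric columns with the median and categorical columns with the most frequent value is one common strategy, assumed here for illustration; dropping incomplete rows with `emp_df.dropna()` is the simpler alternative when few rows are affected.

```python
import numpy as np
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3', 'Employee_4'],
    'Salary': [72000, np.nan, 58000, 91000],
    'Work Location': ['Chicago', 'Dallas', None, 'Chicago'],
    'Rating': [4, 3, np.nan, 5],
})

print(emp_df.isnull().sum())  # number of missing values per column

# Numeric columns: replace NaN with the column median
emp_df['Salary'] = emp_df['Salary'].fillna(emp_df['Salary'].median())
emp_df['Rating'] = emp_df['Rating'].fillna(emp_df['Rating'].median())

# Categorical column: replace missing entries with the most frequent value
emp_df['Work Location'] = emp_df['Work Location'].fillna(emp_df['Work Location'].mode()[0])
```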
Filter the DataFrame to only include rows where a specific column meets a certain condition (e.g., age > 30).
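Filtering works with a boolean mask: the comparison produces one `True`/`False` per row, and indexing with it keeps only the `True` rows. The sample frame is made up; note that `> 30` is strict, so an age of exactly 30 is excluded.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3', 'Employee_4'],
    'Age': [29, 41, 35, 30],
})

older_df = emp_df[emp_df['Age'] > 30]  # keep rows where the mask is True
print(older_df)
```

Several conditions can be combined with `&` and `|`, wrapping each comparison in parentheses, e.g. `emp_df[(emp_df['Age'] > 30) & (emp_df['Age'] < 40)]`.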
Sort the DataFrame based on a specific column in ascending order.
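`sort_values` returns a new, sorted DataFrame and leaves the original untouched; `ascending=True` is already the default but is spelled out here for clarity. The salary figures are made up.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3'],
    'Salary': [72000, 58000, 91000],
})

sorted_df = emp_df.sort_values(by='Salary', ascending=True)
print(sorted_df)
```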
Add a new column to the DataFrame based on a calculation from existing columns.
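Arithmetic on a column is applied element-wise, so assigning the result to a new column name adds it to every row. This sketch assumes `Salary` is an annual figure and derives a hypothetical `Monthly Salary` column from it.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3'],
    'Salary': [72000, 58000, 90000],  # assumed to be annual salaries
})

# Element-wise division, rounded to 2 decimal places
emp_df['Monthly Salary'] = (emp_df['Salary'] / 12).round(2)
print(emp_df)
```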
Group the DataFrame by a categorical variable and calculate summary statistics for each group (e.g., mean, median, count).
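`groupby` splits the rows by each distinct value of the grouping column, and `agg` applies several summary functions at once, giving one row per group and one column per statistic. Grouping `Salary` by `Job Role` is one illustrative choice; the figures are made up.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Job Role': ['Data Analyst', 'Software Engineer', 'Data Analyst',
                 'Software Engineer', 'Data Analyst'],
    'Salary': [60000, 100000, 70000, 120000, 80000],
})

summary = emp_df.groupby('Job Role')['Salary'].agg(['mean', 'median', 'count'])
print(summary)
```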
Merge two DataFrames based on a common key column.
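The generated data is a single table, so this sketch invents a second, hypothetical lookup table (`dept_df` with a `Department` column) that shares the `Job Role` key. A left join keeps every employee row and fills `NaN` where no match exists.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3'],
    'Job Role': ['Data Analyst', 'HR Manager', 'Sales Executive'],
})

# Hypothetical lookup table; 'Sales Executive' is deliberately missing
dept_df = pd.DataFrame({
    'Job Role': ['Data Analyst', 'HR Manager'],
    'Department': ['Analytics', 'Human Resources'],
})

merged_df = pd.merge(emp_df, dept_df, on='Job Role', how='left')
print(merged_df)
```

Using `how='inner'` instead would keep only the employees whose job role appears in both tables.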
Remove duplicate rows from the DataFrame.
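`duplicated()` flags rows that repeat an earlier row in every column, and `drop_duplicates()` removes them, keeping the first occurrence by default (`keep='first'`). The sample frame is made up and contains one exact duplicate.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_1'],
    'Age': [29, 41, 29],
})

print(emp_df.duplicated().sum())  # number of duplicate rows
deduped_df = emp_df.drop_duplicates()
print(deduped_df)
```

Passing `subset=['Name']` would treat rows as duplicates when only the listed columns match.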
Rename columns in the DataFrame to make them more descriptive.
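`rename(columns=...)` takes a mapping from old to new names and leaves unlisted columns unchanged. The new names below (`Office City`, `Performance Rating`) are hypothetical examples of more descriptive labels.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2'],
    'Work Location': ['Chicago', 'Dallas'],
    'Rating': [4, 3],
})

renamed_df = emp_df.rename(columns={
    'Work Location': 'Office City',    # hypothetical descriptive name
    'Rating': 'Performance Rating',    # hypothetical descriptive name
})
print(renamed_df.columns.tolist())
```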
Select specific columns from the DataFrame and create a new DataFrame with only those columns.
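Indexing with a list of column names (note the double brackets) returns a DataFrame with just those columns, in the listed order. Calling `.copy()` makes the subset independent of the original, which avoids pandas' `SettingWithCopyWarning` if it is modified later. The sample data is made up.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2'],
    'Gender': ['Female', 'Male'],
    'Salary': [72000, 58000],
    'Job Role': ['Data Analyst', 'Sales Executive'],
})

subset_df = emp_df[['Name', 'Job Role', 'Salary']].copy()
print(subset_df)
```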
Reset the index of the DataFrame to the default integer index.
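Filtering or sorting leaves gaps or a shuffled order in the index. `reset_index(drop=True)` renumbers the rows from 0 and discards the old index; without `drop=True`, the old index would be kept as a new `index` column. The sample data is made up.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'Name': ['Employee_1', 'Employee_2', 'Employee_3'],
    'Age': [29, 41, 35],
})

filtered_df = emp_df[emp_df['Age'] > 30]       # index is now [1, 2]
reset_df = filtered_df.reset_index(drop=True)  # index becomes [0, 1]
print(reset_df)
```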