Path: blob/master/ML Classification using Python/LAB WORK Pandas Data Preparation & EDA.ipynb
LAB QUESTIONS (12 Tasks)
Participants must use Pandas to solve these questions. They will use the dataset df_lab.
Section A: Basic Exploration
1️⃣ Display the first 7 rows and the last 5 rows.
Hint: Use .head() and .tail().
2️⃣ Print the summary of all column datatypes and missing values.
Hint: .info(), .isnull().sum().
3️⃣ Show descriptive statistics for numerical columns.
Hint: .describe().
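One possible way to approach tasks 1 to 3 is sketched below. The small `df_lab` built here is a hypothetical stand-in with made-up values, since the real lab dataset is not shown; only the method calls come from the hints.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_lab (values are invented for illustration).
df_lab = pd.DataFrame({
    "Age": [25, 31, np.nan, 45, 38, 29, 52, 41, 36, 27],
    "Department": ["IT", "HR", "IT", None, "Finance",
                   "IT", "HR", "Finance", "IT", "HR"],
    "SkillScore": [88, 72, 91, 65, np.nan, 79, 84, 90, 77, 69],
})

# Task 1: first 7 rows and last 5 rows.
first_rows = df_lab.head(7)
last_rows = df_lab.tail(5)
print(first_rows)
print(last_rows)

# Task 2: datatypes with non-null counts, then missing values per column.
df_lab.info()
missing_counts = df_lab.isnull().sum()
print(missing_counts)

# Task 3: descriptive statistics (numeric columns only by default).
stats = df_lab.describe()
print(stats)
```

Note that `.info()` prints directly and returns nothing, so it is called on its own line rather than wrapped in `print()`.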
Section B: Cleaning & Missing Value Handling
4️⃣ Identify which 3 columns have the highest number of missing values.
5️⃣ Fill missing values in:
Age → with mean
Department → with mode
TrainingHours → with median
(Participants decide the correct Pandas method.)
6️⃣ Remove rows where both SkillScore AND PerformanceRating are missing.
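A minimal sketch of tasks 4 to 6 follows, assuming `fillna()` and `dropna()` are the methods chosen. The toy `df_lab` and the name `df_clean` are hypothetical and used only for illustration; the real dataset will have different values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_lab (values are invented for illustration).
df_lab = pd.DataFrame({
    "Age": [25, np.nan, 35, np.nan, 41, np.nan, 29, np.nan],
    "Department": ["IT", "HR", None, "IT", "Finance", "IT", "HR", "IT"],
    "TrainingHours": [10, np.nan, 20, 40, np.nan, 30, np.nan, 50],
    "SkillScore": [80, np.nan, 90, np.nan, 70, 85, 60, 75],
    "PerformanceRating": [4, np.nan, 5, 3, 2, 4, 3, 5],
})

# Task 4: the 3 columns with the most missing values.
top_missing = df_lab.isnull().sum().nlargest(3)
print(top_missing)

# Task 5: fill each column with its own statistic.
df_lab["Age"] = df_lab["Age"].fillna(df_lab["Age"].mean())
# mode() returns a Series (there can be ties), so take the first entry.
df_lab["Department"] = df_lab["Department"].fillna(df_lab["Department"].mode()[0])
df_lab["TrainingHours"] = df_lab["TrainingHours"].fillna(df_lab["TrainingHours"].median())

# Task 6: drop a row only when BOTH columns are missing (how="all").
df_clean = df_lab.dropna(subset=["SkillScore", "PerformanceRating"], how="all")
print(df_clean)
```

Using `how="all"` matters here: the default `how="any"` would also drop rows where only one of the two columns is missing.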
Section C: Column Operations
7️⃣ Create a new column: TrainingEfficiency = SkillScore / TrainingHours.
8️⃣ Create another column: Category
Rules:
If PerformanceRating ≥ 4 → “High”
Else → “Low”
(Use np.where() or apply().)
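Tasks 7 and 8 could be sketched as below, here using `np.where()` rather than `apply()`. The toy `df_lab` is a hypothetical stand-in with invented values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_lab (values are invented for illustration).
df_lab = pd.DataFrame({
    "SkillScore": [80, 90, 60, 75],
    "TrainingHours": [20, 30, 15, 25],
    "PerformanceRating": [4, 5, 3, 2],
})

# Task 7: element-wise ratio of two columns.
df_lab["TrainingEfficiency"] = df_lab["SkillScore"] / df_lab["TrainingHours"]

# Task 8: "High" where the rating is at least 4, otherwise "Low".
df_lab["Category"] = np.where(df_lab["PerformanceRating"] >= 4, "High", "Low")

print(df_lab)
```

Two edge cases are worth checking on the real data: a TrainingHours value of 0 produces an infinite TrainingEfficiency, and a missing PerformanceRating fails the `>= 4` comparison and is therefore labelled “Low”.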
Section D: Filtering & Sorting
9️⃣ Filter employees who:
Work in IT
Have SkillScore > 85
Have ExperienceYears ≥ 5
(Multiple conditions required.)
🔟 Sort the dataset by ExperienceYears (descending) and SkillScore (ascending).
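The filter and sort in tasks 9 and 10 might look like the sketch below. The toy `df_lab` and the result names `it_experts` and `sorted_df` are hypothetical; the sketch also assumes the IT department is stored as the exact string "IT".

```python
import pandas as pd

# Hypothetical stand-in for df_lab (values are invented for illustration).
df_lab = pd.DataFrame({
    "Department": ["IT", "IT", "HR", "IT", "Finance"],
    "SkillScore": [90, 86, 95, 80, 88],
    "ExperienceYears": [6, 5, 8, 10, 5],
})

# Task 9: combine conditions with & and wrap each one in parentheses.
it_experts = df_lab[
    (df_lab["Department"] == "IT")
    & (df_lab["SkillScore"] > 85)
    & (df_lab["ExperienceYears"] >= 5)
]
print(it_experts)

# Task 10: one ascending flag per sort key, in the same order as `by`.
sorted_df = df_lab.sort_values(
    by=["ExperienceYears", "SkillScore"],
    ascending=[False, True],
)
print(sorted_df)
```

The parentheses around each condition are required because `&` binds more tightly than comparison operators in Python.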
Section E: Grouping & Aggregation
1️⃣1️⃣ Compute the average:
SkillScore by Department
PerformanceRating by Department
(Use .groupby().)
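As a rough sketch of task 11, both averages can be computed in a single `.groupby()` call. The toy `df_lab` and the name `dept_means` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_lab (values are invented for illustration).
df_lab = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "HR"],
    "SkillScore": [80, 70, 90, 60],
    "PerformanceRating": [4, 3, 5, 4],
})

# Task 11: mean of both columns per department in one pass.
dept_means = df_lab.groupby("Department")[["SkillScore", "PerformanceRating"]].mean()
print(dept_means)
```

The result is indexed by Department, with one column per averaged measure.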
Section F: Final Step
1️⃣2️⃣ Save the cleaned dataset as Employee_Cleaned.csv.
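The save step could be sketched as follows. The toy `df_clean` is a hypothetical stand-in for the cleaned dataset, and passing `index=False` is an assumption (the task does not say whether the row index should be written).

```python
import pandas as pd

# Hypothetical stand-in for the cleaned dataset.
df_clean = pd.DataFrame({
    "Department": ["IT", "HR"],
    "SkillScore": [88, 72],
})

# index=False leaves the row index out of the file (an assumption here).
df_clean.to_csv("Employee_Cleaned.csv", index=False)

# Read the file back as a quick sanity check.
check = pd.read_csv("Employee_Cleaned.csv")
print(check)
```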