Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/Generative AI for Intelligent Data Handling/Day1 part 2 Data Manipulation using Pandas.ipynb
3074 views
Kernel: Python 3 (ipykernel)

Data Manipulation using Pandas

  • Series: 1D () , 1,2,3

  • DataFrame: 2D(), R*C [[1,2]],[[2,2]]

  • Panels: 3D(), ARC , [[[1,2]],[[3,2,]],[[2,3]]

t=(1,2,3) t l=[1,2,3] l l[0]
1
import pandas as pd p=pd.Series((1,2,3),index=("one","two","three")) p
one 1 two 2 three 3 dtype: int64
d={} d[1]="Red" d[2]="Pink" type(d) d s=pd.Series(d) s
1 Red 2 Pink dtype: object
import pandas as pd # Create a DataFrame data = {'Name': ['John', 'Anna', 'Peter', 'Linda',"Abhi","Ashi"], 'Age': [28, 35, 25, 41,35,28], 'City': ['New York', 'Paris', 'Paris', 'New York',"India","India"]} df = pd.DataFrame(data) df
# Creating DataFrame 1 data1 = {'Name': ['John', 'Alice', 'Bob', 'Emma',"Ashi"], 'Age': [25, 30, 35, 28,32], 'Department': ['HR', 'Finance', 'IT', 'IT',"Analytics"]} df1 = pd.DataFrame(data1) df1
# Creating DataFrame 2 data2 = {'Name': ['John', 'Alice', 'Bob', 'Emma'], 'Salary': [50000, 60000, 55000, 58000], 'Experience': [3, 5, 8, 4]} df2 = pd.DataFrame(data2) df2
# 1. head() print("1.Rows from top") df1.head(2)
1.Rows from top
# 2. tail() print(" Rows from bottom") df2.tail(2)
Rows from bottom
# 3. info() print("\n3. Info of DataFrame 1:") df1.info()
3. Info of DataFrame 1: <class 'pandas.core.frame.DataFrame'> RangeIndex: 5 entries, 0 to 4 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Name 5 non-null object 1 Age 5 non-null int64 2 Department 5 non-null object dtypes: int64(1), object(2) memory usage: 248.0+ bytes
df1
df1.describe()
print("\n4. Description of DataFrame 2:") df1.describe(include = "object")
4. Description of DataFrame 2:
# 5. shape print("\n 5. Shape of DataFrame 1:") df1.shape
5. Shape of DataFrame 1:
(5, 3)
# 6. columns print("\n6. Columns of DataFrame 2:") df2.columns
6. Columns of DataFrame 2:
Index(['Name', 'Salary', 'Experience'], dtype='object')
df1
df1.loc[:,'Age'] df1.iloc[:,1] df1.Age type(df1) type(df1.Age)
pandas.core.series.Series
# 8. loc[] print("\n8. Selecting row labeled 'Bob' in DataFrame 2:") print(df2.loc[df2['Name'] == 'Bob'])
8. Selecting row labeled 'Bob' in DataFrame 2: Name Salary Experience 2 Bob 55000 8
# 9. drop() print("\n9. Dropping 'Department' column from DataFrame 1:") df1.drop(columns=['Department'], inplace=True) print(df1)
9. Dropping 'Department' column from DataFrame 1: Name Age 0 John 25 1 Alice 30 2 Bob 35 3 Emma 28 4 Ashi 32
df1
df2
df3=pd.merge(df1,df2,on="Name",how="outer") #right_on,left_on df3
#dropna #fillna #isnull().sum() df3.isnull().sum()
Name 0 Age 0 Salary 1 Experience 1 dtype: int64
m=df3["Salary"].mean()
# 10. fillna() print("\n10. Filling missing values in DataFrame 2 with 0:") df3["Salary"]=df3["Salary"].fillna(m) df3
10. Filling missing values in DataFrame 2 with 0:
print("\n11. Grouping DataFrame 3 by 'Department':") grouped = df3.groupby('Department') for department, group in grouped: print(f"\nDepartment: {department}") print(group)
# Example 1: Basic Arithmetic Operation add = lambda x, y,z: x + y+z result = add(5, 3,2) result
# Example 2: Sorting a List of Tuples students = [('Alice', 20), ('Bob', 18), ('Charlie', 22)] students.sort(key=lambda x: x[0]) students
# Example 3: Filtering Even Numbers from a List numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] even_numbers = list(filter(lambda x: x % 2 == 0, numbers)) print(even_numbers)
df3
# Applying a lambda function to create a new column df3['Salary_Adjusted'] = df3['Salary'].apply(lambda x: x * 2.1) # Increase salary by 20% df3['Salary_Adjusted']
# 12. apply() print("\n12. Applying function to 'Age' column in DataFrame 1:") df1['Age'] = df1['Age'].apply(lambda x: x + 1) print(df1) # 14. to_csv() print("\n14. Writing DataFrame 1 to CSV file:") df1.to_csv('df1_output.csv', index=False) # 15. to_excel() print("\n15. Writing DataFrame 2 to Excel file:") df2.to_excel('df2_output.xlsx', index=False)
##Lab Here are some lab questions based on the provided DataFrame examples: 1. **Basic DataFrame Operations:** - How many rows and columns are there in each DataFrame? - What are the data types of columns in each DataFrame? - Display the first three rows of each DataFrame. - Display the last two rows of each DataFrame. 2. **Data Manipulation:** - Add a new column 'Gender' to DataFrame 1 with values 'Male' for all rows. - Update the 'Salary' column in DataFrame 2 to increase each salary by 10%. - Rename the 'Experience' column in DataFrame 2 to 'Years_Exp'. - Remove the 'Department' column from DataFrame 1. - Fill any missing values in DataFrame 2 with the mean of the respective columns. 3. **Data Analysis:** - Calculate the average age in DataFrame 1. - Find the maximum salary in DataFrame 2. - Group DataFrame 1 by 'Age' and calculate the average salary for each age group. - Identify the employee with the highest salary in DataFrame 2. - Determine the department with the most employees in DataFrame 1. 4. **Data Merging:** - Merge DataFrame 1 and DataFrame 2 based on the 'Name' column. How many rows are there in the merged DataFrame? - Perform an inner join between DataFrame 1 and DataFrame 2. Are there any missing values in the merged DataFrame? - Merge DataFrame 1 and DataFrame 2 using a left join. Which DataFrame serves as the left DataFrame, and why? - Merge DataFrame 1 and DataFrame 2 using an outer join. Are there any rows with missing values after the outer join? 5. **Data Export:** - Save DataFrame 1 to a CSV file named 'df1_output.csv'. - Save DataFrame 2 to an Excel file named 'df2_output.xlsx'.

Partcipants with Age Greater than 30

# Filter data based on a condition filtered_df = df[df['Age'] > 30] filtered_df
# Sort DataFrame by Age sorted_df = df.sort_values(by='Age',ascending=False) sorted_df
# Group data by City and calculate average age grouped_df = df.groupby('City').mean() grouped_df