CoCalc -- Day1 part 2 Data Manipulation using Pandas.ipynb

GitHub Repository: suyashi29/python-su
Path: blob/master/Generative AI for Intelligent Data Handling/Day1 part 2 Data Manipulation using Pandas.ipynb
³⁰⁷⁴ views

Kernel: Python 3 (ipykernel)

Data Manipulation using Pandas

Series: 1D () , 1,2,3
DataFrame: 2D(), R*C [[1,2]],[[2,2]]
Panels: 3D(), ARC , [[[1,2]],[[3,2,]],[[2,3]]

In [5]:

t=(1,2,3)
t
l=[1,2,3]
l
l[0]

Out[5]:

1

In [9]:

import pandas as pd
p=pd.Series((1,2,3),index=("one","two","three"))
p

Out[9]:

one      1
two      2
three    3
dtype: int64

In [13]:

d={}
d[1]="Red"
d[2]="Pink"
type(d)
d
s=pd.Series(d)
s

Out[13]:

1     Red
2    Pink
dtype: object

In [14]:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda',"Abhi","Ashi"],
        'Age': [28, 35, 25, 41,35,28],
        'City': ['New York', 'Paris', 'Paris', 'New York',"India","India"]}

df = pd.DataFrame(data)
df

Out[14]:

In [15]:

# Creating DataFrame 1
data1 = {'Name': ['John', 'Alice', 'Bob', 'Emma',"Ashi"],
         'Age': [25, 30, 35, 28,32],
         'Department': ['HR', 'Finance', 'IT', 'IT',"Analytics"]}
df1 = pd.DataFrame(data1)
df1

Out[15]:

In [16]:

# Creating DataFrame 2
data2 = {'Name': ['John', 'Alice', 'Bob', 'Emma'],
         'Salary': [50000, 60000, 55000, 58000],
         'Experience': [3, 5, 8, 4]}
df2 = pd.DataFrame(data2)
df2

Out[16]:

In [17]:

# 1. head()
print("1.Rows from top")
df1.head(2)

Out[17]:

1.Rows from top

In [18]:

# 2. tail()
print(" Rows from bottom")
df2.tail(2)

Out[18]:

 Rows from bottom

In [19]:

# 3. info()
print("\n3. Info of DataFrame 1:")
df1.info()

Out[19]:

3. Info of DataFrame 1:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        5 non-null      object
 1   Age         5 non-null      int64 
 2   Department  5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes

In [20]:

df1

Out[20]:

In [25]:

 df1.describe()

Out[25]:

In [24]:

print("\n4. Description of DataFrame 2:")
df1.describe(include = "object")

Out[24]:

4. Description of DataFrame 2:

In [27]:

# 5. shape
print("\n 5. Shape of DataFrame 1:")
df1.shape

Out[27]:

 5. Shape of DataFrame 1:

(5, 3)

In [28]:

# 6. columns
print("\n6. Columns of DataFrame 2:")
df2.columns

Out[28]:

6. Columns of DataFrame 2:

Index(['Name', 'Salary', 'Experience'], dtype='object')

In [29]:

df1

Out[29]:

In [35]:

df1.loc[:,'Age']
df1.iloc[:,1]
df1.Age
type(df1)
type(df1.Age)

Out[35]:

pandas.core.series.Series

In [36]:

# 8. loc[]
print("\n8. Selecting row labeled 'Bob' in DataFrame 2:")
print(df2.loc[df2['Name'] == 'Bob'])

Out[36]:

8. Selecting row labeled 'Bob' in DataFrame 2:
  Name  Salary  Experience
2  Bob   55000           8

In [37]:

# 9. drop()
print("\n9. Dropping 'Department' column from DataFrame 1:")
df1.drop(columns=['Department'], inplace=True)
print(df1)

Out[37]:

Dropping 'Department' column from DataFrame 1:
    Name  Age
 John   25
Alice   30
  Bob   35
 Emma   28
 Ashi   32

In [38]:

df1

Out[38]:

In [39]:

df2

Out[39]:

In [56]:

df3=pd.merge(df1,df2,on="Name",how="outer") #right_on,left_on
df3

Out[56]:

In [57]:

#dropna
#fillna
#isnull().sum()
df3.isnull().sum()

Out[57]:

Name          0
Age           0
Salary        1
Experience    1
dtype: int64

In [60]:

m=df3["Salary"].mean()

In [62]:

# 10. fillna()
print("\n10. Filling missing values in DataFrame 2 with 0:")
df3["Salary"]=df3["Salary"].fillna(m)
df3

Out[62]:

10. Filling missing values in DataFrame 2 with 0:

In [ ]:

print("\n11. Grouping DataFrame 3 by 'Department':")
grouped = df3.groupby('Department')
for department, group in grouped:
    print(f"\nDepartment: {department}")
    print(group)

In [ ]:

# Example 1: Basic Arithmetic Operation
add = lambda x, y,z: x + y+z
result = add(5, 3,2)
result

In [ ]:

# Example 2: Sorting a List of Tuples
students = [('Alice', 20), ('Bob', 18), ('Charlie', 22)]
students.sort(key=lambda x: x[0])
students

In [ ]:

# Example 3: Filtering Even Numbers from a List
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers)

In [ ]:

df3

In [ ]:

# Applying a lambda function to create a new column
df3['Salary_Adjusted'] = df3['Salary'].apply(lambda x: x * 2.1)  # Increase salary by 20%
df3['Salary_Adjusted']

In [ ]:


# 12. apply()
print("\n12. Applying function to 'Age' column in DataFrame 1:")
df1['Age'] = df1['Age'].apply(lambda x: x + 1)
print(df1)


# 14. to_csv()
print("\n14. Writing DataFrame 1 to CSV file:")
df1.to_csv('df1_output.csv', index=False)

# 15. to_excel()
print("\n15. Writing DataFrame 2 to Excel file:")
df2.to_excel('df2_output.xlsx', index=False)

##Lab 

Here are some lab questions based on the provided DataFrame examples:

1. **Basic DataFrame Operations:**
   - How many rows and columns are there in each DataFrame?
   - What are the data types of columns in each DataFrame?
   - Display the first three rows of each DataFrame.
   - Display the last two rows of each DataFrame.

2. **Data Manipulation:**
   - Add a new column 'Gender' to DataFrame 1 with values 'Male' for all rows.
   - Update the 'Salary' column in DataFrame 2 to increase each salary by 10%.
   - Rename the 'Experience' column in DataFrame 2 to 'Years_Exp'.
   - Remove the 'Department' column from DataFrame 1.
   - Fill any missing values in DataFrame 2 with the mean of the respective columns.

3. **Data Analysis:**
   - Calculate the average age in DataFrame 1.
   - Find the maximum salary in DataFrame 2.
   - Group DataFrame 1 by 'Age' and calculate the average salary for each age group.
   - Identify the employee with the highest salary in DataFrame 2.
   - Determine the department with the most employees in DataFrame 1.

4. **Data Merging:**
   - Merge DataFrame 1 and DataFrame 2 based on the 'Name' column. How many rows are there in the merged DataFrame?
   - Perform an inner join between DataFrame 1 and DataFrame 2. Are there any missing values in the merged DataFrame?
   - Merge DataFrame 1 and DataFrame 2 using a left join. Which DataFrame serves as the left DataFrame, and why?
   - Merge DataFrame 1 and DataFrame 2 using an outer join. Are there any rows with missing values after the outer join?

5. **Data Export:**
   - Save DataFrame 1 to a CSV file named 'df1_output.csv'.
   - Save DataFrame 2 to an Excel file named 'df2_output.xlsx'.

Partcipants with Age Greater than 30

In [ ]:

# Filter data based on a condition
filtered_df = df[df['Age'] > 30]
filtered_df

In [ ]:

# Sort DataFrame by Age
sorted_df = df.sort_values(by='Age',ascending=False)
sorted_df

In [ ]:

# Group data by City and calculate average age
grouped_df = df.groupby('City').mean()
grouped_df

Data Manipulation using Pandas

Partcipants with Age Greater than 30

Product

Resources

Company