Pandas Functions
groupby(): Groups rows by the values in one or more columns so that a function, such as an aggregation, can be applied to each group.
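A minimal sketch of grouping and aggregating. The DataFrame and its column names (Category, Sales, Revenue) are hypothetical stand-ins, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical sales data; the column names are illustrative only.
df = pd.DataFrame({
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [10, 5, 20, 15],
    "Revenue": [1000.0, 250.0, 3000.0, 750.0],
})

# Total sales and average revenue per category, using named aggregation.
summary = df.groupby("Category").agg(
    total_sales=("Sales", "sum"),
    avg_revenue=("Revenue", "mean"),
)
print(summary)
```

Named aggregation keeps the output column names explicit, which helps when the result is passed straight to a plot.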
Lambda function:
Lambda functions are anonymous functions defined using the 'lambda' keyword.
Syntax: lambda arguments: expression
Lambda functions are best suited for short, throwaway functions.
For complex operations, regular functions are preferred.
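The contrast between a lambda and a regular function can be sketched as follows. The revenue values, the tier thresholds, and the name revenue_tier are assumptions made up for illustration.

```python
import pandas as pd

# A lambda bound to a name purely to show the syntax.
double = lambda x: x * 2
print(double(21))  # 42

# Hypothetical revenue values.
revenue = pd.Series([100.0, 250.0, 400.0])

# Short, throwaway transformation: a lambda passed directly to apply().
doubled = revenue.apply(lambda x: x * 2)

# Multi-branch logic reads better as a regular, named function.
def revenue_tier(value):
    """Classify a revenue value; the thresholds are arbitrary examples."""
    if value >= 300:
        return "high"
    if value >= 200:
        return "medium"
    return "low"

tiers = revenue.apply(revenue_tier)
print(doubled.tolist())
print(tiers.tolist())
```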
rolling(): Provides rolling (moving) window calculations, such as a moving average over a fixed number of consecutive rows.
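A small sketch of a rolling mean, assuming a hypothetical Sales series and a window of 3 rows (the notebook's actual window size is not stated here).

```python
import pandas as pd

# Hypothetical sales values.
sales = pd.Series([10, 20, 30, 40, 50], name="Sales")

# Mean over a sliding window of 3 rows. The first two results are NaN
# because a full window is not yet available.
rolling_mean = sales.rolling(window=3).mean()

# min_periods=1 computes a result as soon as one value is available.
rolling_mean_partial = sales.rolling(window=3, min_periods=1).mean()

print(rolling_mean.tolist())
print(rolling_mean_partial.tolist())
```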
Case Study A: Sales Performance and Data Transformation in E-Commerce
Key Plots:
Sales and Revenue by Category: Bar plot representing total sales and average revenue by each category.
Average Revenue by Region and Product: Bar plot for average revenue based on region and product.
Original vs Doubled Revenue: A comparison of the original and doubled revenue.
Transformed Sales, Revenue, and Discount: A line plot showing the numeric transformations.
Sales and Rolling Mean Sales: A line plot showing sales alongside a rolling mean.
Sales and Shifted Sales: Line plot comparing original and shifted sales.
Sales by Category After Merge: Bar plot showing the merged sales data grouped by category.
Exploded Product Distribution: Bar plot showing the distribution of exploded product lists.
Original vs Updated Age: Bar plot comparing the original and updated age columns.
Product Count by Region: A stacked bar plot showing counts of each product per region.
Daily Sales and Revenue: Line plot showing daily aggregated sales and revenue.
Discount after FillNA: Line plot showing how missing discount values were filled.
Sales by Category with Reset Index: Bar plot showing sales by category after resetting the index.
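Several of the plots above depend on a transformation applied before plotting. The sketch below shows four of them (shift, fillna, resample, explode) on made-up data. The column names, the timestamps, and the choice to fill missing discounts with the column mean are all assumptions; the fill strategy actually used in the case study is not specified.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions indexed by timestamp.
daily = pd.DataFrame(
    {
        "Sales": [10, 20, 30, 40],
        "Discount": [0.1, np.nan, 0.2, np.nan],
    },
    index=pd.to_datetime([
        "2024-01-01 09:00",
        "2024-01-01 15:00",
        "2024-01-02 10:00",
        "2024-01-03 11:00",
    ]),
)

# shift(1) moves each value down one row, so every row sees the previous sale.
daily["Shifted_Sales"] = daily["Sales"].shift(1)

# Assumed strategy: fill missing discounts with the mean of the observed ones.
daily["Discount_Filled"] = daily["Discount"].fillna(daily["Discount"].mean())

# Resample to calendar days and sum the sales within each day.
daily_totals = daily["Sales"].resample("D").sum()

# explode() turns each list element into its own row.
orders = pd.DataFrame({
    "Region": ["North", "South"],
    "Products": [["Laptop", "Phone"], ["Phone"]],
})
exploded = orders.explode("Products")
product_counts = exploded["Products"].value_counts()

print(daily)
print(daily_totals)
print(product_counts)
```

Each resulting Series or DataFrame can then be plotted with its .plot() method, for example as a line plot for daily_totals or a bar plot for product_counts.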
Insights:
GroupBy produces aggregate metrics, such as totals and averages, for each group.
Pivot tables help in aggregating data based on multiple variables.
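Apply runs a custom function over a column or along the rows or columns of a DataFrame, while ApplyMap applies a function element-wise to the whole DataFrame (applymap is deprecated since pandas 2.1 in favor of DataFrame.map).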
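Rolling smooths a series to reveal trends, and Shift gives access to previous (lagged) values.
Merge combines DataFrames on a shared key, Explode expands list-like cells into one row per element, and Pipe chains transformation functions into a readable sequence.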
Cross-tab helps in comparing two categorical features.
Resampling aggregates time series data to a new frequency, and FillNA replaces missing values.
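The pivot table, cross-tab, merge, and pipe points above can be illustrated with a short sketch. The data, the column names, and the helper add_doubled_revenue are hypothetical and only loosely modeled on the case study.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Product": ["Laptop", "Phone", "Laptop", "Laptop"],
    "Revenue": [1000.0, 500.0, 1200.0, 800.0],
})

# Pivot table: average revenue for each Region/Product combination.
# Combinations with no rows (South/Phone) come out as NaN.
pivot = sales.pivot_table(
    values="Revenue", index="Region", columns="Product", aggfunc="mean"
)

# Cross-tab: how many rows fall into each Region/Product combination.
counts = pd.crosstab(sales["Region"], sales["Product"])

# Merge: attach a category to every sale via the shared Product key.
categories = pd.DataFrame({
    "Product": ["Laptop", "Phone"],
    "Category": ["Computers", "Mobile"],
})
merged = sales.merge(categories, on="Product", how="left")

# Pipe: pass the DataFrame through a named transformation step.
def add_doubled_revenue(frame):
    """Hypothetical step: return a copy with a Doubled_Revenue column."""
    return frame.assign(Doubled_Revenue=frame["Revenue"] * 2)

result = merged.pipe(add_doubled_revenue)

print(pivot)
print(counts)
print(result)
```

Using how="left" in the merge keeps every sales row even if a product has no matching category, which avoids silently dropping data before aggregation.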