GitHub Repository: jackfrued/Python-100-Days
Path: blob/master/公开课/文档/年薪50W+的Python程序员如何写代码/code/Python/使用Pandas做数据分析.ipynb
Kernel: Python 3
import pandas
frame = pandas.read_csv('USvideos.csv')
frame
# Videos with more than 50,000 likes
frame[frame.likes > 50000]
# Total likes across the videos with more than 50,000 likes
frame[frame.likes > 50000].likes.sum()
2618681277
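The two cells above rely on boolean-mask indexing. As a minimal sketch of what happens under the hood, the example below uses a small hypothetical DataFrame in place of USvideos.csv; only the `likes` column name is taken from the notebook, and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for USvideos.csv (values are invented).
toy = pd.DataFrame({
    'video_id': ['a', 'b', 'c', 'd'],
    'likes': [10, 60000, 75000, 50000],
})

# The comparison produces a boolean Series aligned with the rows.
mask = toy['likes'] > 50000

# Indexing with the mask keeps only the rows where it is True.
popular = toy[mask]

# Summing the filtered column totals the likes of those rows only.
total_likes = int(popular['likes'].sum())

print(popular)
print(total_likes)
```

Because the comparison is strict, a row with exactly 50000 likes is excluded; `>=` would include it.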
# Top 10 videos by likes
frame.sort_values('likes', ascending=False)[:10]
# Drop rows whose video_id repeats an earlier row, keeping the first occurrence
frame2 = frame.drop(index=frame[frame.video_id.duplicated()].index)
# Top 10 videos by likes
frame2.sort_values('likes', ascending=False)[:10]
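To make the deduplication step easier to follow, here is a small sketch on hypothetical data (the `video_id` and `likes` column names come from the notebook; the rows are invented). It also shows `drop_duplicates(subset=...)`, which gives the same result here in one call, assuming the index labels are unique as they are after a plain `read_csv`.

```python
import pandas as pd

# Hypothetical toy data: 'a' and 'b' each appear twice.
toy = pd.DataFrame({
    'video_id': ['a', 'b', 'a', 'c', 'b'],
    'likes': [100, 200, 150, 50, 300],
})

# duplicated() flags every repeat of a value after its first occurrence.
repeat_rows = toy['video_id'].duplicated()

# Same approach as the notebook: drop the rows whose index labels are flagged.
deduped = toy.drop(index=toy[repeat_rows].index)

# drop_duplicates expresses the same operation directly.
deduped_alt = toy.drop_duplicates(subset='video_id')

# Top 2 by likes among the deduplicated rows.
top = deduped.sort_values('likes', ascending=False)[:2]
print(top)
```

Note that the first occurrence is the one kept, which is not necessarily the row with the most likes: in this toy data the second `'b'` row with 300 likes is dropped. If the highest count per video matters, sorting by `likes` in descending order before deduplicating is one possible alternative.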
# Group by channel and count the videos in each channel
frame.groupby('channel_title').size()
channel_title
12 News                            2
1MILLION Dance Studio             33
1theK (원더케이)                    19
20th Century Fox                 135
2CELLOS                            2
                                ...
ワーナー ブラザース 公式チャンネル      6
圧倒的不審者の極み!                  12
杰威爾音樂 JVR Music               29
郭韋辰                              2
영국남자 Korean Englishman          6
Length: 2207, dtype: int64
# Top 10 channels by number of videos
frame.groupby('channel_title').size().sort_values(ascending=False)[:10]
channel_title
ESPN                                      203
The Tonight Show Starring Jimmy Fallon    197
Vox                                       193
TheEllenShow                              193
Netflix                                   193
The Late Show with Stephen Colbert        187
Jimmy Kimmel Live                         186
Late Night with Seth Meyers               183
Screen Junkies                            182
NBA                                       181
dtype: int64
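The group-then-count pattern above can be checked on a tiny example. The sketch below uses hypothetical channel names; it also shows `Series.value_counts()`, which counts and sorts in descending order in a single call and may be a convenient alternative when only one column is involved.

```python
import pandas as pd

# Hypothetical toy data: one row per video, with invented channel names.
toy = pd.DataFrame({
    'channel_title': ['X', 'Y', 'X', 'Z', 'X', 'Y'],
})

# size() counts the rows in each group; the result is a Series indexed by channel.
counts = toy.groupby('channel_title').size()

# Sort by count, largest first, and keep the top 2.
top2 = counts.sort_values(ascending=False)[:2]

# value_counts() counts and sorts in one step.
top2_alt = toy['channel_title'].value_counts().head(2)

print(top2)
print(top2_alt)
```

When several channels share the same count, the two approaches may order the tied entries differently, so the exact membership of a Top 10 can depend on tie handling.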
%matplotlib inline
import matplotlib.pyplot as plt

# Draw a pie chart
top10 = frame.groupby('channel_title').size().sort_values(ascending=False)[:10]
top10.plot.pie(figsize=(20, 10), autopct='%.2f%%')
plt.title('Top 10 Channels')
Text(0.5, 1.0, 'Top 10 Channels')
[Figure: pie chart titled 'Top 10 Channels']
# Convert the Series to a DataFrame
frame3 = frame.groupby('channel_title').size().sort_values(ascending=False)[:10].to_frame('video_count')
frame3
# Write the data to a CSV file
frame3.to_csv('result.csv')
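As a rough illustration of what ends up in the CSV file, the sketch below builds a small hypothetical counterpart of `frame3` (invented channel names and counts) and writes it to a string instead of `result.csv`. It assumes the default `to_csv` behavior, in which the index is written as the first column, so `channel_title` is preserved.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Top 10 Series (values are invented).
counts = pd.Series([3, 2], index=pd.Index(['X', 'Y'], name='channel_title'))

# to_frame('video_count') turns the Series into a one-column DataFrame.
small = counts.to_frame('video_count')

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = small.to_csv()
print(csv_text)

# Reading it back with index_col restores channel_title as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col='channel_title')
print(restored)
```

Passing `index=False` to `to_csv` would drop the channel names here, since they live in the index rather than in a column.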