Pandas_mtm
import pandas as pd
reviews = pd.read_csv("https://raw.githubusercontent.com/ra314ra/ml/master/ign.csv")
reviews.head()
Unnamed: 0 score_phrase title url platform score genre editors_choice release_year release_month release_day
0 0 Amazing LittleBigPlanet PS Vita /games/littlebigplanet-vita/vita-98907 PlayStation Vita 9.0 Platformer Y 2012 9 12
1 1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... /games/littlebigplanet-ps-vita-marvel-super-he... PlayStation Vita 9.0 Platformer Y 2012 9 12
2 2 Great Splice: Tree of Life /games/splice/ipad-141070 iPad 8.5 Puzzle N 2012 9 12
3 3 Great NHL 13 /games/nhl-13/xbox-360-128182 Xbox 360 8.5 Sports N 2012 9 11
4 4 Great NHL 13 /games/nhl-13/ps3-128181 PlayStation 3 8.5 Sports N 2012 9 11
reviews.iloc[0:5, :]
Unnamed: 0 score_phrase title url platform score genre editors_choice release_year release_month release_day
0 0 Amazing LittleBigPlanet PS Vita /games/littlebigplanet-vita/vita-98907 PlayStation Vita 9.0 Platformer Y 2012 9 12
1 1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... /games/littlebigplanet-ps-vita-marvel-super-he... PlayStation Vita 9.0 Platformer Y 2012 9 12
2 2 Great Splice: Tree of Life /games/splice/ipad-141070 iPad 8.5 Puzzle N 2012 9 12
3 3 Great NHL 13 /games/nhl-13/xbox-360-128182 Xbox 360 8.5 Sports N 2012 9 11
4 4 Great NHL 13 /games/nhl-13/ps3-128181 PlayStation 3 8.5 Sports N 2012 9 11
reviews = reviews.iloc[:, 1:]
reviews.head()
score_phrase title url platform score genre editors_choice release_year release_month release_day
0 Amazing LittleBigPlanet PS Vita /games/littlebigplanet-vita/vita-98907 PlayStation Vita 9.0 Platformer Y 2012 9 12
1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... /games/littlebigplanet-ps-vita-marvel-super-he... PlayStation Vita 9.0 Platformer Y 2012 9 12
2 Great Splice: Tree of Life /games/splice/ipad-141070 iPad 8.5 Puzzle N 2012 9 12
3 Great NHL 13 /games/nhl-13/xbox-360-128182 Xbox 360 8.5 Sports N 2012 9 11
4 Great NHL 13 /games/nhl-13/ps3-128181 PlayStation 3 8.5 Sports N 2012 9 11
reviews.shape
(18625, 10)
test_reviews_part1 = reviews.iloc[:,0:2]
test_reviews_part2 = reviews.iloc[:,3:]
test_reviews_dropped_column = pd.concat([test_reviews_part1, test_reviews_part2], axis=1)
test_reviews_part1.head()

score_phrase title
0 Amazing LittleBigPlanet PS Vita
1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E...
2 Great Splice: Tree of Life
3 Great NHL 13
4 Great NHL 13
test_reviews_part2.head()
platform score genre editors_choice release_year release_month release_day
0 PlayStation Vita 9.0 Platformer Y 2012 9 12
1 PlayStation Vita 9.0 Platformer Y 2012 9 12
2 iPad 8.5 Puzzle N 2012 9 12
3 Xbox 360 8.5 Sports N 2012 9 11
4 PlayStation 3 8.5 Sports N 2012 9 11
test_drop_column = reviews.drop(['url'], axis=1)
test_drop_column.head()
score_phrase title platform score genre editors_choice release_year release_month release_day
0 Amazing LittleBigPlanet PS Vita PlayStation Vita 9.0 Platformer Y 2012 9 12
1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... PlayStation Vita 9.0 Platformer Y 2012 9 12
2 Great Splice: Tree of Life iPad 8.5 Puzzle N 2012 9 12
3 Great NHL 13 Xbox 360 8.5 Sports N 2012 9 11
4 Great NHL 13 PlayStation 3 8.5 Sports N 2012 9 11
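The two approaches above (slicing around a column with `iloc` and re-joining with `pd.concat`, versus `drop`) should give the same table. A minimal sketch of that check, using a hypothetical four-column `toy` frame rather than the real reviews data:

```python
import pandas as pd

# Hypothetical stand-in for the reviews table (not the real data).
toy = pd.DataFrame({
    "score_phrase": ["Amazing", "Great", "Good"],
    "title": ["A", "B", "C"],
    "url": ["/a", "/b", "/c"],
    "score": [9.0, 8.5, 7.0],
})

# Approach 1: take the columns before and after "url" by position, then re-join them.
left = toy.iloc[:, 0:2]
right = toy.iloc[:, 3:]
via_concat = pd.concat([left, right], axis=1)

# Approach 2: drop the column by name.
via_drop = toy.drop(columns=["url"])

# Both routes should leave the same columns and values.
same = via_concat.equals(via_drop)
print(same)
```

Dropping by name is usually the less fragile choice, because it does not depend on where the column happens to sit.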
some_reviews = reviews.loc[10:20,]
some_reviews.head()
score_phrase title url platform score genre editors_choice release_year release_month release_day
10 Good Tekken Tag Tournament 2 /games/tekken-tag-tournament-2/ps3-124584 PlayStation 3 7.5 Fighting N 2012 9 11
11 Good Tekken Tag Tournament 2 /games/tekken-tag-tournament-2/xbox-360-124581 Xbox 360 7.5 Fighting N 2012 9 11
12 Good Wild Blood /games/wild-blood/iphone-139363 iPhone 7.0 NaN N 2012 9 10
13 Amazing Mark of the Ninja /games/mark-of-the-ninja-135615/xbox-360-129276 Xbox 360 9.0 Action, Adventure Y 2012 9 7
14 Amazing Mark of the Ninja /games/mark-of-the-ninja-135615/pc-143761 PC 9.0 Action, Adventure Y 2012 9 7
some_reviews.loc[18:24,:]
score_phrase title url platform score genre editors_choice release_year release_month release_day
18 Mediocre Way of the Samurai 4 /games/way-of-the-samurai-4/ps3-23516 PlayStation 3 5.5 Action, Adventure N 2012 9 3
19 Good JoJo's Bizarre Adventure HD /games/jojos-bizarre-adventure/xbox-360-137717 Xbox 360 7.0 Fighting N 2012 9 3
20 Good JoJo's Bizarre Adventure HD /games/jojos-bizarre-adventure/ps3-137896 PlayStation 3 7.0 Fighting N 2012 9 3
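The last call returns only three rows because `loc` slices by index label, includes both endpoints, and silently stops at the last label that exists. A small sketch of the difference from `iloc`, assuming a hypothetical 30-row `toy` frame with the default integer index:

```python
import pandas as pd

# Hypothetical frame with index labels 0..29.
toy = pd.DataFrame({"score": [float(i) for i in range(30)]})

by_label = toy.loc[10:20]      # labels 10 through 20, endpoint included
by_position = toy.iloc[10:20]  # positions 10 through 19, endpoint excluded

# Slicing the subset by label again keeps only the labels it still contains.
tail = by_label.loc[18:24]

print(len(by_label), len(by_position), tail.index.tolist())
```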
reviews.loc[:5, "score"]
0    9.0
1    9.0
2    8.5
3    8.5
4    8.5
5    7.0
Name: score, dtype: float64
reviews.loc[:5, ["score", "release_year"]]
score release_year
0 9.0 2012
1 9.0 2012
2 8.5 2012
3 8.5 2012
4 8.5 2012
5 7.0 2012
type(reviews.loc[:5, ["score", "release_year"]])
pandas.core.frame.DataFrame
reviews[["score", "editors_choice"]]
score editors_choice
0 9.0 Y
1 9.0 Y
2 8.5 N
3 8.5 N
4 8.5 N
5 7.0 N
6 3.0 N
7 9.0 Y
8 3.0 N
9 7.0 N
10 7.5 N
11 7.5 N
12 7.0 N
13 9.0 Y
14 9.0 Y
15 6.5 N
16 6.5 N
17 8.0 N
18 5.5 N
19 7.0 N
20 7.0 N
21 7.5 N
22 7.5 N
23 7.5 N
24 9.0 Y
25 7.0 N
26 9.0 Y
27 7.5 N
28 8.0 N
29 6.5 N
... ... ...
18595 4.4 N
18596 6.5 N
18597 4.9 N
18598 6.8 N
18599 7.0 N
18600 7.4 N
18601 7.4 N
18602 7.4 N
18603 7.8 N
18604 8.6 N
18605 6.0 N
18606 6.4 N
18607 7.0 N
18608 5.4 N
18609 8.0 N
18610 6.0 N
18611 5.8 N
18612 7.8 N
18613 8.0 N
18614 9.2 Y
18615 9.2 Y
18616 7.5 N
18617 8.4 N
18618 9.1 Y
18619 7.9 N
18620 7.6 N
18621 9.0 Y
18622 5.8 N
18623 10.0 Y
18624 10.0 Y

18625 rows × 2 columns

reviews["score"].mean()
6.950459060402666
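# Average only the numeric columns; recent pandas versions no longer skip string columns silently.
reviews.select_dtypes("number").mean(axis=1)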
0        510.500
1        510.500
2        510.375
3        510.125
4        510.125
5        509.750
6        508.750
7        510.250
8        508.750
9        509.750
10       509.875
11       509.875
12       509.500
13       509.250
14       509.250
15       508.375
16       508.375
17       508.500
18       507.375
19       507.750
20       507.750
21       514.625
22       514.625
23       514.625
24       515.000
25       514.250
26       514.750
27       514.125
28       514.250
29       513.625
          ...
18595    510.850
18596    510.875
18597    510.225
18598    510.700
18599    510.750
18600    512.600
18601    512.600
18602    512.600
18603    512.450
18604    512.400
18605    511.500
18606    508.600
18607    510.750
18608    510.350
18609    510.750
18610    510.250
18611    508.700
18612    509.200
18613    508.000
18614    515.050
18615    515.050
18616    508.375
18617    508.600
18618    515.025
18619    514.725
18620    514.650
18621    515.000
18622    513.950
18623    515.000
18624    515.000
Length: 18625, dtype: float64
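# Restrict to numeric columns so the call also runs on recent pandas versions.
reviews.select_dtypes("number").corr()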
score release_year release_month release_day
score 1.000000 0.062716 0.007632 0.020079
release_year 0.062716 1.000000 -0.115515 0.016867
release_month 0.007632 -0.115515 1.000000 -0.067964
release_day 0.020079 0.016867 -0.067964 1.000000
reviews["score"] / 2
0        4.50
1        4.50
2        4.25
3        4.25
4        4.25
5        3.50
6        1.50
7        4.50
8        1.50
9        3.50
10       3.75
11       3.75
12       3.50
13       4.50
14       4.50
15       3.25
16       3.25
17       4.00
18       2.75
19       3.50
20       3.50
21       3.75
22       3.75
23       3.75
24       4.50
25       3.50
26       4.50
27       3.75
28       4.00
29       3.25
         ...
18595    2.20
18596    3.25
18597    2.45
18598    3.40
18599    3.50
18600    3.70
18601    3.70
18602    3.70
18603    3.90
18604    4.30
18605    3.00
18606    3.20
18607    3.50
18608    2.70
18609    4.00
18610    3.00
18611    2.90
18612    3.90
18613    4.00
18614    4.60
18615    4.60
18616    3.75
18617    4.20
18618    4.55
18619    3.95
18620    3.80
18621    4.50
18622    2.90
18623    5.00
18624    5.00
Name: score, Length: 18625, dtype: float64
score_filter = reviews["score"] > 7
score_filter
0         True
1         True
2         True
3         True
4         True
5        False
6        False
7         True
8        False
9        False
10        True
11        True
12       False
13        True
14        True
15       False
16       False
17        True
18       False
19       False
20       False
21        True
22        True
23        True
24        True
25       False
26        True
27        True
28        True
29       False
         ...
18595    False
18596    False
18597    False
18598    False
18599    False
18600     True
18601     True
18602     True
18603     True
18604     True
18605    False
18606    False
18607    False
18608    False
18609     True
18610    False
18611    False
18612     True
18613     True
18614     True
18615     True
18616     True
18617     True
18618     True
18619     True
18620     True
18621     True
18622    False
18623     True
18624     True
Name: score, Length: 18625, dtype: bool
filtered_reviews = reviews.loc[score_filter].copy()
filtered_reviews.head()
score_phrase title url platform score genre editors_choice release_year release_month release_day
0 Amazing LittleBigPlanet PS Vita /games/littlebigplanet-vita/vita-98907 PlayStation Vita 9.0 Platformer Y 2012 9 12
1 Amazing LittleBigPlanet PS Vita -- Marvel Super Hero E... /games/littlebigplanet-ps-vita-marvel-super-he... PlayStation Vita 9.0 Platformer Y 2012 9 12
2 Great Splice: Tree of Life /games/splice/ipad-141070 iPad 8.5 Puzzle N 2012 9 12
3 Great NHL 13 /games/nhl-13/xbox-360-128182 Xbox 360 8.5 Sports N 2012 9 11
4 Great NHL 13 /games/nhl-13/ps3-128181 PlayStation 3 8.5 Sports N 2012 9 11
reviews["score"].divide(reviews["score"])
0        1.0
1        1.0
2        1.0
3        1.0
4        1.0
5        1.0
6        1.0
7        1.0
8        1.0
9        1.0
10       1.0
11       1.0
12       1.0
13       1.0
14       1.0
15       1.0
16       1.0
17       1.0
18       1.0
19       1.0
20       1.0
21       1.0
22       1.0
23       1.0
24       1.0
25       1.0
26       1.0
27       1.0
28       1.0
29       1.0
        ...
18595    1.0
18596    1.0
18597    1.0
18598    1.0
18599    1.0
18600    1.0
18601    1.0
18602    1.0
18603    1.0
18604    1.0
18605    1.0
18606    1.0
18607    1.0
18608    1.0
18609    1.0
18610    1.0
18611    1.0
18612    1.0
18613    1.0
18614    1.0
18615    1.0
18616    1.0
18617    1.0
18618    1.0
18619    1.0
18620    1.0
18621    1.0
18622    1.0
18623    1.0
18624    1.0
Name: score, Length: 18625, dtype: float64
xbox_one_filter = (reviews["score"] > 7) & (reviews["platform"] == "Xbox One")
filtered_reviews = reviews[xbox_one_filter]
filtered_reviews.head()
score_phrase title url platform score genre editors_choice release_year release_month release_day
17137 Amazing Gone Home /games/gone-home/xbox-one-20014361 Xbox One 9.5 Simulation Y 2013 8 15
17197 Amazing Rayman Legends /games/rayman-legends/xbox-one-20008449 Xbox One 9.5 Platformer Y 2013 8 26
17295 Amazing LEGO Marvel Super Heroes /games/lego-marvel-super-heroes/xbox-one-20000826 Xbox One 9.0 Action Y 2013 10 22
17313 Great Dead Rising 3 /games/dead-rising-3/xbox-one-124306 Xbox One 8.3 Action N 2013 11 18
17317 Great Killer Instinct /games/killer-instinct-2013/xbox-one-20000538 Xbox One 8.4 Fighting N 2013 11 18
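The parentheses around each comparison in `xbox_one_filter` matter, because `&` binds more tightly than `>` and `==` in Python. A minimal sketch of combining boolean masks, assuming a hypothetical four-row `toy` frame in place of the reviews data:

```python
import pandas as pd

# Hypothetical stand-in for the reviews table.
toy = pd.DataFrame({
    "platform": ["Xbox One", "Xbox One", "PC", "PlayStation 4"],
    "score": [9.5, 6.0, 5.0, 7.5],
})

# Build each condition as its own boolean Series, then combine them element-wise.
high_score = toy["score"] > 7
on_xbox_one = toy["platform"] == "Xbox One"

both = toy[high_score & on_xbox_one]    # rows meeting both conditions
either = toy[high_score | on_xbox_one]  # rows meeting at least one condition

print(both.index.tolist())
print(either.index.tolist())
```

Naming the individual masks first is one way to keep longer filters readable.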
%matplotlib inline
reviews[reviews["platform"] == "Xbox One"]["score"].plot(kind="hist")
<matplotlib.axes._subplots.AxesSubplot at 0x7f80f7b5e3c8>
reviews[reviews["platform"] == "PlayStation 4"]["score"].plot(kind="hist")
<matplotlib.axes._subplots.AxesSubplot at 0x7f80f7b710f0>
reviews.describe()
score release_year release_month release_day
count 18625.000000 18625.000000 18625.00000 18625.000000
mean 6.950459 2006.515329 7.13847 15.603866
std 1.711736 4.587529 3.47671 8.690128
min 0.500000 1970.000000 1.00000 1.000000
25% 6.000000 2003.000000 4.00000 8.000000
50% 7.300000 2007.000000 8.00000 16.000000
75% 8.200000 2010.000000 10.00000 23.000000
max 10.000000 2016.000000 12.00000 31.000000
data = pd.read_csv("https://raw.githubusercontent.com/ra314ra/ml/master/thanksgiving-2015-poll-data.csv", encoding = 'latin-1')
data.head()
RespondentID Do you celebrate Thanksgiving? What is typically the main dish at your Thanksgiving dinner? What is typically the main dish at your Thanksgiving dinner? - Other (please specify) How is the main dish typically cooked? How is the main dish typically cooked? - Other (please specify) What kind of stuffing/dressing do you typically have? What kind of stuffing/dressing do you typically have? - Other (please specify) What type of cranberry saucedo you typically have? What type of cranberry saucedo you typically have? - Other (please specify) ... Have you ever tried to meet up with hometown friends on Thanksgiving night? Have you ever attended a "Friendsgiving?" Will you shop any Black Friday sales on Thanksgiving Day? Do you work in retail? Will you employer make you work on Black Friday? How would you describe where you live? Age What is your gender? How much total combined money did all members of your HOUSEHOLD earn last year? US Region
0 4337954960 Yes Turkey NaN Baked NaN Bread-based NaN None NaN ... Yes No No No NaN Suburban 18 - 29 Male $75,000 to $99,999 Middle Atlantic
1 4337951949 Yes Turkey NaN Baked NaN Bread-based NaN Other (please specify) Homemade cranberry gelatin ring ... No No Yes No NaN Rural 18 - 29 Female $50,000 to $74,999 East South Central
2 4337935621 Yes Turkey NaN Roasted NaN Rice-based NaN Homemade NaN ... Yes Yes Yes No NaN Suburban 18 - 29 Male $0 to $9,999 Mountain
3 4337933040 Yes Turkey NaN Baked NaN Bread-based NaN Homemade NaN ... Yes No No No NaN Urban 30 - 44 Male $200,000 and up Pacific
4 4337931983 Yes Tofurkey NaN Baked NaN Bread-based NaN Canned NaN ... Yes No No No NaN Urban 30 - 44 Male $100,000 to $124,999 Pacific

5 rows × 65 columns

data.shape
(1058, 65)
data["Do you celebrate Thanksgiving?"].unique()
array(['Yes', 'No'], dtype=object)
data.columns[50:]
Index(['Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply. - Other (please specify).1',
       'Do you typically pray before or after the Thanksgiving meal?',
       'How far will you travel for Thanksgiving?',
       'Will you watch any of the following programs on Thanksgiving? Please select all that apply. - Macy's Parade',
       'What's the age cutoff at your "kids' table" at Thanksgiving?',
       'Have you ever tried to meet up with hometown friends on Thanksgiving night?',
       'Have you ever attended a "Friendsgiving?"',
       'Will you shop any Black Friday sales on Thanksgiving Day?',
       'Do you work in retail?',
       'Will you employer make you work on Black Friday?',
       'How would you describe where you live?',
       'Age',
       'What is your gender?',
       'How much total combined money did all members of your HOUSEHOLD earn last year?',
       'US Region'],
      dtype='object')
data["What is your gender?"].value_counts(dropna=False)
Female    544
Male      481
NaN        33
Name: What is your gender?, dtype: int64
import math

def gender_code(gender_string):
    if isinstance(gender_string, float) and math.isnan(gender_string):
        return gender_string
    return int(gender_string == "Female")
data["gender"] = data["What is your gender?"].apply(gender_code)
data["gender"].value_counts(dropna=False)
 1.0    544
 0.0    481
NaN      33
Name: gender, dtype: int64
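The same encoding can probably be written without a custom function by passing a dictionary to `Series.map`, which leaves missing answers as `NaN`. This is a sketch on a hypothetical four-element `answers` Series, not the survey column. One difference from `gender_code`: any label missing from the dictionary becomes `NaN` instead of 0.

```python
import numpy as np
import pandas as pd

# Hypothetical answers, including one missing response.
answers = pd.Series(["Female", "Male", np.nan, "Female"])

# Labels found in the dictionary are replaced; everything else becomes NaN.
codes = answers.map({"Female": 1, "Male": 0})

print(codes.value_counts(dropna=False))
```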
data.apply(lambda x: x.dtype).head()
RespondentID                                                                             object
Do you celebrate Thanksgiving?                                                           object
What is typically the main dish at your Thanksgiving dinner?                             object
What is typically the main dish at your Thanksgiving dinner? - Other (please specify)    object
How is the main dish typically cooked?                                                   object
dtype: object
data["How much total combined money did all members of your HOUSEHOLD earn last year?"].value_counts(dropna=False)
$25,000 to $49,999      180
Prefer not to answer    136
$50,000 to $74,999      135
$75,000 to $99,999      133
$100,000 to $124,999    111
$200,000 and up          80
$10,000 to $24,999       68
$0 to $9,999             66
$125,000 to $149,999     49
$150,000 to $174,999     40
NaN                      33
$175,000 to $199,999     27
Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64
import numpy as np

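def clean_income(value):
    # Open-ended top bracket: use its lower bound.
    if value == "$200,000 and up":
        return 200000
    # Treat refusals and missing answers as missing values.
    elif value == "Prefer not to answer":
        return np.nan
    elif isinstance(value, float) and math.isnan(value):
        return np.nan
    # "$25,000 to $49,999" becomes "25000 to 49999"; return the midpoint of the bracket.
    value = value.replace(",", "").replace("$", "")
    income_low, income_high = value.split(" to ")
    return (int(income_low) + int(income_high)) / 2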
data["income"] = data["How much total combined money did all members of your HOUSEHOLD earn last year?"].apply(clean_income)
data["income"].head()
0     87499.5
1     62499.5
2      4999.5
3    200000.0
4    112499.5
Name: income, dtype: float64
data["income"].value_counts(dropna=False)
 37499.5     180
NaN          169
 62499.5     135
 87499.5     133
 112499.5    111
 200000.0     80
 17499.5      68
 4999.5       66
 137499.5     49
 162499.5     40
 187499.5     27
Name: income, dtype: int64
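To sanity-check the midpoint arithmetic without downloading the survey, the sketch below applies a compact variant of the cleaning logic to a few bracket labels. The name `income_midpoint` and the sample `brackets` Series are hypothetical, and `pd.isna` is swapped in for the `math.isnan` test.

```python
import pandas as pd

def income_midpoint(value):
    """Hypothetical helper: turn an income bracket label into a single number."""
    if pd.isna(value) or value == "Prefer not to answer":
        return float("nan")
    if value == "$200,000 and up":
        return 200000.0  # open-ended bracket: fall back to its lower bound
    low, high = value.replace("$", "").replace(",", "").split(" to ")
    return (int(low) + int(high)) / 2

brackets = pd.Series([
    "$75,000 to $99,999",
    "$0 to $9,999",
    "$200,000 and up",
    "Prefer not to answer",
    None,
])
incomes = brackets.apply(income_midpoint)
print(incomes)
```

For the first label the midpoint is (75000 + 99999) / 2 = 87499.5, which matches the first value of `data["income"]` above.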
data["What type of cranberry saucedo you typically have?"].value_counts()
Canned                    502
Homemade                  301
None                      146
Other (please specify)     25
Name: What type of cranberry saucedo you typically have?, dtype: int64
homemade = data[data["What type of cranberry saucedo you typically have?"] == "Homemade"]
canned = data[data["What type of cranberry saucedo you typically have?"] == "Canned"]
print(homemade["income"].mean())
print(canned["income"].mean())
94878.1072874494
83823.40340909091
grouped = data.groupby("What type of cranberry saucedo you typically have?")
grouped
<pandas.core.groupby.DataFrameGroupBy object at 0x7f80f678bd30>
grouped.groups
{'Canned': Int64Index([   4,    6,    8,   11,   12,   15,   18,   19,   26,   27,
             ...
             1040, 1041, 1042, 1044, 1045, 1046, 1047, 1051, 1054, 1057],
            dtype='int64', length=502),
 'Homemade': Int64Index([   2,    3,    5,    7,   13,   14,   16,   20,   21,   23,
             ...
             1016, 1017, 1025, 1027, 1030, 1034, 1048, 1049, 1053, 1056],
            dtype='int64', length=301),
 'None': Int64Index([   0,   17,   24,   29,   34,   36,   40,   47,   49,   51,
             ...
              980,  981,  997, 1015, 1018, 1031, 1037, 1043, 1050, 1055],
            dtype='int64', length=146),
 'Other (please specify)': Int64Index([   1,    9,  154,  216,  221,  233,  249,  265,  301,  336,  380,
              435,  444,  447,  513,  550,  749,  750,  784,  807,  860,  872,
              905, 1000, 1007],
            dtype='int64')}
grouped.size()
What type of cranberry saucedo you typically have?
Canned                    502
Homemade                  301
None                      146
Other (please specify)     25
dtype: int64
grouped["income"].agg(np.mean)
What type of cranberry saucedo you typically have?
Canned                    83823.403409
Homemade                  94878.107287
None                      78886.084034
Other (please specify)    86629.978261
Name: income, dtype: float64
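# Average only the numeric columns; recent pandas versions raise an error on the text columns otherwise.
grouped.mean(numeric_only=True)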
RespondentID gender income
What type of cranberry saucedo you typically have?
Canned 4.336699e+09 0.552846 83823.403409
Homemade 4.336792e+09 0.533101 94878.107287
None 4.336765e+09 0.517483 78886.084034
Other (please specify) 4.336763e+09 0.640000 86629.978261
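The grouped means line up with the earlier filter-and-average calculation for `homemade` and `canned`. A minimal sketch of that equivalence, using a hypothetical five-row `toy` frame with made-up incomes:

```python
import pandas as pd

# Hypothetical data; "None" here is the literal survey answer, not a missing value.
toy = pd.DataFrame({
    "sauce": ["Canned", "Homemade", "Canned", "Homemade", "None"],
    "income": [40000.0, 90000.0, 60000.0, 110000.0, 30000.0],
})

# Manual route: filter one category at a time and average its incomes.
manual = {
    kind: toy.loc[toy["sauce"] == kind, "income"].mean()
    for kind in toy["sauce"].unique()
}

# groupby route: split by category and average every group in one call.
grouped_mean = toy.groupby("sauce")["income"].mean()

print(manual)
print(grouped_mean)
```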
%matplotlib inline

sauce = grouped.mean(numeric_only=True)
sauce["income"].plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x7f80f6713630>
grouped = data.groupby(["What type of cranberry saucedo you typically have?", "What is typically the main dish at your Thanksgiving dinner?"])
grouped.mean(numeric_only=True)
RespondentID gender income
What type of cranberry saucedo you typically have? What is typically the main dish at your Thanksgiving dinner?
Canned Chicken 4.336354e+09 0.333333 80999.600000
Ham/Pork 4.336757e+09 0.642857 77499.535714
I don't know 4.335987e+09 0.000000 4999.500000
Other (please specify) 4.336682e+09 1.000000 53213.785714
Roast beef 4.336254e+09 0.571429 25499.500000
Tofurkey 4.337157e+09 0.714286 100713.857143
Turkey 4.336705e+09 0.544444 85242.682045
Homemade Chicken 4.336540e+09 0.750000 19999.500000
Ham/Pork 4.337253e+09 0.250000 96874.625000
I don't know 4.336084e+09 1.000000 NaN
Other (please specify) 4.336863e+09 0.600000 55356.642857
Roast beef 4.336174e+09 0.000000 33749.500000
Tofurkey 4.336790e+09 0.666667 57916.166667
Turducken 4.337475e+09 0.500000 200000.000000
Turkey 4.336791e+09 0.531008 97690.147982
None Chicken 4.336151e+09 0.500000 11249.500000
Ham/Pork 4.336680e+09 0.444444 61249.500000
I don't know 4.336412e+09 0.500000 33749.500000
Other (please specify) 4.336688e+09 0.600000 119106.678571
Roast beef 4.337424e+09 0.000000 162499.500000
Tofurkey 4.336950e+09 0.500000 112499.500000
Turducken 4.336739e+09 0.000000 NaN
Turkey 4.336784e+09 0.523364 74606.275281
Other (please specify) Ham/Pork 4.336465e+09 1.000000 87499.500000
Other (please specify) 4.337335e+09 0.000000 124999.666667
Tofurkey 4.336122e+09 1.000000 37499.500000
Turkey 4.336724e+09 0.700000 82916.194444
grouped["income"].agg(["mean", "sum", "std"]).head(10)
mean sum std
What type of cranberry saucedo you typically have? What is typically the main dish at your Thanksgiving dinner?
Canned Chicken 80999.600000 404998.0 75779.481062
Ham/Pork 77499.535714 1084993.5 56645.063944
I don't know 4999.500000 4999.5 NaN
Other (please specify) 53213.785714 372496.5 29780.946290
Roast beef 25499.500000 127497.5 24584.039538
Tofurkey 100713.857143 704997.0 61351.484439
Turkey 85242.682045 34182315.5 55687.436102
Homemade Chicken 19999.500000 59998.5 16393.596311
Ham/Pork 96874.625000 387498.5 77308.452805
I don't know NaN 0.0 NaN
grouped.size()
What type of cranberry saucedo you typically have?  What is typically the main dish at your Thanksgiving dinner?
Canned                  Chicken                     6
                        Ham/Pork                   15
                        I don't know                2
                        Other (please specify)      7
                        Roast beef                  7
                        Tofurkey                    7
                        Turkey                    458
Homemade                Chicken                     4
                        Ham/Pork                    4
                        I don't know                1
                        Other (please specify)     10
                        Roast beef                  3
                        Tofurkey                    6
                        Turducken                   2
                        Turkey                    271
None                    Chicken                     2
                        Ham/Pork                    9
                        I don't know                2
                        Other (please specify)     15
                        Roast beef                  1
                        Tofurkey                    6
                        Turducken                   1
                        Turkey                    110
Other (please specify)  Ham/Pork                    1
                        Other (please specify)      3
                        Tofurkey                    1
                        Turkey                     20
dtype: int64
grouped = data.groupby("How would you describe where you live?")["What is typically the main dish at your Thanksgiving dinner?"]
grouped.size()
How would you describe where you live?
Rural       216
Suburban    496
Urban       236
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64
grouped.apply(lambda x:x.value_counts())
How would you describe where you live?
Rural     Turkey                    189
          Other (please specify)      9
          Ham/Pork                    7
          I don't know                3
          Tofurkey                    3
          Turducken                   2
          Chicken                     2
          Roast beef                  1
Suburban  Turkey                    449
          Ham/Pork                   17
          Other (please specify)     13
          Tofurkey                    9
          Roast beef                  3
          Chicken                     3
          Turducken                   1
          I don't know                1
Urban     Turkey                    198
          Other (please specify)     13
          Tofurkey                    8
          Chicken                     7
          Roast beef                  6
          Ham/Pork                    4
Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64