Path: blob/main/Trabajo_grupal/WG8/Grupo_4_jupyter.ipynb
Kernel: Python 3 (ipykernel)
In [11]:
Out[11]:
Assignment 8 - Group 4:
Plots in Jupyter Notebook: Population Survey - United States (2015)
Members:
Luana Morales
Marcela Quintero
Seidy Ascencios
Flavia Oré
In [127]:
We load the dataset
In [128]:
Out[128]:
Requirement already satisfied: pyreadr in c:\users\marcela quintero\anaconda3\lib\site-packages (0.4.7)
Requirement already satisfied: pandas>=1.2.0 in c:\users\marcela quintero\anaconda3\lib\site-packages (from pyreadr) (1.4.2)
Requirement already satisfied: numpy>=1.18.5 in c:\users\marcela quintero\anaconda3\lib\site-packages (from pandas>=1.2.0->pyreadr) (1.21.5)
Requirement already satisfied: pytz>=2020.1 in c:\users\marcela quintero\anaconda3\lib\site-packages (from pandas>=1.2.0->pyreadr) (2021.3)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\marcela quintero\anaconda3\lib\site-packages (from pandas>=1.2.0->pyreadr) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\marcela quintero\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas>=1.2.0->pyreadr) (1.16.0)
In [129]:
Out[129]:
OrderedDict([('data',
               wage     lwage  sex  shs  hsg  scl  clg   ad   mw   so   we  \
rownames
10         9.615385  2.263364  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
12        48.076923  3.872802  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
15        11.057692  2.403126  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
18        13.942308  2.634928  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
19        28.846154  3.361977  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
...             ...       ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
32620     14.769231  2.692546  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0
32624     23.076923  3.138833  1.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0
32626     38.461538  3.649659  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0
32631     32.967033  3.495508  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0
32643     17.307692  2.851151  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0

           ne  exp1  exp2    exp3     exp4   occ  occ2   ind  ind2
rownames
10        1.0   7.0  0.49   0.343   0.2401  3600    11  8370    18
12        1.0  31.0  9.61  29.791  92.3521  3050    10  5070     9
15        1.0  18.0  3.24   5.832  10.4976  6260    19   770     4
18        1.0  25.0  6.25  15.625  39.0625   420     1  6990    12
19        1.0  22.0  4.84  10.648  23.4256  2015     6  9470    22
...       ...   ...   ...     ...      ...   ...   ...   ...   ...
32620     0.0   9.0  0.81   0.729   0.6561  4700    16  4970     9
32624     0.0  12.0  1.44   1.728   2.0736  4110    13  8680    20
32626     0.0  11.0  1.21   1.331   1.4641  1550     4  3680     6
32631     0.0  10.0  1.00   1.000   1.0000  2920     9  6570    11
32643     0.0  14.0  1.96   2.744   3.8416  1610     5  7460    14

[5150 rows x 20 columns])])
In [130]:
Out[130]:
collections.OrderedDict
In [131]:
Out[131]:
In [132]:
Out[132]:
pandas.core.frame.DataFrame
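The outputs above are consistent with reading an R data file through `pyreadr`, whose `read_r` function returns an ordered dictionary of data frames keyed by R object name. The code cells themselves are not preserved here, so the following is only a minimal sketch of how such a load might look, not the authors' exact code. The file path and the helper names `read_rdata` and `pick_object` are hypothetical.

```python
def read_rdata(path):
    """Read an R data file; pyreadr is imported lazily so the module loads without it."""
    import pyreadr  # third-party package (the pip log above shows version 0.4.7)

    # read_r returns an OrderedDict mapping R object names to pandas DataFrames.
    return pyreadr.read_r(path)


def pick_object(result, name="data"):
    """Return the object stored under `name`, listing the available keys on failure."""
    if name not in result:
        raise KeyError(f"{name!r} not found; available objects: {list(result)}")
    return result[name]


# Hypothetical usage (the real file path is not shown in the notebook):
# result = read_rdata("path/to/file.RData")
# df = pick_object(result)   # expected to be a DataFrame with 5150 rows x 20 columns
```

Keeping the dictionary lookup in its own helper makes a wrong object name fail with a message that lists what the file actually contains.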
1) Wage histogram
In [133]:
In [134]:
Out[134]:
rownames
10 9.615385
12 48.076923
15 11.057692
18 13.942308
19 28.846154
...
32620 14.769231
32624 23.076923
32626 38.461538
32631 32.967033
32643 17.307692
Name: wage, Length: 5150, dtype: float64
In [135]:
Out[135]:
In [136]:
In [137]:
Out[137]:
rownames
10 2.263364
12 3.872802
15 2.403126
18 2.634928
19 3.361977
...
32620 2.692546
32624 3.138833
32626 3.649659
32631 3.495508
32643 2.851151
Name: lwage, Length: 5150, dtype: float64
In [138]:
Out[138]:
Looking at the histogram of wages, we see that a large share of the U.S. workers in the sample earn less than 50 dollars per hour. Because the observations are concentrated on the left side of the distribution, its shape might suggest that the variable is censored or truncated. The histogram of the log wage, by contrast, looks much closer to a normal distribution: the variability is smaller and the range of values is compressed well below that of the original scale. Working with the log wage therefore makes estimates less sensitive to extreme or atypical observations.
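The compression described above can be checked on the few values that are visible in the outputs. The sketch below uses only the ten preview rows of `wage` from Out[134], not the full sample, and confirms that `lwage` is the natural logarithm of `wage`. The function `plot_histograms` is a hypothetical helper showing one way the two histograms might be drawn with matplotlib; the notebook's own plotting code is not shown.

```python
import math

# Wage values copied from the ten preview rows of Out[134] (not the full sample).
wages = [9.615385, 48.076923, 11.057692, 13.942308, 28.846154,
         14.769231, 23.076923, 38.461538, 32.967033, 17.307692]

# lwage is the natural log of wage: log(9.615385) is about 2.263364, as in Out[137].
log_wages = [math.log(w) for w in wages]

wage_range = max(wages) - min(wages)            # spread in dollars per hour
log_range = max(log_wages) - min(log_wages)     # spread in log points


def plot_histograms(wage, lwage, bins=30):
    """Draw raw and log wage histograms side by side (matplotlib imported lazily)."""
    import matplotlib.pyplot as plt

    fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    ax_raw.hist(wage, bins=bins)
    ax_raw.set(title="Hourly wage", xlabel="wage", ylabel="frequency")
    ax_log.hist(lwage, bins=bins)
    ax_log.set(title="Log hourly wage", xlabel="lwage", ylabel="frequency")
    fig.tight_layout()
    return fig
```

With the full data one would call something like `plot_histograms(df["wage"], df["lwage"])`, assuming `df` is the data frame loaded earlier.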
2) Density plot
Wages of women and men who completed college
In [139]:
In [140]:
In [141]:
Out[141]:
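The plotting cells for this section are not preserved, so the sketch below only illustrates what a density plot computes: a Gaussian kernel density estimate for each sex among rows with `clg == 1.0`. It is written in plain Python rather than with the plotting library the authors may have used, and it runs on the four college-graduate rows visible in the preview of Out[129], so the resulting curves are illustrative only. The bandwidth uses the normal reference rule of thumb, which is an assumption, and the helper names are hypothetical. The box-plot tick labels in Out[145] suggest that `sex = 0` is male and `sex = 1` is female, but that mapping is also an assumption.

```python
import math
import statistics

# (sex, lwage) for the preview rows of Out[129] with clg == 1.0; only four are visible.
college_preview = [(1.0, 2.263364), (0.0, 3.872802), (1.0, 3.361977), (0.0, 2.692546)]


def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> estimated density, built from Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Normal reference rule of thumb; needs at least two distinct values.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm

    return density


def split_by_sex(rows):
    """Group lwage values by the sex code."""
    groups = {}
    for sex, lwage in rows:
        groups.setdefault(sex, []).append(lwage)
    return groups


densities = {sex: gaussian_kde(vals) for sex, vals in split_by_sex(college_preview).items()}


def plot_densities(densities, lo=0.0, hi=6.0, steps=200):
    """Plot one density curve per sex code (matplotlib imported lazily)."""
    import matplotlib.pyplot as plt

    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    fig, ax = plt.subplots()
    for sex, density in densities.items():
        ax.plot(xs, [density(x) for x in xs], label=f"sex = {sex:g}")
    ax.set(xlabel="lwage", ylabel="density", title="Log wage of college graduates by sex")
    ax.legend()
    return fig
```

Each estimated curve integrates to one, which is what makes the two groups comparable on the same axes even when their sample sizes differ.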
3) Pie chart
Percentage of people by education level
In [142]:
Out[142]:
shs  hsg  scl  clg  ad
0.0  0.0  0.0  0.0  1.0     706
               1.0  0.0    1636
          1.0  0.0  0.0    1432
     1.0  0.0  0.0  0.0    1256
1.0  0.0  0.0  0.0  0.0     120
dtype: int64
In [143]:
Out[143]:
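The counts in Out[142] are enough to reproduce the percentages that a pie chart would display. The sketch below copies those counts, converts them to shares of the 5150 rows, and includes a hypothetical `plot_pie` helper built on matplotlib's `Axes.pie`. The category codes are kept as they appear in the data; reading them as some high school, high school graduate, some college, college graduate, and advanced degree is an assumption, since the notebook does not define them.

```python
# Counts copied from Out[142]; each row falls in exactly one education category.
counts = {"shs": 120, "hsg": 1256, "scl": 1432, "clg": 1636, "ad": 706}

total = sum(counts.values())  # equals the number of rows in the data
shares = {level: 100 * n / total for level, n in counts.items()}


def plot_pie(shares):
    """Draw a pie chart of percentage shares (matplotlib imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares), autopct="%1.1f%%")
    ax.set_title("Share of people by education level")
    return fig
```

Calling `plot_pie(shares)` in a notebook cell would render the chart; the `autopct` format string prints each share with one decimal place.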
4) Box plot
In [144]:
Out[144]:
rownames
10 0
12 0
15 0
18 1
19 0
..
32620 0
32624 0
32626 1
32631 0
32643 1
Name: dummyad, Length: 5150, dtype: int32
In [145]:
Out[145]:
[Text(0, 0, 'Hombre'), Text(1, 0, 'Mujer')]
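The plotting cell for this section is not preserved, so the variables behind the box plot are unknown. As a hedged illustration, the sketch below assumes a box plot of `lwage` by `sex` and computes the quartiles that define each box, using only the ten preview rows from Out[129]. The helper `plot_box` is hypothetical; it places the boxes at positions 0 and 1 and reuses the tick labels shown in Out[145], where the mapping of 0 to 'Hombre' and 1 to 'Mujer' is taken from that output rather than from any documentation of the data.

```python
import statistics

# (sex, lwage) for the ten preview rows of Out[129]; not the full sample.
preview = [
    (1.0, 2.263364), (0.0, 3.872802), (0.0, 2.403126), (1.0, 2.634928), (1.0, 3.361977),
    (0.0, 2.692546), (1.0, 3.138833), (0.0, 3.649659), (0.0, 3.495508), (0.0, 2.851151),
]
# dummyad in Out[144] looks like the ad column cast to integers,
# e.g. df["ad"].astype(int); that is an assumption, the cell is not shown.


def quartiles(values):
    """Return (Q1, median, Q3) using linear interpolation between data points."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


by_sex = {}
for sex, lwage in preview:
    by_sex.setdefault(sex, []).append(lwage)

summary = {sex: quartiles(vals) for sex, vals in by_sex.items()}


def plot_box(by_sex):
    """Draw one box per sex code at x positions 0 and 1 (matplotlib imported lazily)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.boxplot([by_sex[0.0], by_sex[1.0]], positions=[0, 1])
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Hombre", "Mujer"])  # labels as in Out[145]
    ax.set_ylabel("lwage")
    return fig
```

The box edges are Q1 and Q3 and the line inside the box is the median, so `summary` holds the three numbers that each box encodes.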