DSCI 100: Introduction to Data Science
Tutorial 4: Effective Data Visualization Class Activity
2019-10-01
Create a new R Jupyter notebook.
Load the tidyverse and repr libraries, then install and load the plotly library.
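One possible way to write this setup cell is sketched below. The `requireNamespace()` guard is an assumption added here so that plotly is only installed when it is missing; calling `install.packages("plotly")` directly works as well.

```r
# Install plotly only if it is not already available (the guard is an
# assumption of this sketch, not part of the original activity).
if (!requireNamespace("plotly", quietly = TRUE)) {
  install.packages("plotly")
}

# Load the libraries used in this activity.
library(tidyverse)
library(repr)
library(plotly)
```

Loading these packages prints start-up messages and version warnings; they are informational and do not stop the notebook from running.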
Warning message:
"package 'tidyverse' was built under R version 3.5.3"
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.1 v purrr 0.3.2
v tibble 2.1.1 v dplyr 0.8.0.1
v tidyr 0.8.3 v stringr 1.4.0
v readr 1.3.1 v forcats 0.4.0
Warning message:
"package 'ggplot2' was built under R version 3.5.3"
Warning message:
"package 'tibble' was built under R version 3.5.3"
Warning message:
"package 'tidyr' was built under R version 3.5.3"
Warning message:
"package 'purrr' was built under R version 3.5.3"
Warning message:
"package 'dplyr' was built under R version 3.5.3"
Warning message:
"package 'stringr' was built under R version 3.5.3"
Warning message:
"package 'forcats' was built under R version 3.5.3"
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Warning message:
"package 'plotly' was built under R version 3.5.3"
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Activity 1
We are interested in the distribution of prices for diamonds that weigh 1 carat or more, and we want to view it as a histogram with $500 intervals.
First, take a look at the diamonds dataset and filter it so that it only contains diamonds of 1 carat or more.
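A minimal sketch of this step, assuming the diamonds dataset that ships with ggplot2. The name `big_diamonds` is a hypothetical choice for this example.

```r
library(tidyverse)

# Inspect the columns and a few values of the built-in diamonds dataset.
glimpse(diamonds)

# Keep only diamonds that weigh 1 carat or more.
# `big_diamonds` is a hypothetical name used for illustration.
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Check how many rows remain after filtering.
nrow(big_diamonds)
```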
Now, make a histogram of the prices of diamonds that are 1 carat or more, counting them in intervals of $500. Also, don't forget to size the plot appropriately and label the axes and any legend.
Here is a hint taken from the RStudio cheat sheets: https://rstudio.com/resources/cheatsheets/
I recommend everyone take a look at this page, as it has cheat sheets for other libraries we use.
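One way the histogram could be written is sketched below. The plot size, the axis labels, the `boundary = 0` setting (which aligns bin edges to multiples of 500), and the names `big_diamonds` and `price_hist` are all assumptions chosen for illustration.

```r
library(tidyverse)
library(repr)

# Set the plot size for the notebook (the values are an illustrative choice).
options(repr.plot.width = 8, repr.plot.height = 5)

# Diamonds of 1 carat or more (hypothetical name).
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Histogram of price with bins that are 500 dollars wide.
# boundary = 0 makes the bins start at 0, 500, 1000, and so on.
price_hist <- ggplot(big_diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)", y = "Number of diamonds")

price_hist
```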
Now, what if we are interested in seeing the number of diamonds for each type of cut in each interval?
Hint: Look back at Worksheet_04 Questions 2.9.3 and 2.9.4 and see how colour was added to distinguish the restaurant names.
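As a sketch of one approach, mapping the cut column to the `fill` aesthetic stacks the bars by cut and adds a legend. The labels and the names `big_diamonds` and `cut_hist` are assumptions made for this example.

```r
library(tidyverse)

# Diamonds of 1 carat or more (hypothetical name).
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Same histogram as before, with bars coloured by cut.
# The fill label in labs() names the legend.
cut_hist <- ggplot(big_diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)", y = "Number of diamonds", fill = "Cut")

cut_hist
```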
We have now separated the counts by colour. However, we still don't have a clear idea of how many diamonds of each cut lie in each $500 interval. There are many ways to tackle this problem. We will now use the plotly library we have just installed as one way to solve it.
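A minimal sketch of the plotly approach, assuming the coloured histogram is built with ggplot2 and then converted with `ggplotly()`. The names `big_diamonds`, `cut_hist`, and `interactive_hist` are hypothetical.

```r
library(tidyverse)
library(plotly)

# Diamonds of 1 carat or more (hypothetical name).
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Histogram of price in 500 dollar bins, coloured by cut.
cut_hist <- ggplot(big_diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)", y = "Number of diamonds", fill = "Cut")

# Convert the ggplot object into an interactive plotly figure.
interactive_hist <- ggplotly(cut_hist)

interactive_hist
```

Hovering over a bar segment in the interactive figure shows its count and cut, so the number of diamonds per cut in each interval can be read off directly.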