GitHub Repository: UBC-DSCI/dsci-100-assets
Path: blob/master/2019-fall/slides/04_tutorial_class_activity.ipynb
Kernel: R

DSCI 100: Introduction to Data Science

Tutorial 4: Effective Data Visualization Class Activity

2019-10-01

Create a new R Jupyter notebook.

Load the tidyverse and repr libraries, then install and load the plotly library.

install.packages("plotly")
library(tidyverse)
library(plotly)
library(repr)
Installing package into 'C:/Users/me/Documents/R/win-library/3.5'
(as 'lib' is unspecified)
package 'plotly' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
C:\Users\me\AppData\Local\Temp\Rtmp001egj\downloaded_packages

Warning message: "package 'tidyverse' was built under R version 3.5.3"
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.2.1     v purrr   0.3.2
v tibble  2.1.1     v dplyr   0.8.0.1
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
Warning message: "package 'ggplot2' was built under R version 3.5.3"
Warning message: "package 'tibble' was built under R version 3.5.3"
Warning message: "package 'tidyr' was built under R version 3.5.3"
Warning message: "package 'purrr' was built under R version 3.5.3"
Warning message: "package 'dplyr' was built under R version 3.5.3"
Warning message: "package 'stringr' was built under R version 3.5.3"
Warning message: "package 'forcats' was built under R version 3.5.3"
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Warning message: "package 'plotly' was built under R version 3.5.3"

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2': last_plot
The following object is masked from 'package:stats': filter
The following object is masked from 'package:graphics': layout

Activity 1

We are interested in the distribution of prices for diamonds of 1 carat or more, and want to view it as a histogram with $500 intervals.

First, take a look at the diamonds dataset, then filter it so that it contains only diamonds of 1 carat or more.

## Solution
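
One possible approach is sketched below. It assumes the `diamonds` dataset that ships with ggplot2 (loaded as part of the tidyverse); the name `big_diamonds` is made up for this sketch and is not part of the activity.

```r
library(tidyverse)

# Peek at the first few rows of the diamonds dataset.
head(diamonds)

# Keep only diamonds of 1 carat or more.
# `big_diamonds` is a hypothetical name chosen for this sketch.
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Check how many rows remain after filtering.
nrow(big_diamonds)
```

Running `summary(big_diamonds$carat)` afterwards is a quick way to confirm that the minimum carat is now at least 1.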

Now, make a histogram of the prices of diamonds of 1 carat or more, counting them in intervals of $500. Also, don't forget to size the plot appropriately and label the axes and legend.

Here is a hint taken from https://rstudio.com/resources/cheatsheets/

I recommend everyone take a look at this page, as it has cheat sheets for other libraries we use.

image.png

## Solution
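
A minimal sketch of one way to do this, assuming the filtered data is rebuilt here under the hypothetical name `big_diamonds`. The plot dimensions and axis label wording are illustrative choices, not values prescribed by the activity.

```r
library(tidyverse)

# Rebuild the filtered data so this block runs on its own.
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Plot size in inches; this option takes effect in a Jupyter R notebook.
# The values 8 and 5 are illustrative.
options(repr.plot.width = 8, repr.plot.height = 5)

# binwidth = 500 gives $500 intervals; boundary = 0 aligns the bin
# edges to multiples of 500 (0-500, 500-1000, ...).
price_hist <- ggplot(big_diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)", y = "Number of diamonds")

price_hist
```

Without `boundary = 0`, ggplot2 centres the bins on multiples of the bin width instead, so the intervals would run from 250 to 750, 750 to 1250, and so on.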

Now, what if we are interested in seeing the number of diamonds for each type of cut in each interval?

image.png

Hint: Look back at Worksheet_04 Questions 2.9.3 and 2.9.4 and see how colour was added there to distinguish the restaurant names.

## Solution
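
One way this might look, assuming the same filtered data (again under the hypothetical name `big_diamonds`): mapping the `cut` column to the `fill` aesthetic colours each bar segment by cut and adds a legend.

```r
library(tidyverse)

# Rebuild the filtered data so this block runs on its own.
big_diamonds <- diamonds %>%
  filter(carat >= 1)

# Mapping fill to cut stacks one coloured segment per cut in each bin.
cut_hist <- ggplot(big_diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)",
       y = "Number of diamonds",
       fill = "Cut")

cut_hist
```

The bars are stacked by default, so the total height of each bar still shows the overall count for that interval.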

We have now separated the counts by colour; however, we still don't have a clear idea of how many diamonds of each cut lie in each of our $500 intervals. There are many ways to tackle this problem. One of them is the plotly library we have just installed, which we will use now.

## Solution
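
A sketch of one option, on the assumption that the goal is to read exact counts by hovering over the bars: `ggplotly()` from the plotly package converts a ggplot object into an interactive plot with tooltips. The names `big_diamonds`, `cut_hist`, and `interactive_hist` are hypothetical.

```r
library(tidyverse)
library(plotly)

# Rebuild the filtered data and the coloured histogram so this block
# runs on its own.
big_diamonds <- diamonds %>%
  filter(carat >= 1)

cut_hist <- ggplot(big_diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500, boundary = 0) +
  labs(x = "Price (US dollars)",
       y = "Number of diamonds",
       fill = "Cut")

# Convert the static ggplot into an interactive plotly widget.
# Hovering over a bar segment shows its count, price, and cut.
interactive_hist <- ggplotly(cut_hist)

interactive_hist
```

Other approaches would also work, for example splitting the plot into one panel per cut with `facet_wrap()`.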