GitHub Repository: UBC-DSCI/dsci-100-assets
Path: blob/master/2019-spring/slides/04_viz.ipynb
Kernel: R

DSCI 100 - Introduction to Data Science

Lecture 4 - Data visualization in R

2019-01-23

Housekeeping

  • Grades are coming! Thanks for your patience!!!

  • Quiz next week!

    • 45 min

    • open book (but not collaborative!)

    • in class, but on Canvas

    • you will get some practice quiz questions by the end of the week

Reminder

Where are we? Where are we going?

image source: R for Data Science by Grolemund & Wickham

The basic ggplot call:

plot_object <- ggplot(dataframe, aes(x = a_column, y = another_column)) +
    geom_something()
plot_object
...
plot_object <- plot_object + ...
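As a concrete instance of the template above, here is a minimal sketch using the built-in mtcars data set. The object name car_plot and the choice of columns (wt, mpg) are illustrative assumptions, not part of the original slides.

```r
library(ggplot2)

# Step 1: create the plot object from a data frame and an aesthetic mapping
car_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

# Step 2: add more layers to the existing object and reassign it
car_plot <- car_plot +
    xlab("Weight (1000 lbs)") +
    ylab("Miles per gallon")

# Step 3: evaluate the object on its own line to draw the plot
car_plot
```

Building the plot in steps like this makes it easy to inspect an intermediate version before adding labels or further layers.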

Where to get help (and ideas) for creating ggplot2 visualizations?

Only make the plot area as big as needed!

  • the default size is ridiculous!

library(tidyverse)
too_big <- ggplot(mtcars, aes(x = hp, y = mpg)) +
    geom_point() +
    xlab("Horsepower") +
    ylab("Miles per gallon")
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.7
✔ tidyr   0.8.0     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
too_big
  • use the repr package to set your plot size with R in Jupyter

library(repr)
options(repr.plot.width = 2.5, repr.plot.height = 2.5)
too_big

Don’t adjust the axes to zoom in on small differences (if the difference is small, show that it’s small!)

not_a_big_deal <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
    geom_boxplot() +
    ylab("Sepal length")
not_a_big_deal
# start the y-axis at 0; an upper limit of 8 keeps every observation
# (the largest sepal length in iris is 7.9, so a limit of 7 would drop rows)
not_a_big_deal <- not_a_big_deal + ylim(c(0, 8))
not_a_big_deal

Show the data (don’t hide the shape/distribution of the data behind a bar)

next two slides borrowed from Jeff Leek
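To illustrate the point about bars hiding the data, here is a minimal sketch that is not taken from the borrowed slides. It assumes the built-in iris data set; the object names species_means, hides_data and shows_data are hypothetical. The first plot reduces each species to one bar at its mean, while the second shows the spread and every individual observation.

```r
library(ggplot2)
library(dplyr)

# Reduce each species to a single number: the mean sepal length
species_means <- iris %>%
    group_by(Species) %>%
    summarize(mean_sepal_length = mean(Sepal.Length))

# A bar per species: the shape of the data is hidden behind the bar
hides_data <- ggplot(species_means, aes(x = Species, y = mean_sepal_length)) +
    geom_col() +
    ylab("Mean sepal length")
hides_data

# Boxplot plus jittered raw points: the distribution is visible
# (outlier.shape = NA avoids drawing outliers twice, since every
#  observation is already drawn by geom_jitter)
shows_data <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
    geom_boxplot(outlier.shape = NA) +
    geom_jitter(width = 0.2, height = 0, alpha = 0.5) +
    ylab("Sepal length")
shows_data
```

Setting height = 0 in geom_jitter moves points only sideways, so the plotted sepal lengths remain the measured values.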

Be wary of overplotting...

too_much <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point() +
    xlab("Size (carat)") +
    ylab("Price (US dollars)")
too_much
an_improvement <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point(alpha = 0.01) +
    ylab("Price (US dollars)") +
    xlab("Size (carat)")
an_improvement
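Transparency is the approach used in the slides. As a possible alternative (a different technique, not the one shown in the lecture), the points can be binned into a grid and each cell coloured by how many diamonds it contains. This is only a sketch; the object name binned and the choice of 50 bins are assumptions for illustration.

```r
library(ggplot2)

# Count the diamonds falling in each cell of a 50 x 50 grid
# and map that count to the fill colour of the cell
binned <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_bin2d(bins = 50) +
    xlab("Size (carat)") +
    ylab("Price (US dollars)") +
    labs(fill = "Number of diamonds")
binned
```

Binning trades the individual points for counts, so it tends to suit large data sets where single observations are not of interest.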

Use colors sparingly

Use legends and labels so that your visualization is understandable without reading the surrounding text
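A minimal sketch of what this could look like in ggplot2, assuming the built-in mtcars data set. The object name labelled and the label wording are illustrative. labs() sets the axis labels, the legend title and the plot title in one call.

```r
library(ggplot2)

# factor(cyl) treats the cylinder count as a category, giving a discrete legend
labelled <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
    geom_point() +
    labs(x = "Horsepower",
         y = "Miles per gallon",
         color = "Number of cylinders",
         title = "Fuel efficiency versus engine power (mtcars)")
labelled
```

Without the color entry in labs(), the legend title would default to the raw expression factor(cyl), which means little to a reader who has not seen the code.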

Ensure the text on your visualization is big enough to be easily read
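One way to do this in ggplot2 is through theme(). The sketch below assumes the mtcars plot from earlier in the lecture; the name readable and the size of 18 points are arbitrary illustrative choices to adjust to the plot size in use.

```r
library(ggplot2)

# text = element_text(size = ...) sets the base size that axis titles,
# tick labels and legend text inherit from
readable <- ggplot(mtcars, aes(x = hp, y = mpg)) +
    geom_point() +
    xlab("Horsepower") +
    ylab("Miles per gallon") +
    theme(text = element_text(size = 18))
readable
```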

Do not use pie charts!
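A bar chart is one common replacement, because lengths along a shared axis are easier to compare than angles. The following is a sketch only, assuming the built-in mtcars data set; the object name bar_not_pie is hypothetical.

```r
library(ggplot2)

# geom_bar() counts the rows in each category, here the number of cars
# for each cylinder count, and draws one bar per category
bar_not_pie <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar() +
    xlab("Number of cylinders") +
    ylab("Number of cars")
bar_not_pie
```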

Do not use 3D!

Attribution

Go and create!

Make an effective plot!

Can petal length and petal width be used to separate the Iris flower species? Create a plot to answer this question!

head(iris)
# solution
library(tidyverse)
options(repr.plot.width = 5, repr.plot.height = 3)
plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point(aes(color = Species, shape = Species)) +
    xlab("Petal Length") +
    ylab("Petal Width")
plot

What did we learn today?

  • Combine color and shape for points to separate groups in scatter plots

  • Rarely use 3D or pie charts; there are other, more effective ways of communicating the information

  • Some rules of thumb/guidelines for making visualizations

  • How to negatively filter using !

  • How to use options from the repr package to set plot size (and that we should set plot size)!
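The negative filtering mentioned in the list above does not appear as code in these slides, so here is a minimal sketch of the idea, assuming dplyr's filter() and the built-in iris data set. The object names not_setosa and only_virginica are hypothetical.

```r
library(dplyr)

# ! negates a logical condition: keep every row whose species is NOT "setosa"
not_setosa <- iris %>%
    filter(!(Species == "setosa"))

# The same idea with several values: negate an %in% test
only_virginica <- iris %>%
    filter(!(Species %in% c("setosa", "versicolor")))

nrow(not_setosa)      # 100
nrow(only_virginica)  # 50
```

Wrapping the condition in parentheses before applying ! keeps the intent clear when the condition grows more complex.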