% Default to the notebook output style

% Inherit from the specified cell style.

\documentclass[11pt]{article}

\usepackage[T1]{fontenc}
% Nicer default font (+ math font) than Computer Modern for most use cases
\usepackage{mathpazo}

% Basic figure setup, for now with no caption control since it's done
% automatically by Pandoc (which extracts ![](path) syntax from Markdown).
\usepackage{graphicx}
% We will generate all images so they have a width \maxwidth. This means
% that they will get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth
\else\Gin@nat@width\fi}
\makeatother
\let\Oldincludegraphics\includegraphics
% Set max figure width to be 80% of text width, for now hardcoded.
\renewcommand{\includegraphics}[1]{\Oldincludegraphics[width=.8\maxwidth]{#1}}
% Ensure that by default, figures have no caption (until we provide a
% proper Figure object with a Caption API and a way to capture that
% in the conversion process - todo).
\usepackage{caption}
\DeclareCaptionLabelFormat{nolabel}{}
\captionsetup{labelformat=nolabel}

\usepackage{adjustbox} % Used to constrain images to a maximum size
\usepackage{xcolor} % Allow colors to be defined
\usepackage{enumerate} % Needed for markdown enumerations to work
\usepackage{geometry} % Used to adjust the document margins
\usepackage{amsmath} % Equations
\usepackage{amssymb} % Equations
\usepackage{textcomp} % defines textquotesingle
% Hack from http://tex.stackexchange.com/a/47451/13684:
\AtBeginDocument{%
\def\PYZsq{\textquotesingle}% Upright quotes in Pygmentized code
}
\usepackage{upquote} % Upright quotes for verbatim code
\usepackage{eurosym} % defines \euro
\usepackage[mathletters]{ucs} % Extended unicode (utf-8) support
\usepackage[utf8x]{inputenc} % Allow utf-8 characters in the tex document
\usepackage{fancyvrb} % verbatim replacement that allows latex
\usepackage{grffile} % extends the file name processing of package graphics
% to support a larger range
% The hyperref package gives us a pdf with properly built
% internal navigation ('pdf bookmarks' for the table of contents,
% internal cross-reference links, web links for URLs, etc.)
\usepackage{hyperref}
\usepackage{longtable} % longtable support required by pandoc >1.10
\usepackage{booktabs} % table support for pandoc > 1.12.2
\usepackage[inline]{enumitem} % IRkernel/repr support (it uses the enumerate* environment)
\usepackage[normalem]{ulem} % ulem is needed to support strikethroughs (\sout)
% normalem makes italics be italics, not underlines

% Colors for the hyperref package
\definecolor{urlcolor}{rgb}{0,.145,.698}
\definecolor{linkcolor}{rgb}{.71,0.21,0.01}
\definecolor{citecolor}{rgb}{.12,.54,.11}

% ANSI colors
\definecolor{ansi-black}{HTML}{3E424D}
\definecolor{ansi-black-intense}{HTML}{282C36}
\definecolor{ansi-red}{HTML}{E75C58}
\definecolor{ansi-red-intense}{HTML}{B22B31}
\definecolor{ansi-green}{HTML}{00A250}
\definecolor{ansi-green-intense}{HTML}{007427}
\definecolor{ansi-yellow}{HTML}{DDB62B}
\definecolor{ansi-yellow-intense}{HTML}{B27D12}
\definecolor{ansi-blue}{HTML}{208FFB}
\definecolor{ansi-blue-intense}{HTML}{0065CA}
\definecolor{ansi-magenta}{HTML}{D160C4}
\definecolor{ansi-magenta-intense}{HTML}{A03196}
\definecolor{ansi-cyan}{HTML}{60C6C8}
\definecolor{ansi-cyan-intense}{HTML}{258F8F}
\definecolor{ansi-white}{HTML}{C5C1B4}
\definecolor{ansi-white-intense}{HTML}{A1A6B2}

% commands and environments needed by pandoc snippets
% extracted from the output of `pandoc -s`
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}

% Additional commands for more recent versions of Pandoc
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.53,0.00,0.00}{{#1}}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.73,0.40,0.53}{{#1}}}
\newcommand{\ImportTok}[1]{{#1}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.73,0.13,0.13}{\textit{{#1}}}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.10,0.09,0.49}{{#1}}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.40,0.40,0.40}{{#1}}}
\newcommand{\BuiltInTok}[1]{{#1}}
\newcommand{\ExtensionTok}[1]{{#1}}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.74,0.48,0.00}{{#1}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.49,0.56,0.16}{{#1}}}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textbf{\textit{{#1}}}}}

% Define a nice break command that doesn't care if a line doesn't already
% exist.
\def\br{\hspace*{\fill} \\* }
% MathJax compatibility definitions
\def\gt{>}
\def\lt{<}
% Document parameters
\title{worksheet\_01}

% Pygments definitions

\makeatletter
\def\PY@reset{\let\PY@it=\relax \let\PY@bf=\relax%
\let\PY@ul=\relax \let\PY@tc=\relax%
\let\PY@bc=\relax \let\PY@ff=\relax}
\def\PY@tok#1{\csname PY@tok@#1\endcsname}
\def\PY@toks#1+{\ifx\relax#1\empty\else%
\PY@tok{#1}\expandafter\PY@toks\fi}
\def\PY@do#1{\PY@bc{\PY@tc{\PY@ul{%
\PY@it{\PY@bf{\PY@ff{#1}}}}}}}
\def\PY#1#2{\PY@reset\PY@toks#1+\relax+\PY@do{#2}}

\expandafter\def\csname
PY@tok@w\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.73,0.73}{##1}}}
\expandafter\def\csname PY@tok@c\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.74,0.48,0.00}{##1}}}
\expandafter\def\csname PY@tok@k\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kt\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.69,0.00,0.25}{##1}}}
\expandafter\def\csname PY@tok@o\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@ow\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
\expandafter\def\csname PY@tok@nb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@nf\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@nc\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@nn\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@ne\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.82,0.25,0.23}{##1}}}
\expandafter\def\csname PY@tok@nv\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@no\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.53,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@nl\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.63,0.63,0.00}{##1}}}
\expandafter\def\csname PY@tok@ni\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.60,0.60,0.60}{##1}}}
\expandafter\def\csname PY@tok@na\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.49,0.56,0.16}{##1}}}
\expandafter\def\csname PY@tok@nt\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@nd\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.67,0.13,1.00}{##1}}}
\expandafter\def\csname PY@tok@s\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sd\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@si\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
\expandafter\def\csname PY@tok@se\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.13}{##1}}}
\expandafter\def\csname PY@tok@sr\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.40,0.53}{##1}}}
\expandafter\def\csname PY@tok@ss\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@sx\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@m\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@gh\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@gu\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.50,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@gd\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.63,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@gi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.63,0.00}{##1}}}
\expandafter\def\csname PY@tok@gr\endcsname{\def\PY@tc##1{\textcolor[rgb]{1.00,0.00,0.00}{##1}}}
\expandafter\def\csname PY@tok@ge\endcsname{\let\PY@it=\textit}
\expandafter\def\csname PY@tok@gs\endcsname{\let\PY@bf=\textbf}
\expandafter\def\csname PY@tok@gp\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,0.50}{##1}}}
\expandafter\def\csname PY@tok@go\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.53,0.53,0.53}{##1}}}
\expandafter\def\csname PY@tok@gt\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.27,0.87}{##1}}}
\expandafter\def\csname PY@tok@err\endcsname{\def\PY@bc##1{\setlength{\fboxsep}{0pt}\fcolorbox[rgb]{1.00,0.00,0.00}{1,1,1}{\strut ##1}}}
\expandafter\def\csname PY@tok@kc\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kd\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kn\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@kr\endcsname{\let\PY@bf=\textbf\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@bp\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.50,0.00}{##1}}}
\expandafter\def\csname PY@tok@fm\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.00,0.00,1.00}{##1}}}
\expandafter\def\csname PY@tok@vc\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vg\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@vm\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.10,0.09,0.49}{##1}}}
\expandafter\def\csname PY@tok@sa\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sc\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@dl\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@s2\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@sh\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@s1\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.73,0.13,0.13}{##1}}}
\expandafter\def\csname PY@tok@mb\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mf\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mh\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mi\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@il\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@mo\endcsname{\def\PY@tc##1{\textcolor[rgb]{0.40,0.40,0.40}{##1}}}
\expandafter\def\csname PY@tok@ch\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cm\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cpf\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@c1\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}
\expandafter\def\csname PY@tok@cs\endcsname{\let\PY@it=\textit\def\PY@tc##1{\textcolor[rgb]{0.25,0.50,0.50}{##1}}}

\def\PYZbs{\char`\\}
\def\PYZus{\char`\_}
\def\PYZob{\char`\{}
\def\PYZcb{\char`\}}
\def\PYZca{\char`\^}
\def\PYZam{\char`\&}
\def\PYZlt{\char`\<}
\def\PYZgt{\char`\>}
\def\PYZsh{\char`\#}
\def\PYZpc{\char`\%}
\def\PYZdl{\char`\$}
\def\PYZhy{\char`\-}
\def\PYZsq{\char`\'}
\def\PYZdq{\char`\"}
\def\PYZti{\char`\~}
% for compatibility with earlier versions
\def\PYZat{@}
\def\PYZlb{[}
\def\PYZrb{]}
\makeatother

% Exact colors from NB
\definecolor{incolor}{rgb}{0.0, 0.0, 0.5}
\definecolor{outcolor}{rgb}{0.545, 0.0, 0.0}

% Prevent overflowing lines due to hard-to-break entities
\sloppy
% Setup hyperref package
\hypersetup{
breaklinks=true, % so long urls are correctly broken across lines
colorlinks=true,
urlcolor=urlcolor,
linkcolor=linkcolor,
citecolor=citecolor,
}
% Slightly bigger margins than the latex defaults

\geometry{verbose,tmargin=1in,bmargin=1in,lmargin=1in,rmargin=1in}

\begin{document}

\maketitle

\section{Worksheet 1: Introduction to Data
Science}\label{worksheet-1-introduction-to-data-science}

Welcome to DSCI 100: Introduction to Data Science!

Each week you will complete a lecture assignment like this one. Before
we get started, there are some administrative details.

You can't learn technical subjects without hands-on practice. The weekly
lecture worksheets and tutorials are an important part of the course.
The lecture worksheets will automatically be collected at the start of
the weekly tutorial. Conversely, the tutorial assignments will
automatically be collected at the start of the weekly lecture. This is
set up so that you are only working on one thing at a time. Attendance
in lectures and tutorials is required. There will be participatory
activities in both the lecture and tutorial to help support your
learning.

Collaborating on lecture worksheets and tutorial assignments is more
than okay -\/- it's encouraged! You should rarely be stuck for more than
a few minutes on questions in lecture or tutorial, so ask a neighbor, TA
or an instructor for help (explaining things is beneficial, too -\/- the
best way to solidify your knowledge of a subject is to explain it).
Please don't just share answers, though.
Everyone must submit a copy of
their own work.

You can read more about
\href{https://github.com/UBC-DSCI/dsci-100/blob/master/policies.md}{course
policies} on the \href{https://github.com/UBC-DSCI/dsci-100}{course
website}.

\subsubsection{Lecture and Tutorial Learning
Goals:}\label{lecture-and-tutorial-learning-goals}

After completing this week's lecture and tutorial work, you will be able
to:

\begin{itemize}
\tightlist
\item
use a Jupyter notebook to execute provided R code
\item
edit code and markdown cells in a Jupyter notebook
\item
create new code and markdown cells in a Jupyter notebook
\item
load the \texttt{tidyverse} library into R
\item
create new variables and objects in R using the assignment symbol
\item
use the help and documentation tools in R
\item
match the names of the following functions from the \texttt{tidyverse}
library to their documentation descriptions:

\begin{itemize}
\tightlist
\item
\texttt{read\_csv}
\item
\texttt{select}
\item
\texttt{mutate}
\item
\texttt{filter}
\item
\texttt{ggplot}
\item
\texttt{aes}
\end{itemize}
\item
chain together two functions using the pipe operator,
\texttt{\%\textgreater{}\%}
\end{itemize}

In this first worksheet you will also learn how to test the answers you
write in this worksheet to assess if you answered questions correctly
before your assignment is collected.

This worksheet covers parts of
\href{https://ubc-dsci.github.io/introduction-to-data-science/chapter2.html}{Chapter
1} of the online textbook. You should read this chapter before
attempting this worksheet.

\section{1. Jupyter notebooks}\label{jupyter-notebooks}

This webpage is called a Jupyter notebook. A notebook is a place to
write programs and view their results.

\subsection{1.1.
Text cells}\label{text-cells}

In a notebook, each rectangle containing text or code is called a
\emph{cell}.

Text cells (like this one) can be edited by double-clicking on them.
They're written in a simple format called
\href{http://daringfireball.net/projects/markdown/syntax}{Markdown} to
add formatting and section headings. You don't need to learn Markdown,
but you might want to.

After you edit a text cell, click the "run cell" button at the top that
looks like ▶\textbar{} to confirm any changes. (Try not to delete the
instructions of the lab.)

\textbf{Question 1.1.1.} This paragraph is in its own text cell. Try
editing it so that this sentence is the last sentence in the paragraph,
and then click the "run cell" ▶\textbar{} button. This sentence, for
example, should be deleted. So should this one.

\subsection{1.2. Code cells}\label{code-cells}

Other cells contain code in the R language. Running a code cell will
execute all of the code it contains.

To run the code in a cell, first click on that cell to activate it.
It'll be highlighted with a little green or blue rectangle. Next, either
press Run ▶\textbar{} or hold down the \texttt{shift} key and press
\texttt{return} or \texttt{enter}.

Try running the next cell:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}1}]:} \PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Hello, World!\PYZdq{}}\PY{p}{)}
\end{Verbatim}


\begin{Verbatim}[commandchars=\\\{\}]
[1] "Hello, World!"

\end{Verbatim}

The above code cell contains a single line of code, but cells can also
contain multiple lines of code. When you run a cell, the lines of code
are executed in the order in which they appear. Every \texttt{print}
expression prints a line.
Run the next cell and notice the order of the
output.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}2}]:} \PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{First this line is printed,\PYZdq{}}\PY{p}{)}
        \PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{and then this one.\PYZdq{}}\PY{p}{)}
\end{Verbatim}


\begin{Verbatim}[commandchars=\\\{\}]
[1] "First this line is printed,"
[1] "and then this one."

\end{Verbatim}

\textbf{Question 1.2.1.} Change the cell above so that it prints out:

\begin{verbatim}
First this line is printed,
and then the next line,
and then this one.
\end{verbatim}

\emph{Hint:} If you're stuck for more than a few minutes, try talking to
a neighbor or a TA. That's a good idea for any worksheet or tutorial
problem.

\subsection{1.3. Writing Jupyter
notebooks}\label{writing-jupyter-notebooks}

You can use Jupyter notebooks for your own projects or documents. When
you make your own notebook, you'll need to create your own cells for
text and code.

To add a cell, click the + button in the menu bar. It'll start out as a
code cell. You can change it to a text cell by clicking inside it so
it's highlighted, clicking the drop-down box next to the restart (⟳)
button in the menu bar, and choosing "Markdown".

\textbf{Question 1.3.1.} Add a code cell below this one. Write code in
it that prints out:

\begin{verbatim}
A whole new code cell!
\end{verbatim}

Run your cell to verify that it works.

\textbf{Question 1.3.2.} Add a text/Markdown cell below this one. Write
the text "A whole new Markdown cell" in it.

\subsection{1.4. Errors}\label{errors}

R is a language, and like natural human languages, it has rules. It
differs from natural language in two important ways:

\begin{enumerate}
\tightlist
\item
The rules are \emph{simple}. You can learn most of them in a few weeks
and gain reasonable proficiency with the language in a semester.
\item
The rules are \emph{rigid}. If you're proficient in a natural language,
you can understand a non-proficient speaker, glossing over small
mistakes. A computer running R code is not smart enough to do that.
\end{enumerate}

Whenever you write code, you'll make mistakes (everyone who writes code
does, even your course instructor!). When you run a code cell that has
errors, R will sometimes produce error messages to tell you what you did
wrong.

Errors are okay; even experienced programmers make many errors. When you
make an error, you just have to find the source of the problem, fix it,
and move on.

We have made an error in the next cell. Run it and see what happens.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}3}]:} \PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{This line is missing something.\PYZdq{}}
\end{Verbatim}


\begin{Verbatim}[commandchars=\\\{\}]

Error in parse(text = x, srcfile = src): <text>:2:0: unexpected end of input
1: print("This line is missing something."
   \^{}
Traceback:


\end{Verbatim}

\begin{figure}
\centering
\includegraphics{images/ws1_error_image.png}
\caption{ws1\_error\_image.png}
\end{figure}

There's a lot of terminology in programming languages, but you don't
need to know it all in order to program effectively. If you see a
cryptic message like this, you can often get by without deciphering it.
(Of course, if you're frustrated, ask a neighbor or a TA for help.)

Try to fix the code above so that you can run the cell and see the
intended message instead of an error.

\subsection{1.5. The Kernel}\label{the-kernel}

The kernel is a program that executes the code inside your notebook and
outputs the results. In the top right of your window, you can see a
circle that indicates the status of your kernel. If the circle is empty
(⚪), the kernel is idle and ready to execute code. If the circle is
If the circle is523filled in (⚫), the kernel is busy running some code.524525You may run into problems where your kernel is stuck for an excessive526amount of time, your notebook is very slow and unresponsive, or your527kernel loses its connection. If this happens, try the following steps:5281. At the top of your screen, click \textbf{Kernel}, then529\textbf{Interrupt}. 2. If that doesn't help, click \textbf{Kernel}, then530\textbf{Restart}. If you do this, you will have to run your code cells531from the start of your notebook up until where you paused your work. 3.532If that doesn't help, restart your server. First, save your work by533clicking \textbf{File} at the top left of your screen, then \textbf{Save534and Checkpoint}. Next, click \textbf{Control Panel} at the top right.535Choose \textbf{Stop My Server} to shut it down, then \textbf{My Server}536to start it back up. Then, navigate back to the notebook you were537working on.538539\subsection{1.6. Submitting your work}\label{submitting-your-work}540541All lecture worksheets and tutorials assignments in the course will be542distributed as notebooks like this one. You will complete your work in543this notebook and at the due date we will copy this notebook and grade544that copy. For lecture worksheets we will use a system called nbgrader545that checks your work. For tutorial assignments we will use a546combination of nbgrader and manual grading of your work.547548\section{2. Numbers}\label{numbers}549550Quantitative information arises everywhere in data science. In addition551to representing commands to print out lines, our R code can represent552numbers and methods of combining numbers. The expression \texttt{3.2500}553evaluates to the number 3.25. (Run the cell and see.)554555\begin{Verbatim}[commandchars=\\\{\}]556{\color{incolor}In [{\color{incolor}4}]:} \PY{l+m}{3.2500}557\end{Verbatim}5585595603.25561562563Notice that we didn't have to print. 
When you run a notebook cell,
Jupyter helpfully prints out that value for you.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}5}]:} \PY{l+m}{2}
        \PY{l+m}{3}
        \PY{l+m}{4}
\end{Verbatim}


2


3


4


Above, you should see that the three numbers (2, 3, and 4) are printed
out. In R, simply inputting numbers and running the cell will display
all the numbers that you listed. Even though we don't need to use print,
we will continue to do so in several places in these worksheets so that
we are very clear with our intentions.

\subsection{2.1. Arithmetic}\label{arithmetic}

The line in the next cell subtracts. Its value is what you'd expect. Run
it.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}6}]:} \PY{l+m}{2.0} \PY{o}{\PYZhy{}} \PY{l+m}{1.5}
\end{Verbatim}


0.5


Same with the cell below. Run it.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}7}]:} \PY{l+m}{2} \PY{o}{*} \PY{l+m}{2}
\end{Verbatim}


4


Many basic arithmetic operations are built in to R.
\href{https://www.statmethods.net/management/operators.html}{This
webpage} describes all the arithmetic operators used in the course. You
can refer back to this webpage as you need throughout the term.

\section{3. Names}\label{names}

In natural language, we have terminology that lets us quickly reference
very complicated concepts. We don't say, "That's a large mammal with
brown fur and sharp teeth!" Instead, we just say, "Bear!"

Similarly, an effective strategy for writing code is to define names for
data as we compute it, like a lawyer would define terms for complex
ideas at the start of a legal document to simplify the rest of the
writing.

In R, we do this with \emph{objects}.
An object has a name on the left
side of an \texttt{\textless{}-} sign and an expression to be evaluated
on the right.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}8}]:} answer \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{3} \PY{o}{*} \PY{l+m}{2} \PY{o}{+} \PY{l+m}{4}
\end{Verbatim}


When you run that cell, R first evaluates the first line. It computes
the value of the expression \texttt{3\ *\ 2\ +\ 4}, which is the number
10. Then it gives that value the name \texttt{answer}. At that point,
the code in the cell is done running.

After you run that cell, the value 10 is bound to the name
\texttt{answer}:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}9}]:} answer
\end{Verbatim}


10


We can name our objects anything we'd like. Above we called it
\texttt{answer}, but we could have named it \texttt{value},
\texttt{data} or anything else we desired. A good rule of thumb is to
name it something that has meaning to a human as it relates to what we
are trying to accomplish with our R code.

\textbf{Question 3.1.} Enter a new code cell.
Try creating another
object using \texttt{\textless{}-\ 3\ *\ 2\ +\ 4} with a name different
from \texttt{answer}.

A common pattern in Jupyter notebooks is to assign a value to a name and
then immediately evaluate the name in the last line in the cell so that
the value is displayed as output.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}10}]:} close\PYZus{}to\PYZus{}pi \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{355}\PY{o}{/}\PY{l+m}{113}
         close\PYZus{}to\PYZus{}pi
\end{Verbatim}


3.14159292035398


Another common pattern is that a series of lines in a single cell will
build up a complex computation in stages, naming the intermediate
results.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}11}]:} bimonthly\PYZus{}salary \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{840}
         monthly\PYZus{}salary \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{2} \PY{o}{*} bimonthly\PYZus{}salary
         number\PYZus{}of\PYZus{}months\PYZus{}in\PYZus{}a\PYZus{}year \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{12}
         yearly\PYZus{}salary \PY{o}{\PYZlt{}\PYZhy{}} number\PYZus{}of\PYZus{}months\PYZus{}in\PYZus{}a\PYZus{}year \PY{o}{*} monthly\PYZus{}salary
         \PY{k+kp}{print}\PY{p}{(}yearly\PYZus{}salary\PY{p}{)}
\end{Verbatim}


\begin{Verbatim}[commandchars=\\\{\}]
[1] 20160

\end{Verbatim}

Names in R can have letters (upper- and lower-case letters are both okay
and count as different letters), underscores, and numbers. The first
character can't be a number (otherwise a name might look like a number).
And names can't contain spaces, since spaces are used to separate pieces
of code from each other.

Other than those rules, what you name something doesn't matter \emph{to
For example, the next cell does the same thing as the above cell,701except everything has a different name:702703\begin{Verbatim}[commandchars=\\\{\}]704{\color{incolor}In [{\color{incolor}12}]:} a \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{840}705b \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{2} \PY{o}{*} a706\PY{k+kt}{c} \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{12}707d \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kt}{c} \PY{o}{*} b708\PY{k+kp}{print}\PY{p}{(}d\PY{p}{)}709\end{Verbatim}710711712\begin{Verbatim}[commandchars=\\\{\}]713[1] 20160714715\end{Verbatim}716717\textbf{However}, names are very important for making your code718\emph{readable} to yourself and others. The cell above is shorter, but719it's totally useless without an explanation of what it does.720721There is also cultural style associated with different programming722languages. In the modern R style, object names should use only lowercase723letters, numbers, and \texttt{\_}. Underscores (\texttt{\_}) are724typically used to separate words within a name (\emph{e.g.},725\texttt{answer\_one}).726727\subsection{3.1. Comments}\label{comments}728729Below you see lines like this in code cells:730731\begin{verbatim}732# Test cell; please do not change!733\end{verbatim}734735That is called a \emph{comment}. It doesn't make anything happen in R; R736ignores anything on a line after a \#. Instead, it's there to737communicate something about the code to you, the human reader. Comments738are extremely useful and can help increase how readable our code is.739740\textbf{Question 3.2.} Assign the name \texttt{seconds\_in\_an\_hour} to741the number of seconds in an hour. You should do this in two steps. In742the first you calculate the number of seconds in a minute and assign743that number the name \texttt{seconds\_in\_a\_minute}. 
Next you should
calculate the number of seconds in an hour and assign that number the
name \texttt{seconds\_in\_an\_hour}. \emph{Hint: there are 60 seconds
in a minute and 60 minutes in an hour.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}13}]:} \PY{c+c1}{\PYZsh{} Calculate the number of seconds in an hour.}
\PY{c+c1}{\PYZsh{} Assign your answer to seconds\PYZus{}in\PYZus{}an\PYZus{}hour}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
seconds\PYZus{}in\PYZus{}a\PYZus{}minute \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{60}
seconds\PYZus{}in\PYZus{}an\PYZus{}hour \PY{o}{\PYZlt{}\PYZhy{}} seconds\PYZus{}in\PYZus{}a\PYZus{}minute \PY{o}{*} \PY{l+m}{60}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}

\PY{c+c1}{\PYZsh{} We\PYZsq{}ve put this line in this cell so that it will print}
\PY{c+c1}{\PYZsh{} the value you\PYZsq{}ve given to seconds\PYZus{}in\PYZus{}an\PYZus{}hour when you}
\PY{c+c1}{\PYZsh{} run it. You don\PYZsq{}t need to change this.}
\PY{k+kp}{print}\PY{p}{(}seconds\PYZus{}in\PYZus{}an\PYZus{}hour\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 3600

\end{Verbatim}

\subsection{3.2. Checking your code}\label{checking-your-code}

Now that you know how to name things, you can start using the built-in
\emph{tests} to check whether your work is correct. To do this, you will
need to run the cell below to set things up. In future worksheets and
tutorial assignments you will see this cell at the very top of the
notebook:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}14}]:} \PY{k+kn}{library}\PY{p}{(}testthat\PY{p}{)}
\PY{k+kn}{library}\PY{p}{(}digest\PY{p}{)}
\end{Verbatim}

Below is an example of a test cell for Question 3.2 above (it assesses
whether you have assigned \texttt{seconds\_in\_an\_hour} correctly). If
you haven't, this test will tell you the correct answer.
Try not to
change the contents of the test cells. Resist the urge to just copy it,
and instead try to adjust your expression. (Sometimes the tests will
give hints about what went wrong...)

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}15}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}seconds\PYZus{}in\PYZus{}a\PYZus{}minute\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{4bdb128c943f718f5b8f347bb4b7641b\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}seconds\PYZus{}in\PYZus{}an\PYZus{}hour\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{a69521e1dbffd4cd8f6ed869a4eba073\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

For this first question we'll provide you with the solution:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}16}]:} \PY{c+c1}{\PYZsh{} Calculate the number of seconds in an hour.}

\PY{c+c1}{\PYZsh{}SOLUTION:}
seconds\PYZus{}in\PYZus{}a\PYZus{}minute \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+m}{60}
seconds\PYZus{}in\PYZus{}an\PYZus{}hour \PY{o}{\PYZlt{}\PYZhy{}} seconds\PYZus{}in\PYZus{}a\PYZus{}minute \PY{o}{*} \PY{l+m}{60}

\PY{c+c1}{\PYZsh{} We\PYZsq{}ve put this line in this cell so that it will print}
\PY{c+c1}{\PYZsh{} the value you\PYZsq{}ve given to seconds\PYZus{}in\PYZus{}an\PYZus{}hour when you}
\PY{c+c1}{\PYZsh{} run it. 
You don\PYZsq{}t need to change this.}
\PY{k+kp}{print}\PY{p}{(}seconds\PYZus{}in\PYZus{}an\PYZus{}hour\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 3600

\end{Verbatim}

\section{4. Calling functions}\label{calling-functions}

The most common way to combine or manipulate values in R is by calling
functions. R comes with many built-in functions that perform common
operations.

We used a function \texttt{print()} at the beginning of this notebook
when we printed text from a code cell. Here we'll demonstrate using
another function \texttt{toupper()} that converts text to uppercase:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}17}]:} greeting \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kp}{toupper}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Why, hello there!\PYZdq{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}greeting\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "WHY, HELLO THERE!"

\end{Verbatim}

\textbf{Question 4.1.} Use the function \texttt{tolower} to change all
the words in the following movie title to lower case text: ``The House
with a Clock in Its Walls'' and assign the lower case text the name
\texttt{title}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}18}]:} \PY{c+c1}{\PYZsh{} Change movie title to lower case using tolower()}
\PY{c+c1}{\PYZsh{} Assign your answer to an object called: title }

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
title \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kp}{tolower}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{The House with a Clock in Its Walls\PYZdq{}}\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}title\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "the house with a clock in its walls"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}19}]:} 
test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}title\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{c76933115bc8095b2140c11556800725\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsection{4.1. Multiple arguments}\label{multiple-arguments}

Some functions take multiple arguments, separated by commas. For
example, the built-in \texttt{max} function returns the largest of the
arguments passed to it.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}20}]:} biggest \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kp}{max}\PY{p}{(}\PY{l+m}{2}\PY{p}{,} \PY{l+m}{15}\PY{p}{,} \PY{l+m}{4}\PY{p}{,} \PY{l+m}{7}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}biggest\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 15

\end{Verbatim}

\textbf{Question 4.2.} Use the \texttt{min} function to find the minimum
value of the numbers in the cell above.

Assign the value to an object called \texttt{smallest}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}21}]:} \PY{c+c1}{\PYZsh{} Use min() to find the smallest value. 
}
\PY{c+c1}{\PYZsh{} Assign your answer to an object called: smallest}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
smallest \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kp}{min}\PY{p}{(}\PY{l+m}{2}\PY{p}{,} \PY{l+m}{15}\PY{p}{,} \PY{l+m}{4}\PY{p}{,} \PY{l+m}{7}\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}smallest\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 2

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}22}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}smallest\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{db8e490a925a60e62212cefc7674ca02\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\section{5. Packages}\label{packages}

R has many built-in functions, but we can also use functions that are
stored within packages created by other R users. We are going to use a
package, called \texttt{tidyverse}, to load, modify and plot data. This
package has already been installed for you. Later in the course you will
learn how to install packages so you are free to bring in other tools as
you need them for your data analysis.

To use the functions from a package you first need to load it using the
\texttt{library} function.
This needs to be done once per notebook (and
a good rule of thumb is to do this at the very top of your notebook so
it is easy to see what packages your R code depends on).

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}23}]:} \PY{k+kn}{library}\PY{p}{(}tidyverse\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0 ✔ purrr 0.2.5
✔ tibble 1.4.2 ✔ dplyr 0.7.7
✔ tidyr 0.8.0 ✔ stringr 1.3.1
✔ readr 1.1.1 ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse\_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ purrr::is\_null() masks testthat::is\_null()
✖ dplyr::lag() masks stats::lag()
✖ dplyr::matches() masks testthat::matches()

\end{Verbatim}

\textbf{Question 5.1.} Use the \texttt{library} function to load the
\texttt{rvest} R package.

We will use this package next week to scrape data from the web!

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}24}]:} \PY{c+c1}{\PYZsh{} Load the rvest package using the library function.}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
\PY{k+kn}{library}\PY{p}{(}rvest\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
Loading required package: xml2

Attaching package: ‘rvest’

The following object is masked from ‘package:purrr’:

pluck

The following object is masked from ‘package:readr’:

guess\_encoding


\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}25}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect, the rvest package needs to be loaded\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{package:rvest\PYZdq{}} \PY{o}{\PYZpc{}in\PYZpc{}} 
\PY{k+kp}{search}\PY{p}{(}\PY{p}{)} \PY{p}{,} is\PYZus{}true\PY{p}{(}\PY{p}{)}\PY{p}{)}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\section{6. Looking for help}\label{looking-for-help}

\paragraph{Help Files}\label{help-files}

No one, not even an experienced professional programmer, remembers what
every function does or every possible function argument/option. So both
experienced and new programmers (like you!) need to look things up, A
LOT! One of the most efficient places to look for help on how a function
works is the R help files. Let's say we wanted to pull up the help file
for the \texttt{read\_csv()} function. We can do this by typing a
question mark in front of the function we want to know more about:

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}26}]:} \PY{o}{?}read\PYZus{}csv
\end{Verbatim}

At the very top of the file, you will see the function itself and the
package it is in (in this case, it is \texttt{readr}). Next is a
description of what the function does. You'll find that the most helpful
sections on this page are ``Usage'', ``Arguments'' and ``Examples''.

\begin{itemize}
\tightlist
\item
  \textbf{Usage} gives you an idea of how you would use the function
  when coding-\/-what the syntax would be and how the function itself is
  structured.
\item
  \textbf{Arguments} tells you the different parts that can be added to
  the function to make it more simple or more complicated. Often the
  ``Usage'' and ``Arguments'' sections don't provide you with step by
  step instructions, because there are so many different ways that a
  person can incorporate a function into their code.
  Instead, they
  provide users with a general understanding as to what the function
  could do and parts that could be added. At the end of the day, the
  user must interpret the help file and figure out how best to use the
  functions and which parts are most important to include for their
  particular task.
\item
  The \textbf{Examples} section is often the most useful part of the
  help file as it shows how a function could be used with real data. It
  provides skeleton code that users can build on.
\end{itemize}

Beyond the R help files there are many resources that you can use to
find help. \href{https://stackoverflow.com/}{Stack Overflow}, an online
forum, is a great place to go and ask questions such as how to perform a
complicated task in R or why a specific error message is popping up.
Oftentimes, a previous user will have already asked your question of
interest and received helpful advice from fellow R users.

\textbf{Question 6.1.} Use \texttt{?read\_csv} and read the
\textbf{Description} section to answer the multiple choice question
below. To answer the question assign the letter associated with the
correct answer to a variable in the code cell below:

Which statement below is accurate?

A. \texttt{read\_csv2()} uses \texttt{;} for separators, instead of
\texttt{,}

B. \texttt{read\_delim} is a special case of the \texttt{read\_csv}
function.

C. These functions are useful for reading binary files, such as Excel
spreadsheets.

D. European countries commonly use \texttt{:} as the decimal separator.

\emph{Answer in the cell below using the uppercase letter associated
with your answer.
Place your answer between quotation marks and assign the correct
answer to an object called \texttt{answer}.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}27}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. }

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{A\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}answer\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "A"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}28}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{75f1160e72554f4270c809f041c7a776\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\section{7. Exercise}\label{exercise}

Now that we have learned a little about Jupyter notebooks and R, let's
load a real dataset into R and explore it.
As we do this we will learn
more about key data loading, wrangling and visualization functions in R.

\subsubsection{Data about runners!}\label{data-about-runners}

Researchers Vickers and Vertosick performed
\href{https://bmcsportsscimedrehabil.biomedcentral.com/articles/10.1186/s13102-016-0052-y}{a
study in 2016} that aimed to identify what factors affect race
performance of recreational runners so that they could build better
models to predict 5 km, 10 km and marathon race times. Such models can
help runners by suggesting changes they could make to modifiable
factors, such as training, to help them improve race time. Unmodifiable
factors in the model, such as age or sex, allow for fair comparisons to
be made between different runners.

Vickers and Vertosick reasoned that their study is important because all
previous research done to predict race times has focused on data from
elite athletes. This biased data set means that the models generated
from it do not necessarily do a good job predicting race times for
recreational runners (whose data was not in the dataset that created the
models). Additionally, previous research focused on reporting/measuring
factors that require special expertise or equipment that are not freely
available to recreational runners. This means that recreational runners
may not be able to put their characteristics/measurements for these
factors in the race time prediction models and so they will not be able
to obtain an accurate prediction, or a prediction at all (in the case of
some models).

To make a better model, Vickers and Vertosick performed a large survey.
They put their survey on the news website
\href{https://slate.com/}{Slate.com} attached to a news story about race
time prediction. They were able to obtain 2,497 responses.
The survey
included questions that allowed them to collect a data set that
included:

\begin{itemize}
\tightlist
\item
  age,
\item
  sex,
\item
  body mass index (BMI),
\item
  whether they are an endurance runner or speed demon,
\item
  what type of shoes they wear,
\item
  what type of training they do,
\item
  race time for 2-3 races they completed in the last 6 months,
\item
  self-rated fitness for each race,
\item
  and race difficulty for each race.
\end{itemize}

Let's now use this data to explore a question we might be interested in:
is there a relationship between 5 km race time and body mass index
(BMI) for women runners? If there is, then it might be a useful factor
to include in a race time prediction model for these runners. We will
answer this question by visualizing the data as a scatter plot using R.
To accomplish this, we will need to do the following things in R:

\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
  load the data set into R
\item
  subset the data we are interested in visualizing from the loaded
  dataset
\item
  create a new column to get the unit of time in minutes instead of
  seconds
\item
  create a scatter plot using this modified data
\end{enumerate}

\textbf{Question 7.1} Which of the following will you not find included
in Vickers and Vertosick's data set?

A. age

B. body mass index

C. self-rated fitness for each race

D. what each runner ate before the race

\emph{Assign your answer to an object called \texttt{answer7.1}.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}29}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.1}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.1 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{D\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}answer7.1\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "D"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}30}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer7.1\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{c1f86f7430df7ddb256980ea6a3b57a4\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\textbf{Question 7.2} True or False:

The researchers compiled this data so that they could build better
models to predict marathon race times.

\emph{Assign your answer to an object called \texttt{answer7.2}.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}31}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.2}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is written in lower\PYZhy{}case (true / false)}
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.2 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{true\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}answer7.2\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "true"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}32}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer7.2\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{05ca18b596514af73f6880309a21b5dd\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\textbf{Question 7.3} What kind of graph will we be creating? Choose the
correct answer from the options below.

A. Bar Graph

B. Pie Chart

C. Scatter Plot

D. Box Plot

\emph{Assign your answer to an object called \texttt{answer7.3}.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}33}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.3}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.3 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{C\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}answer7.3\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "C"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}34}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer7.3\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{475bf9280aab63a82af60791302736f6\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

Let's get started with our first step: loading the data set. The data
set we are loading is called \texttt{marathon\_small.csv} and it
contains a subset of the data from the study described above. The file
is in the same directory/folder as the file for this notebook. It is a
comma separated file (meaning the columns are separated by the
\texttt{,} character). We often refer to these as \texttt{.csv} files.

\begin{verbatim}
age,bmi,km5_time_seconds,km10_time_seconds,sex
25.0,21.6221160888672,NA,2798,female
41.0,23.905969619751,1210.0,NA,male
25.0,21.6407279968262,994.0,NA,male
35.0,23.5923233032227,1075.0,2135,male
34.0,22.7064037322998,1186.0,NA,male
45.0,42.0875434875488,3240.0,NA,female
33.0,22.5182952880859,1292.0,NA,male
58.0,25.2340793609619,NA,3420,male
29.0,24.505407333374,1440.0,3240,male
\end{verbatim}

We can use the \texttt{read\_csv} function to do this.
Below is an
example of reading a \texttt{.csv} file that is in the same
directory/folder as the file for the notebook that would be reading it
in:

\emph{Note - the quotes around the filename are important and you will
get an error if you forget them.}

\textbf{Question 7.4} Use the \texttt{read\_csv()} function to load the
data from the \texttt{marathon\_small.csv} file into R. Save the data to
an object called \texttt{marathon\_small}. If you need additional help
try \texttt{?read\_csv} and/or ask your neighbours or the Instructional
team for help.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}35}]:} \PY{c+c1}{\PYZsh{} Load marathon\PYZus{}small.csv using read\PYZus{}csv and name it: marathon\PYZus{}small}
\PY{k+kn}{library}\PY{p}{(}tidyverse\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
marathon\PYZus{}small \PY{o}{\PYZlt{}\PYZhy{}} read\PYZus{}csv\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{marathon\PYZus{}small.csv\PYZdq{}}\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}small\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
Parsed with column specification:
cols(
age = col\_double(),
bmi = col\_double(),
km5\_time\_seconds = col\_double(),
km10\_time\_seconds = col\_integer(),
sex = col\_character()
)

\end{Verbatim}

\begin{tabular}{r|lllll}
age & bmi & km5\_time\_seconds & km10\_time\_seconds & sex\\
\hline
25 & 21.62212 & NA & 2798 & female \\
41 & 23.90597 & 1210 & NA & male \\
25 & 21.64073 & 994 & NA & male \\
35 & 23.59232 & 1075 & 2135 & male \\
34 & 22.70640 & 1186 & NA & male \\
45 & 42.08754 & 3240 & NA & female \\
\end{tabular}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}36}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} 
\PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}\PY{k+kp}{nrow}\PY{p}{(}marathon\PYZus{}small\PY{p}{)}\PY{p}{,} \PY{l+m}{1833}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}\PY{k+kp}{ncol}\PY{p}{(}marathon\PYZus{}small\PY{p}{)}\PY{p}{,} \PY{l+m}{5}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}small\PY{o}{\PYZdl{}}age\PY{p}{)}\PY{p}{,} \PY{l+m}{66455.5}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}small\PY{o}{\PYZdl{}}km5\PYZus{}time\PYZus{}seconds\PY{p}{,} na.rm \PY{o}{=} \PY{k+kc}{TRUE}\PY{p}{)}\PY{p}{,} \PY{l+m}{1944614.5}\PY{p}{)}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

The pink output under the code cell above tells you a bit about what
happened when \texttt{read\_csv} read the data into R. It tells you that
5 columns were created (names: age, bmi, km5\_time\_seconds,
km10\_time\_seconds and sex) as well as the type of the data in those
columns (\emph{e.g.}, number-type or text-type), specifically:

\begin{itemize}
\tightlist
\item
  \texttt{col\_double} means that the data in this column is a
  number-type, specifically real numbers (meaning that these values
  \emph{can contain decimals})
\item
  \texttt{col\_integer} means that the data in this column is a
  number-type, specifically integers (whole numbers)
\item
  \texttt{col\_character} means that the data in this column contains
  text (e.g., letters or words)
\end{itemize}

\textbf{Question 7.5} From the list below, which is a valid way to store
a data frame object read in from \texttt{read\_csv} to an object in R?

A. data -\textgreater{} read\_csv("example\_file.csv")

B. data \textless{}- read\_csv("example\_file.csv")

C. data \textless{}- read\_csv"example\_file.csv"

D. 
data \textless{}- read\_csv(example\_file.csv)

\emph{Answer in the cell below using the uppercase letter associated
with your answer. Place your answer between "", assign the correct
answer to an object called \texttt{answer7.5}}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}37}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.5}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. }

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION }
answer7.5 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{B\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION }
\PY{k+kp}{print}\PY{p}{(}answer7.5\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "B"

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}38}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer7.5\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{3a5505c06543876fe45598b5e5e5195d\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsubsection{Data frames}\label{data-frames}

We can look at the structure of the data frame using the function
\texttt{head()}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}39}]:} \PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}small\PY{p}{)}
\end{Verbatim}

\begin{tabular}{r|lllll}
age & bmi & 
km5\_time\_seconds & km10\_time\_seconds & sex\\
\hline
25 & 21.62212 & NA & 2798 & female \\
41 & 23.90597 & 1210 & NA & male \\
25 & 21.64073 & 994 & NA & male \\
35 & 23.59232 & 1075 & 2135 & male \\
34 & 22.70640 & 1186 & NA & male \\
45 & 42.08754 & 3240 & NA & female \\
\end{tabular}

\texttt{head()} returns the first 6 elements of a vector or the first 6
rows of a data frame.

\begin{verbatim}
age,bmi,km5_time_seconds,km10_time_seconds,sex
25.0,21.6221160888672,NA,2798,female
41.0,23.905969619751,1210.0,NA,male
25.0,21.6407279968262,994.0,NA,male
35.0,23.5923233032227,1075.0,2135,male
34.0,22.7064037322998,1186.0,NA,male
45.0,42.0875434875488,3240.0,NA,female
33.0,22.5182952880859,1292.0,NA,male
58.0,25.2340793609619,NA,3420,male
29.0,24.505407333374,1440.0,3240,male
\end{verbatim}

By default, \texttt{read\_csv} treats the first row of a file as the
\textbf{header} and uses it to label the columns. Therefore, the first
row contains descriptive names while the rows below contain the actual
data.

This only shows us a small portion of the data set. You can look at the
entire data set by simply running a cell with \texttt{marathon\_small}
(the data frame name) written in it, but the output can be very long and
is usually unnecessary to look at.

\textbf{Question 7.6} To know how many rows there really are, use the
function \texttt{nrow()}. Replace the \texttt{fail()} with your line of
code. Assign the number of rows to the object \texttt{number\_rows}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}40}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: number\PYZus{}rows}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
number\PYZus{}rows \PY{o}{\PYZlt{}\PYZhy{}} \PY{k+kp}{nrow}\PY{p}{(}marathon\PYZus{}small\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{print}\PY{p}{(}number\PYZus{}rows\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 1833

\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}41}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}number\PYZus{}rows\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{58fac55045cec17cd9f4006f4b5ab349\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsubsection{Filter}\label{filter}

One of the most useful functions of the \texttt{tidyverse} is
\texttt{filter()}. With this function, it is possible to keep only the
observations (rows) whose entries in one or more columns meet a
condition.

For example, if we had a data set (named \texttt{data}) that looked like
this:

\begin{verbatim}
  colour size speed
1 red    15   12.3
2 blue   19   34.1
3 blue   20   23.2
4 red    22   21.9
5 blue   12   33.6
6 blue   23   28.8
\end{verbatim}

we could use \texttt{filter(data,\ colour\ ==\ "blue")} to keep only the
rows where the colour has the value "blue". Similarly,
\texttt{filter(data,\ size\ \textgreater{}\ 20)} would keep only the rows
where the size has a value greater than 20.

\textbf{Question 7.7} Use the function \texttt{filter()} to subset your
data frame \texttt{marathon\_small} so it only contains survey data from
females. 
Assign your new filtered data frame to an object called
\texttt{marathon\_filtered}. Replace the \texttt{fail()} with your line
of code.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}42}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: marathon\PYZus{}filtered}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. }

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
marathon\PYZus{}filtered \PY{o}{\PYZlt{}\PYZhy{}} filter\PY{p}{(}marathon\PYZus{}small\PY{p}{,} sex \PY{o}{==} \PY{l+s}{\PYZsq{}}\PY{l+s}{female\PYZsq{}}\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}

\PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}filtered\PY{p}{)}
\end{Verbatim}

\begin{tabular}{r|lllll}
age & bmi & km5\_time\_seconds & km10\_time\_seconds & sex\\
\hline
25 & 21.62212 & NA & 2798 & female \\
45 & 42.08754 & 3240 & NA & female \\
36 & 25.40862 & 2115 & 4210 & female \\
23 & 20.86986 & 1690 & NA & female \\
34 & 23.58257 & 1603 & NA & female \\
44 & 20.03506 & 1457 & NA & female \\
\end{tabular}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}43}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{nrow}\PY{p}{(}marathon\PYZus{}filtered\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{22c7b9e96a1f1a8c4a13dc8b6586dc80\PYZsq{}}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{ncol}\PY{p}{(}marathon\PYZus{}filtered\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{dd4ad37ee474732a009111e3456e7ed7\PYZsq{}}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}filtered\PY{o}{\PYZdl{}}bmi\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{7cc4baefd16add414fe6a9e051a2f5f5\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so 
you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsubsection{Select}\label{select}

The \texttt{select()} function allows you to zoom in and focus on
specific parts of the data. It is particularly helpful when working with
extremely large datasets. More specifically, the function allows you to
pick one or more columns from your dataset and place them in their own
data frame.

Recall our example \texttt{data}:

\begin{verbatim}
  colour size speed
1 red    15   12.3
2 blue   19   34.1
3 blue   20   23.2
4 red    22   21.9
5 blue   12   33.6
6 blue   23   28.8
\end{verbatim}

For example, we can use \texttt{select(data,\ colour,\ size)} to choose
the columns of interest (here colour and size), and we would get this
smaller data set back:

\begin{verbatim}
  colour size
1 red    15
2 blue   19
3 blue   20
4 red    22
5 blue   12
6 blue   23
\end{verbatim}

\textbf{Question 7.8} Use the function \texttt{select()} to choose the
columns \texttt{bmi} and \texttt{km5\_time\_seconds} from
\texttt{marathon\_filtered}. Assign the resulting data frame to an
object called \texttt{marathon\_female}.

Replace the \texttt{fail()} with your line of code. \emph{Make sure you
select \texttt{bmi} first and then \texttt{km5\_time\_seconds}}!

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}44}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: marathon\PYZus{}female}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
marathon\PYZus{}female \PY{o}{\PYZlt{}\PYZhy{}} select\PY{p}{(}marathon\PYZus{}filtered\PY{p}{,} bmi\PY{p}{,} km5\PYZus{}time\PYZus{}seconds\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}female\PY{p}{)}
\end{Verbatim}

\begin{tabular}{r|ll}
bmi & km5\_time\_seconds\\
\hline
21.62212 & NA \\
42.08754 & 3240 \\
25.40862 & 2115 \\
20.86986 & 1690 \\
23.58257 & 1603 \\
20.03506 & 1457 \\
\end{tabular}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}45}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{nrow}\PY{p}{(}marathon\PYZus{}female\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{22c7b9e96a1f1a8c4a13dc8b6586dc80\PYZsq{}}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{ncol}\PY{p}{(}marathon\PYZus{}female\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{c01f179e4b57ab8bd9de309e6d576c48\PYZsq{}}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}female\PY{o}{\PYZdl{}}bmi\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{7cc4baefd16add414fe6a9e051a2f5f5\PYZsq{}}\PY{p}{)}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}female\PY{o}{\PYZdl{}}km5\PYZus{}time\PYZus{}seconds\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{9c9393e1464352cd4fbea94dfadfa02a\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] 
"Success!"

\end{Verbatim}

\subsubsection{\texorpdfstring{Pipe Operators:
\texttt{\%\textgreater{}\%}}{Pipe Operators: \%\textgreater{}\%}}\label{pipe-operators}

The pipe operator allows you to chain together different functions: it
takes the output of one statement and makes it the input of the next
statement. Having a chain of processing functions is known as a
\emph{pipeline}.

For example, we can combine filter and select into one command:

\texttt{blue\_data\ \textless{}-\ filter(data,\ colour\ ==\ "blue")\ \%\textgreater{}\%\ select(colour,\ size)}

Since we want to plot data for female participants only, we first need
to filter on the \texttt{sex} column using the function
\texttt{filter()}. Then we need to select the columns that we wish to
look at. Since we want to plot BMI against the time it took to run 5 km,
we must select \texttt{bmi} and \texttt{km5\_time\_seconds} using the
function \texttt{select()}.

The following cell shows how we can chain together \texttt{filter()} and
\texttt{select()} with the pipe operator (\texttt{\%\textgreater{}\%})
for the marathon data frame.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}46}]:} \PY{c+c1}{\PYZsh{} Run this cell. 
}

marathon\PYZus{}female \PY{o}{\PYZlt{}\PYZhy{}} filter\PY{p}{(}marathon\PYZus{}small\PY{p}{,} sex \PY{o}{==} \PY{l+s}{\PYZsq{}}\PY{l+s}{female\PYZsq{}}\PY{p}{)} \PY{o}{\PYZpc{}\PYZgt{}\PYZpc{}} select\PY{p}{(}bmi\PY{p}{,} km5\PYZus{}time\PYZus{}seconds\PY{p}{)}
\PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}female\PY{p}{)}
\end{Verbatim}

\begin{tabular}{r|ll}
bmi & km5\_time\_seconds\\
\hline
21.62212 & NA \\
42.08754 & 3240 \\
25.40862 & 2115 \\
20.86986 & 1690 \\
23.58257 & 1603 \\
20.03506 & 1457 \\
\end{tabular}

\textbf{Question 7.9} Why do we write \texttt{marathon\_small} (the
original data frame) \textbf{only} in the call to \texttt{filter()}?

A. Because select does not require the original data frame as an
argument.

B. Because the pipe operator uses the data frame in the first line as
the data frame for all subsequent lines.

C. Because the pipe operator uses the output of the first function as
the input of the second function.

\emph{Answer in the cell below using the uppercase letter associated
with your answer. Place your answer between quotation marks ("") and
assign it to an object called \texttt{answer7.9}}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}47}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.9}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. 
}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.9 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{C\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}48}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}answer7.9\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{475bf9280aab63a82af60791302736f6\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\textbf{Question 7.10} What are the units of the time taken to complete
a run of 5 km?

\emph{Hint: scroll up and look at the introduction to this exercise.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}49}]:} \PY{c+c1}{\PYZsh{} Write your answer in lower case. 
Place your answer between \PYZdq{}\PYZdq{}}
\PY{c+c1}{\PYZsh{} Assign your answer for Question 7.10 to an object called: answer7.10}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.10 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{seconds\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}50}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}match\PY{p}{(}digest\PY{p}{(}answer7.10\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZdq{}}\PY{l+s}{a9cf135185e7fe4ae642c8dcb228cd2d\PYZdq{}}\PY{p}{)}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\textbf{Question 7.11} What are the units for time (e.g., seconds,
minutes, hours) that we would like to use when plotting BMI against time
taken to run 5 km? \emph{Hint: scroll up and look at the introduction to
this exercise.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}51}]:} \PY{c+c1}{\PYZsh{} Write your answer in lower case. 
Place your answer between \PYZdq{}\PYZdq{}}
\PY{c+c1}{\PYZsh{} Assign your answer for Question 7.11 to an object called: answer7.11}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.11 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{minutes\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}52}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}match\PY{p}{(}digest\PY{p}{(}answer7.11\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZdq{}}\PY{l+s}{edf7faf67d063030eba4ec85c6f7cc55\PYZdq{}}\PY{p}{)}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsubsection{Mutate}\label{mutate}

The function \texttt{mutate()} is used to add columns to an existing
dataset, where the new column is usually a function of one or more of
the existing columns.

\textbf{Question 7.12}

Add a new column to our \texttt{marathon\_female} dataset called
\texttt{km5\_time\_minutes} that is equal to
\texttt{km5\_time\_seconds/60}. Assign the result to an object called
\texttt{marathon\_minutes}.

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}53}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: marathon\PYZus{}minutes}
\PY{c+c1}{\PYZsh{} Replace the fail() with your line of code.}

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
marathon\PYZus{}minutes \PY{o}{\PYZlt{}\PYZhy{}} mutate\PY{p}{(}marathon\PYZus{}female\PY{p}{,} km5\PYZus{}time\PYZus{}minutes \PY{o}{=} km5\PYZus{}time\PYZus{}seconds\PY{o}{/}\PY{l+m}{60}\PY{p}{)}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\PY{k+kp}{head}\PY{p}{(}marathon\PYZus{}minutes\PY{p}{)}
\end{Verbatim}

\begin{tabular}{r|lll}
bmi & 
km5\_time\_seconds & km5\_time\_minutes\\
\hline
21.62212 & NA & NA\\
42.08754 & 3240 & 54.00000\\
25.40862 & 2115 & 35.25000\\
20.86986 & 1690 & 28.16667\\
23.58257 & 1603 & 26.71667\\
20.03506 & 1457 & 24.28333\\
\end{tabular}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}54}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}equal\PY{p}{(}digest\PY{p}{(}\PY{k+kp}{sum}\PY{p}{(}marathon\PYZus{}minutes\PY{o}{\PYZdl{}}km5\PYZus{}time\PYZus{}minutes\PY{p}{)}\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{9c9393e1464352cd4fbea94dfadfa02a\PYZsq{}}\PY{p}{)} \PY{c+c1}{\PYZsh{} we hid the answer to the test here so you can\PYZsq{}t see it, but we can still run the test}

\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

\subsubsection{Graphing}\label{graphing}

\texttt{ggplot} builds plots in layers. Every time you want to add
something new to your plot, you must add a new layer, with each layer
separated by the ``+'' symbol. The first function we use in the plotting
code is the \texttt{ggplot()} function. Here, we indicate the arguments
that apply to all layers of the plot. The second function we use is
\texttt{geom\_point()}. This function
This function1899indicates that we wish to produce a scatterplot and the way we wish to1900display the data within this scatterplot.19011902Let's plot a scatterplot with the \texttt{bmi} on the x axis and1903\texttt{km5\_time\_minutes} on the y axis.19041905\begin{figure}1906\centering1907\includegraphics{images/ws1_ggplot_female.png}1908\caption{ws1\_ggplot\_female.png}1909\end{figure}19101911\begin{Verbatim}[commandchars=\\\{\}]1912{\color{incolor}In [{\color{incolor}55}]:} \PY{c+c1}{\PYZsh{} code to set\PYZhy{}up plot size}1913\PY{k+kn}{library}\PY{p}{(}repr\PY{p}{)}1914\PY{k+kp}{options}\PY{p}{(}repr.plot.width\PY{o}{=}\PY{l+m}{4}\PY{p}{,} repr.plot.height\PY{o}{=}\PY{l+m}{3}\PY{p}{)}1915\end{Verbatim}191619171918\begin{Verbatim}[commandchars=\\\{\}]1919{\color{incolor}In [{\color{incolor}56}]:} \PY{c+c1}{\PYZsh{} Run this cell to create a scatterplot of BMI against the time it took to run 5 kilometers. }1920ggplot\PY{p}{(}data \PY{o}{=} marathon\PYZus{}minutes\PY{p}{,} aes\PY{p}{(}x \PY{o}{=} bmi\PY{p}{,} y \PY{o}{=} km5\PYZus{}time\PYZus{}minutes\PY{p}{)}\PY{p}{)} \PY{o}{+} geom\PYZus{}point\PY{p}{(}\PY{p}{)}1921\end{Verbatim}192219231924\begin{Verbatim}[commandchars=\\\{\}]1925Warning message:1926“Removed 160 rows containing missing values (geom\_point).”1927\end{Verbatim}1928192919301931\begin{center}1932\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{output_121_2.png}1933\end{center}1934{ \hspace*{\fill} \\}19351936\textbf{Question 7.13} Looking at the graph above, choose a statement1937above that most reflects what we see?19381939A. There may be a postitive trend/relationship between 5 km run time and1940body mass index; as the value for for body mass index increases, so does1941the time it takes to run 5 km.19421943B. There may be a negative trend/relationship between 5 km run time and1944body mass index; as the value for for body mass index increases, the1945time it takes to run 5 km decreases.19461947C. 
There appears to be no trend/relationship between 5 km run time and
body mass index; as the value for body mass index increases we see
neither an increase nor a decrease in the time it takes to run 5 km.

\emph{Assign your answer to an object called \texttt{answer7.13}.}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}57}]:} \PY{c+c1}{\PYZsh{} Assign your answer to an object called: answer7.13}
\PY{c+c1}{\PYZsh{} Make sure the correct answer is an uppercase letter. }
\PY{c+c1}{\PYZsh{} Surround your answer with quotation marks.}
\PY{c+c1}{\PYZsh{} Replace the fail() with your answer. }

\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} BEGIN SOLUTION}
answer7.13 \PY{o}{\PYZlt{}\PYZhy{}} \PY{l+s}{\PYZdq{}}\PY{l+s}{A\PYZdq{}}
\PY{c+c1}{\PYZsh{}\PYZsh{}\PYZsh{} END SOLUTION}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
{\color{incolor}In [{\color{incolor}58}]:} test\PYZus{}that\PY{p}{(}\PY{l+s}{\PYZsq{}}\PY{l+s}{Solution is incorrect\PYZsq{}}\PY{p}{,} \PY{p}{\PYZob{}}
expect\PYZus{}match\PY{p}{(}digest\PY{p}{(}answer7.13\PY{p}{)}\PY{p}{,} \PY{l+s}{\PYZsq{}}\PY{l+s}{75f1160e72554f4270c809f041c7a776\PYZsq{}}\PY{p}{)}
\PY{p}{\PYZcb{}}\PY{p}{)}
\PY{k+kp}{print}\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Success!\PYZdq{}}\PY{p}{)}
\end{Verbatim}

\begin{Verbatim}[commandchars=\\\{\}]
[1] "Success!"

\end{Verbatim}

The graphics code above barely scratches the surface of what ggplot, and
R as a whole, are capable of. Not only are there far more kinds of plots
available, but there are also many options for customizing the look and
feel of each graph. You can choose the font, the font size, the colors,
the style of the axes, etc.

Let's dig a little deeper into just a couple of options that you can add
to any of your graphs to make them look a little better. For example,
For example,1986you can change the text of the x-axis label or the y-axis label by using1987\texttt{xlab("")} or \texttt{ylab("")}. Let's do that for the1988scatterplot to make the labels easier to read.19891990\begin{Verbatim}[commandchars=\\\{\}]1991{\color{incolor}In [{\color{incolor}59}]:} \PY{c+c1}{\PYZsh{} Run this cell. }1992\PY{c+c1}{\PYZsh{} You can replace the axes with whatever you wish to label. }1993\PY{c+c1}{\PYZsh{} After running the cell once, try changing the axes to something else. }19941995ggplot\PY{p}{(}data \PY{o}{=} marathon\PYZus{}minutes\PY{p}{,} aes\PY{p}{(}x \PY{o}{=} bmi\PY{p}{,} y \PY{o}{=} km5\PYZus{}time\PYZus{}minutes\PY{p}{)}\PY{p}{)} \PY{o}{+} geom\PYZus{}point\PY{p}{(}\PY{p}{)} \PY{o}{+}1996xlab\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{Body Mass Index\PYZdq{}}\PY{p}{)} \PY{o}{+} ylab\PY{p}{(}\PY{l+s}{\PYZdq{}}\PY{l+s}{5km run time (minutes)\PYZdq{}}\PY{p}{)}1997\end{Verbatim}199819992000\begin{Verbatim}[commandchars=\\\{\}]2001Warning message:2002“Removed 160 rows containing missing values (geom\_point).”2003\end{Verbatim}2004200520062007\begin{center}2008\adjustimage{max size={0.9\linewidth}{0.9\paperheight}}{output_126_2.png}2009\end{center}2010{ \hspace*{\fill} \\}20112012\subsection{Attributions}\label{attributions}20132014\begin{itemize}2015\tightlist2016\item2017UC Berkley \href{https://github.com/data-8/data8assets}{Data 8 Public2018Materials}2019\end{itemize}202020212022% Add a bibliography block to the postdoc2023202420252026\end{document}202720282029