Worksheet 5: Introduction to version control
You can read more about course policies on the course website.
Lecture and Tutorial Learning Goals:
After completing this week's lecture and tutorial work, you will be able to:
Describe what version control is and why data analysis projects can benefit from it
Create a remote version control repository on GitHub
Move changes to files from GitHub to JupyterHub, and from JupyterHub to GitHub
Give collaborators access to the repository
Resolve conflicting edits made by multiple collaborators
Communicate with collaborators using issues
Use best practices when collaborating on a project with others
This worksheet covers parts of Chapter 5 of the online textbook. You should read this chapter before attempting the worksheet.
Note the following important information from UBC about your privacy: GitHub.com is stored on servers outside Canada. When you access this site from UBC’s JupyterHub environment, you are transferred to these servers. UBC cannot guarantee security of your private information on servers outside of Canada. Please exercise caution whenever using personal information. You may wish to use a pseudonym to protect your privacy if you have concerns. Please feel free to contact us at UBC ([email protected]) if you have any questions about your privacy.
1. What is version control? Why use it?
Question 1.1 Multiple Choice:
{points: 1}
Which reason listed below is not a good reason to use version control?
A. Version control tools provide transparency on how a project evolved by tracking the history of documents, and who made what changes to those documents.
B. Version control tools usually include a remote/cloud repository hosting service that can act as a backup of your local files (i.e., the files on your computer).
C. In practice, most data science projects involve collaboration on documents that contain code (e.g., Jupyter notebooks), and version control tools facilitate collaboration on such documents.
D. Version control tools check the accuracy of your code.
Assign your answer to an object called answer1.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
Question 1.2 True or false:
{points: 1}
Git is a remote/cloud repository hosting service where you can backup and share your files with collaborators.
Assign your answer to an object called answer1.2. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. "true" or "false").
2. Creating a space for your data science project online
For the rest of this worksheet, you will create a toy data science project on GitHub to practice using Git and GitHub. We will ask you questions about what you are doing along the way to test your understanding.
Sign up for a free GitHub.com account:
If you do not already have a free GitHub.com account, visit GitHub.com and sign up for one. Store your username and password in a secure place (we recommend using a password manager such as LastPass or 1Password).
Create a GitHub repository:
On GitHub.com, create a new repository and name it toy_ds_project. You can decide whether to make it private or public. Ensure that you select “Add a README file.” This task corresponds to this step in the textbook.
Question 2.1 Multiple Choice:
Which statement below is not true about GitHub repositories?
{points: 1}
A. Immediately after a repository is created on GitHub.com using the website, the repository exists only on GitHub.com and does not exist on your computer (i.e., you need to do something to get a copy of it on your computer).
B. Only the creator of a GitHub repository, and the people the creator specifies, can edit the files in the repository. This is true even when the repository is public.
C. If the repository is public, anyone on the web can view it.
D. If the repository is public, anyone on the web can edit it.
E. A GitHub repository is like a folder on Dropbox or Google Drive, but it is different in that it has special properties for version control.
Assign your answer to an object called answer2.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
3. Creating and editing files on GitHub
Edit the README.md file in your toy_ds_project repository on GitHub.com using the pen tool. Write "project creation date:" and list today's date. Commit this change directly to the main branch and write the commit message "added creation date". This task corresponds to this step in the textbook.
Next, use the pen tool again to edit the README.md file. Write "author" and list your name as the author. Commit this change and use the commit message "added project author".
Explore the commit history of your project by clicking on the link that looks like this:
Note: you can visit the version of your repository at any stage in its history by clicking on the <> buttons! Give it a try!
Question 3.1 True or false:
{points: 1}
Even though commit messages are required to edit a file using the pen tool on GitHub.com, it doesn't matter what message you write in practice.
Assign your answer to an object called answer3.1. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. "true" or "false").
4. Cloning your repository on JupyterHub
For our data science project, we need to put a copy of our repository somewhere we can run and test the code we write (otherwise, we won't know that our code works!!!). We can use the course JupyterHub for this!
Clone GitHub repository to the course JupyterHub:
Clone a copy of this GitHub repository to the course JupyterHub using the Jupyter Git extension. This task corresponds to this step in the textbook.
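The Jupyter Git extension is the route this worksheet expects. For reference, the same step can be sketched from a terminal, assuming the git command-line tool is installed. In this sketch a scratch repository on disk stands in for your GitHub.com repository so that it runs without a network connection or an account; the folder names and the commit identity are hypothetical.

```shell
# Scratch setup (assumption): a local repository with a README stands in
# for the toy_ds_project repository you created on GitHub.com.
scratch="$(mktemp -d)"
git init --quiet "$scratch/remote"
git -C "$scratch/remote" config user.name "Example Student"       # hypothetical identity
git -C "$scratch/remote" config user.email "student@example.com"  # hypothetical identity
echo "# toy_ds_project" > "$scratch/remote/README.md"
git -C "$scratch/remote" add README.md
git -C "$scratch/remote" commit --quiet -m "Initial commit"

# The clone itself: copy the stand-in remote into a new working folder.
git clone --quiet "$scratch/remote" "$scratch/toy_ds_project"

# List what arrived in the working folder.
ls "$scratch/toy_ds_project"
```

On JupyterHub, you would replace the scratch path in the `git clone` line with the HTTPS URL shown on your repository page on GitHub.com.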
Question 4.1 True or false:
{points: 1}
The definition of cloning a repository is to copy/download the entire contents (files, project history, and location of the remote repository) of a remote GitHub.com repository to a computer (e.g., your workspace on a JupyterHub, or your laptop).
Assign your answer to an object called answer4.1. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. "true" or "false").
5. Working in a cloned repository on JupyterHub
Now that your repository exists in your workspace on the course JupyterHub, you can create a new Jupyter notebook with an R kernel and write some code! To help this project move along, we show you below how to create and save a new Jupyter notebook, and we give you some code to put in it.
Creating a new Jupyter notebook with an R kernel
To create a new Jupyter notebook with an R kernel in your toy_ds_project repository, use the file navigation menu of Jupyter so that you are inside the toy_ds_project folder:
Once there, click on new R notebook.
Next, right-click on the filename and click on "Rename" to rename the file to marg_vs_divorce_viz.ipynb.
Add code to the notebook you created
Add the code below to the notebook and run it to display the data visualization. Feel free to add a narrative to the notebook if you like, commenting on the question being asked, the data visualization results, and whether correlation means causation. When you are done, save the notebook.
6. Specifying files to commit
Now we would like to start the process of putting marg_vs_divorce_viz.ipynb under version control and eventually push this file to our remote repository on GitHub.com. The first step is to add the changes to this file (creating it and adding the code) to the Git staging area. Go ahead and use the Jupyter Git extension to do this now. This task corresponds to this step in the textbook.
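If you are curious what the extension does behind the scenes, staging can be sketched from a terminal as follows, assuming the git command-line tool is installed. The scratch repository and the placeholder notebook contents are assumptions made so that the sketch runs anywhere; in your own work you would run only the `git add` and `git status` lines from inside your cloned toy_ds_project folder.

```shell
# Scratch setup (assumption): an empty repository with a placeholder file
# standing in for the notebook you saved.
scratch="$(mktemp -d)"
git init --quiet "$scratch/toy_ds_project"
cd "$scratch/toy_ds_project"
echo '{}' > marg_vs_divorce_viz.ipynb

# Add the new file to the staging area.
git add marg_vs_divorce_viz.ipynb

# Short status listing; an "A" in the first column marks a newly staged file.
git status --short
```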
Question 6.1 Multiple Choice:
{points: 1}
Git has a distinct step of adding files to the staging area because:
A. Not all changes we make (i.e., files we create or edit) are ones that we want to push to our remote GitHub repository.
B. It allows us to edit multiple files at once, but associate particular commit messages with particular files (so that the commit messages can more specifically reflect the changes that were made).
C. This is technically required of all version control software.
D. A and C.
E. A and B.
Assign your answer to an object called answer6.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
7. Making the commit
The next step is to commit our changes to our local Git repository. You can use the Jupyter Git extension to do this now. This task corresponds to this step in the textbook.
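As a rough terminal equivalent of this step, assuming the git command-line tool is installed, the sketch below stages a placeholder file and commits it. The scratch repository, the identity values, and the commit message are all hypothetical; the worksheet does not prescribe a message for this commit.

```shell
# Scratch setup (assumption): a repository with one staged placeholder file.
scratch="$(mktemp -d)"
git init --quiet "$scratch/toy_ds_project"
cd "$scratch/toy_ds_project"
echo '{}' > marg_vs_divorce_viz.ipynb
git add marg_vs_divorce_viz.ipynb

# Git records who made each commit; these values are hypothetical and apply
# to this scratch repository only.
git config user.name "Example Student"
git config user.email "student@example.com"

# Record the staged changes together with a descriptive message.
git commit --quiet -m "added marriage vs divorce visualization"

# Show the history, one line per commit.
git log --oneline
```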
Question 7.1 True or false:
{points: 1}
When we commit our changes to Git, the snapshot of changes, the commit message, the time and date stamp and the user who committed the changes are all saved to the Git history on GitHub.
Assign your answer to an object called answer7.1. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. "true" or "false").
8. Pushing the commits to GitHub
Finally, we are ready to send our changes (creating and adding code to marg_vs_divorce_viz.ipynb) to our remote repository through a process we call "pushing". Go ahead and do this now. This task corresponds to this step in the textbook.
After pushing your work to the remote repository on GitHub, visit your repository on GitHub.com and check out what your awesome toy project looks like!
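For comparison with the extension, pushing can be sketched from a terminal like this, assuming the git command-line tool is installed. An empty bare repository on disk stands in for GitHub.com (an assumption, so the sketch needs no network or credentials), and the identity and commit message are hypothetical.

```shell
# Scratch setup (assumption): an empty bare repository plays the role of
# the remote on GitHub.com, and we work in a clone of it.
# (Git warns that the cloned repository is empty; that is expected here.)
scratch="$(mktemp -d)"
git init --quiet --bare "$scratch/remote.git"
git clone --quiet "$scratch/remote.git" "$scratch/toy_ds_project"
cd "$scratch/toy_ds_project"
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"   # hypothetical identity

# Make one commit locally so there is something to send.
echo '{}' > marg_vs_divorce_viz.ipynb
git add marg_vs_divorce_viz.ipynb
git commit --quiet -m "added marriage vs divorce visualization"

# Send the current branch's commits to the remote named "origin".
git push --quiet origin HEAD

# List the branches the remote now holds.
git ls-remote --heads origin
```

In a repository cloned from GitHub.com, a plain `git push` is usually enough, because cloning already records where the remote lives.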
Question 8.1 Multiple Choice:
Which statement below is not true?
{points: 1}
A. Cloning and pulling a GitHub repository are the exact same thing.
B. Pushing with Git is the act of sending changes that were committed to Git to a remote repository, for example, on GitHub.com.
C. Pulling with Git is the act of collecting changes that exist in a remote repository, for example, on GitHub.com, that do not yet exist on the local computer you are working on (i.e., your workspace on the JupyterHub or your laptop).
D. You should push your work to GitHub anytime you want to share your work with others, or when you have finished a work session and want to back up your work.
Assign your answer to an object called answer8.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
9. Giving collaborators access to your project
One of the advantages of using version control tools, such as Git and GitHub, is how they let you collaborate. Let's get some practice starting down this path. Add one or more of your group members to your GitHub repository as a collaborator. This task corresponds to this step in the textbook.
Question 9.1 True or false:
{points: 1}
You can clone or pull from any public remote repository on GitHub.com; however, you can only push to public remote repositories on GitHub.com that you own or are a collaborator on.
Assign your answer to an object called answer9.1. Make sure your answer is written in lowercase and is surrounded by quotation marks (e.g. "true" or "false").
(Optional) More collaboration practice!
If you want to practice more Git & GitHub skills for collaboration, ask someone in your room if you can collaborate and send an edit to their project. To do this, they will need to add you as a collaborator, and then you will need to clone their repository to your JupyterHub. After that, you can edit some files (or create a whole new one), save your work, and then use the Jupyter Git extension to add, commit, and push your changes to their remote GitHub repository.
10. Communicating using GitHub issues
It's easy for project communications to get lost in email or whatever messaging platform you use to communicate with your team. GitHub issues are an excellent tool explicitly designed for project collaboration as they are "attached" to the project's remote GitHub repository. Your task here is to go to the issue tab for your project and create an issue about something you might want to improve about your project. This task corresponds to this step in the textbook.
Question 10.1 Multiple Choice:
{points: 1}
Which statement below is not a reason why GitHub issues are an ideal medium for project-specific communications?
A. Issues are part of each GitHub repository, and thus "attached" to the project.
B. Issues only persist while they are open, and are immediately deleted when they are closed.
C. Issues are easily searchable using GitHub’s search tools.
D. All issues are accessible to all project collaborators, so no one is left out of the conversation.
E. Issues can be set up so that team members get email notifications when a new issue is created or a new post is made in an issue thread.
Assign your answer to an object called answer10.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
(Optional) Even more collaboration practice!
Visit a group member's GitHub repository and leave a polite but constructive message on how they could improve their project.