DSCI 100 - Introduction to Data Science
Lecture 5 - Introduction to version control using GitHub
2019-01-31
What is version control?
A tool/method to keep track of document versions and work collaboratively with others!
You've probably already used version control
Reasons to learn another tool for version control (GitHub)
designed for versioning and sharing code
can be used to host/build websites/blogs
1. Sign up for a GitHub account and share your GitHub username with us:
Please visit https://github.com/ and sign up for a free account (if you don't already have one). Then share your GitHub username with us so that we can get you set up for the group project for this course: https://canvas.ubc.ca/courses/19078/modules/items/1030189
2. Create a GitHub repository
Let's work through a demo together where we create and edit a repository for a fictional Data Science project.
Work in groups of 2 (to help each other out and for the collaboration exercise coming later in the lecture...).
Steps to follow:
On one person's laptop:
Go to https://github.com and make sure you are logged in.
Click the green “New repository” button. Or, if you are on your own profile page, click on “Repositories”, then click the green “New” button.
Choose/set:
Repository name: exampleDataProject (or whatever you wish)
Public
YES Initialize this repository with a README
Click big green button “Create repository.”
On GitHub, click the settings button on the right and select Collaborators (top left). Enter your partner's GitHub username (they will get an email invitation to access and edit the repository).
That's it! You now have a new repository on GitHub that you two can work together on!
3. Editing files directly on GitHub
There are two ways to make changes to your files:
Edit files directly on GitHub (good for text files)
Make changes to files that live on a computer (e.g., the server you are working on) and then "push" the changes back to GitHub (good for code files)
We will try out method 1 first; on Tuesday you will learn more about method 2.
Let's edit a file called `README.md` that contains some information about a fictional Data Science project we are going to create.
Steps to follow:
In your groups, do this one person at a time...
Click on the `README.md` file link
Click on the pen tool (right-hand side of document)
Add your name as the author to the document (e.g., "author: Tiffany Timbers")
Click on the big green button "Commit changes" to save your work
4. Getting a GitHub repository onto your computer
You need to do two things:
Introduce yourself to Git on the computer (already installed on the server)
Clone (think download) the repository onto the computer (here server)
What is Git? Git is the software on a computer that talks to GitHub.
4.1 Introduce yourself to Git
only need to do this once on your computer/server
Open a terminal from the JupyterHub Home/Control Panel (New > Terminal)
Type the following to tell Git about yourself (change your name and email to your own):
Check that you didn't make a typo (if you did then just repeat the commands above):
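The two steps above can be sketched as follows. This is a minimal sketch using Git's standard `git config` command; the name and email shown are placeholders (assumptions), not values from the course.

```shell
# Placeholder identity (assumption): replace both values with your own.
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Print the global settings so you can check them for typos.
git config --global --list
```

If a value is wrong, running the same `git config` line again with the corrected value overwrites the earlier one.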
4.2 Clone (think download) the repository onto the computer
Visit your repository on GitHub.com
Click on the green "Clone or download" button (make sure the pop-up says "Clone with HTTPS")
copy the URL to the clipboard
Go back to the terminal, type `git clone` followed by a space, paste the URL, and press Enter.
5. Add a Jupyter notebook to GitHub
Two big steps:
Create (or copy) a Jupyter notebook into the GitHub repository you cloned to your computer (using JupyterHub's Home/Control Panel)
Use the terminal to tell Git to send the changes to GitHub
5.1 Add a Jupyter notebook to the GitHub repository on the computer
Go to JupyterHub's Home/Control Panel
Navigate inside the GitHub repository you cloned (downloaded) by clicking on it
Create a new Notebook (New > R) there. Give it a name and add a code or Markdown cell to it
5.2 Use the terminal to tell Git to send the changes to GitHub
But first, some useful Unix shell commands for navigating and manipulating the filesystem:
Command | Purpose | Example use |
---|---|---|
pwd | Prints current working directory | pwd |
ls | List contents | ls Documents |
cd | Change directory | cd Desktop |
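
As a rough illustration of the three commands in the table, the sketch below builds a small throwaway folder layout and moves around in it. The folder and file names are hypothetical and exist only so the commands have something to show.

```shell
# A throwaway folder layout (hypothetical) created in a temporary directory.
home="$(mktemp -d)"
mkdir -p "$home/Documents" "$home/Desktop"
touch "$home/Documents/notes.txt"

cd "$home"
pwd            # where am I?
ls             # what is here? (Desktop and Documents)
ls Documents   # what is inside Documents? (notes.txt)
cd Documents   # move into Documents
cd ../Desktop  # move from Documents to its sibling folder Desktop
pwd            # confirm the new location
```

The `..` in `cd ../Desktop` means "the folder one level up", which is how you move sideways between two folders that share a parent.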
In-class activities
Discuss/brainstorm: what are the 3 things you do/think about (as a human) when you navigate your computer's filesystem (using Finder, Explorer, Nautilus, etc.)?
Instructor - Demo using Unix to navigate the filesystem
Students - use the command line to: a. figure out where you are when you open your command line b. navigate to your Documents folder c. navigate from your Documents folder to your Desktop
5.2 Use the terminal to tell Git to send the changes to GitHub
Go back to the JupyterHub Terminal
Type `ls` to see all the files and directories there
Type `cd REPOSITORY_NAME` to navigate into that repository (type `pwd` to see you are where you expect to be)
Type `git add NOTEBOOK.ipynb` (change `NOTEBOOK.ipynb` to the name of the file you created)
Type `git commit -m "added a notebook"`
Type `git push` to send the notebook to GitHub
Visit your repository on GitHub.com to see that the notebook made it there!
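The steps above can be tried end to end without a network connection. The sketch below is an approximation under stated assumptions: a local bare repository stands in for GitHub, the commit identity is set through Git's environment variables with made-up values, and a tiny placeholder file stands in for the notebook you would create in JupyterHub.

```shell
# Commit identity for this sketch only (hypothetical values).
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Assumption: a local bare repository plays the role of GitHub.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/exampleDataProject" 2>/dev/null

# Navigate into the cloned repository and check the location.
cd "$work/exampleDataProject"
pwd

# Placeholder for a notebook created in JupyterHub (assumption).
echo '{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}' > NOTEBOOK.ipynb

git add NOTEBOOK.ipynb
git commit --quiet -m "added a notebook"

# "origin HEAD" is spelled out only because this stand-in remote starts empty;
# the steps above use plain "git push" against a repository that already has a README.
git push --quiet origin HEAD

# The commit is now visible on the stand-in remote.
git -C "$work/remote.git" log --oneline
```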
(pause)
5.3 Make changes on your computer to send to GitHub
We previously edited our README file directly on GitHub. Let's try to add more changes, but this time on our local machine (the server).
Navigate to the cloned repository and open the README text file
Make changes to the README file
Go to the terminal through JupyterHub and navigate to the correct folder
Type `git status` to view the changes
Type `git add README.md`
Type `git status` to see our changes move to the staging area (not required)
Type `git commit -m "updated readme file"`
Type `git status` to see that the changes have moved from the staging area into the repository's history (not required)
Type `git push` to send the README file to GitHub
Go to GitHub to see the changes
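To see what the optional `git status` checks report at each stage, here is a small sketch in a throwaway local repository. It leaves out the final `git push` because it has no remote; the identity values and file contents are hypothetical.

```shell
# Commit identity for this sketch only (hypothetical values).
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A throwaway repository with one committed README.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
echo "# exampleDataProject" > README.md
git add README.md
git commit --quiet -m "initial README"

# Edit the file, then watch the change move through the stages.
echo "author: Your Name" >> README.md
git status --short   # " M README.md": modified but not yet staged
git add README.md
git status --short   # "M  README.md": staged, ready to commit
git commit --quiet -m "updated readme file"
git status --short   # no output: nothing left to commit
```

The `--short` flag is used here only to keep the output compact; plain `git status` reports the same states in full sentences.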
5.4 View your Git history
We have now made two commits to our README. Let's take a look back in time! There are two ways you can view the Git history of a project:
On GitHub through the repo's code commit view
On your local machine using git log
Arguably, the best and easiest place to view the Git history of a project is on GitHub. So let's start there. But we'll explore both as sometimes the history on your local machine might differ from that on GitHub and that is when you might need to look at both.
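For the local view, a minimal sketch of `git log` is shown below. It builds a throwaway repository with two commits so there is some history to look at; the identity values and file contents are hypothetical.

```shell
# Commit identity for this sketch only (hypothetical values).
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A throwaway repository with two commits to the README.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
echo "# exampleDataProject" > README.md
git add README.md
git commit --quiet -m "initial README"
echo "author: Your Name" >> README.md
git commit --quiet -am "updated readme file"

# One line per commit, newest first: short hash plus commit message.
git log --oneline

# Author and message of the most recent commit only.
git log -1 --format="%an: %s"
```

Plain `git log` (without `--oneline`) prints the full hash, author, date, and message for every commit.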
Let's get your partner up to speed!
Open the terminal on JupyterHub
Navigate to your cloned repository
Type `git pull` to pull from the remote location (GitHub), which should have all the changes your partner has made
6. Deal with merge conflicts at the command line
When working with version control, changes usually happen in more than one place (e.g., on your laptop and on GitHub), so the same document will sometimes be changed in different places. There are two types of changes you need to know about (and how Git deals with them):
Changes to a document where different lines are modified (Git can automatically merge these).
Changes to a document where the same line(s) are modified (Git CANNOT automatically merge these).
In case #2, you (or some other human) have to deal with the conflict. Git kindly points you to where the problem is, and then will do no further work for you until you deal with the conflict.
6.1 How do you know you have a merge conflict?
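If you do `git push` and it is rejected (Git reports that it failed to push some refs because the remote contains work that you do not have locally), and then you do a `git pull` and Git reports `CONFLICT` followed by `Automatic merge failed; fix conflicts and then commit the result.`, you have a merge conflict.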
6.2 What do you do to fix a merge conflict?
Pull the changes from GitHub
Open the file that has a conflict (the output of `git pull` will tell you which files) in a plain text editor (e.g., Atom)
Look for the conflict (hint - search for `<<<<<<< HEAD`)
Fix the conflict and save the file
`git add` and `git commit` your changes, and then `git push` them up to GitHub
6.3 How do you find the conflicts in a file?
Here's an example of a text file with a conflict:
`<<<<<<< HEAD` precedes the change you made (that you couldn't push)
`=======` is a separator between the conflicting changes
`>>>>>>> dabb4c8c450e8475aee9b14b4383acc99f42af1d` flags the end of the conflicting change you pulled from GitHub
6.4 How do you fix the conflicts in a file?
Edit this file to remove these markers and reconcile the changes.
We can do anything we want:
keep the change made in the local repository,
keep the change made in the remote repository,
write something new to replace both,
or get rid of the change entirely.
If we chose to write something new to replace both, it would look like this:
You then need to save, `git add`, `git commit` and `git push` the file to have these changes reflected on GitHub.
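The whole cycle (rejected push, conflicting pull, manual fix, then add, commit, push) can be reproduced offline. The sketch below is an approximation under stated assumptions: a local bare repository stands in for GitHub, two clones named `owner` and `collaborator` stand in for the two partners, and the identity values and file contents are made up. It uses `git pull --no-rebase` because recent Git versions may refuse a plain `git pull` on diverged branches until a merge strategy has been chosen.

```shell
# Commit identity for this sketch only (hypothetical values).
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Assumption: a local bare repository plays the role of GitHub.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/seed" 2>/dev/null
cd "$work/seed"
echo "author: nobody yet" > README.md
git add README.md
git commit --quiet -m "add README"
git push --quiet origin HEAD

# Each partner clones the same repository.
git clone --quiet "$work/remote.git" "$work/owner"
git clone --quiet "$work/remote.git" "$work/collaborator"

# The collaborator changes line 1 and pushes first.
cd "$work/collaborator"
echo "author: Collaborator" > README.md
git commit --quiet -am "collaborator edits author"
git push --quiet

# The owner changes the same line: the push is rejected and the pull stops on a conflict.
cd "$work/owner"
echo "author: Owner" > README.md
git commit --quiet -am "owner edits author"
git push --quiet 2>/dev/null || echo "push rejected: pull first"
git pull --no-rebase || echo "pull stopped: merge conflict"
cat README.md   # both versions of the line, wrapped in conflict markers

# Resolve by writing something new to replace both versions, then add, commit, push.
echo "author: Owner and Collaborator" > README.md
git add README.md
git commit --quiet -m "resolve conflict in README"
git push --quiet
```

In practice you would edit the file in a text editor rather than overwrite it with `echo`; the overwrite is used here only so the sketch runs without interaction.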
(pause)
6.5 What about merge conflicts in Jupyter notebooks?
6.5 First - a bit about what a Jupyter notebook is made up of
`.ipynb` files are "plain" text files, and we can view them in a plain text editor and make some sense of them
The contents of the notebook are encoded in JSON
When we run the notebook via Jupyter Notebook, the kernel is the part that can interpret and run the code
For example, this notebook of 2 cells:
is encoded by the following JSON:
6.5 Back to version control and Jupyter notebook
Because the notebooks are stored as plain text, we can put them under version control, but this is not without issues, which include:
`git diff` looks horrendous because of the JSON
manually fixing conflicts is arduous because of the JSON
Strategies to help you not end up in conflict hell with Jupyter:
Always `git pull` before you start doing ANY work!
Clear the output of your notebooks before you `git push` (although we need your output for MDS homework... so it depends here).
But there is hope that things are better (or will get better)! `nbdime` is a project that helps solve these problems. You can try to test drive it, although I have yet to have success with it.
7. Practicing conflict resolution in pairs:
Designate one partner as the Data Science repository "Owner" and one partner as the repository "Collaborator". The repository "Owner" needs to grant the Collaborator access.
Owner:
On GitHub, click the settings button on the right.
Select Collaborators (top left), and enter your Collaborator's username.
Collaborator:
Go to your email to retrieve the URL to connect to the Owner's repository.
Clone your partner's repo: `git clone` creates a fresh local copy of a remote repository.
Stage a conflict:
Both partners modify the same line of the same file and try to send the changes to GitHub. One of you should get a conflict. Work together to follow what was learned in lecture to resolve it.
8. How to add a gitignore file
Create a file called `.gitignore` in your text editor
List the files you want Git to ignore in that file, one name or pattern per line, so you do not push them to GitHub
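As a rough sketch of the effect, the example below creates a throwaway repository with a few files and a `.gitignore`, then asks Git what it sees. All file names here are hypothetical; which files you ignore in a real project is up to you.

```shell
# A throwaway repository with some files we do and do not want to share.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
mkdir .ipynb_checkpoints
touch analysis.ipynb .ipynb_checkpoints/analysis-checkpoint.ipynb secrets.txt

# One name or pattern per line; a trailing slash matches a directory.
printf '%s\n' ".ipynb_checkpoints/" "secrets.txt" > .gitignore

# Only .gitignore and analysis.ipynb are listed as untracked; the ignored files are hidden.
git status --short --untracked-files=all

# Ask Git directly whether a given file is ignored (prints the name if it is).
git check-ignore secrets.txt
```

The `.gitignore` file itself is usually added and committed like any other file so that your collaborators ignore the same things.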
Attribution
Happy Git and GitHub for the useR by Jenny Bryan and the STAT 545 TAs
Software Carpentry, specifically the Unix Shell and Git lessons