In this chapter you will learn how to work collaboratively on projects using Git.
Collaborating using Git has several benefits. Changes to the documents are kept under version control giving a transparent audit trail. Merging contributions from collaborators is simple, and automatic for all non-conflicting changes. Furthermore, as Git is a distributed version control system the project becomes stored on many different computers, on yours and your collaborators’ computers as well as on remote servers, effectively backing up your code and text files.
Avertissement
Although you can store anything in Git it is bad practise to track large files such as genomes and microscopy files in it.
It is bad practise in that large files makes it difficult to move the repository between different computers. Furthermore, once a file has been committed to Git it is almost impossible to remove it to free up the space.
Also, if you are using a free service to host your code it is impolite to use it as a backup system for your raw data.
It is therefore good practise to add large files in your project
to the .gitignore
file.
The first step to collaboration is to have someone to collaborate with. If you have not already shared your experiences of this book with anyone I suggest that you engage the person sitting next to you in the office as soon as possible. Things are more fun when you work with others!
Although you can host your own Git server I would strongly recommend making use of either GitHub or BitBucket as your remote server for backing up your Git repositories.
Registration is free for both GitHub and BitBucket. GitHub gives you unlimited number of public repositories and collaborators for free. However, you have to pay for private repositories. BitBucket allows you to have private repositories for free, but limits the number of collaborators that you can have on these. Personally, I started with BitBucket because I was ashamed of my code and did not want others to see it. Then I realised that code did not have to be perfect to be useful and then I started using public GitHub repositories more and more. It is up to you. However the rest of this chapter will make use of GitHub as a remote server, as it is my current favorite.
Now that you have a GitHub/BitBucket account it is time to push the
S.coelicolor-local-GC-content
project from the
Data analysis and Data visualisation chapters to it.
Go to the web interface of your account and find the button that says something
along the lines of “New repostiory” or “Create repository”. Give the project
the name S.coelicolor-local-GC-content
and make sure that the backend uses Git
(if you are using BitBucket there is an option to use an alternative source
control management system named Mercurial) and that it is initialised as an
empty repository (i.e. don’t tick any checkboxes to add readme,
licence or .gitignore
files).
Now on your local computer (the one in front of you where you have been doing
all the exercises from this book so far) go into the directory containing the
S.coelicolor-local-GC-content
repository created in
Data analysis and built on in Data visualisation.
$ cd S.coelicolor-local-GC-content
We now need to specify the remote repository. My user name on GitHub is
tjelvar-olsson
and the repository I created on GitHub was named
S.coelicolor-local-GC-content
. To add a remote to my local repository
(the one on my computer) I use the command below.
Note
In the command below tjelvar-olsson
is my GitHub user name.
$ git remote add origin \
https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
The command adds the GitHub URL as a remote named origin
to the
local repository. However, we have not yet pushed any data to this remote.
Let’s do this now.
$ git push -u origin master
The git push
command pushes the changes made to the local repository to the
remote repository (named origin
). In this case the last argument (master
)
specifies the specific branch to push to.
The -u
option is the short hand for --set-upstream
and is used to set
up the association between your local branch (master
) and the branch on the
remote. You only need to specify the -u
option the first time you push a
local repository to a remote.
If you are using GitHub or BitBucket to host your remotes you do not need to remember these commands as the web interface will display the commands you need to push an existing project to a newly created repository.
Now it is time for your collaborator to get access to the repository. Use the web interface and your friend’s GitHub/BitBucket user name to give them write access to the repository.
Now your friend should be able to clone the repository. Alternatively, if all your friends are busy or uninterested you can clone the repository on a different computer or in a different directory to simulate the collaboration process by working on the same project in two different locations.
$ git clone git@github.com:tjelvar-olsson/S.coelicolor-local-GC-content.git
Cloning into 'S.coelicolor-local-GC-content'...
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 8 (delta 0), reused 8 (delta 0), pack-reused 0
Receiving objects: 100% (8/8), done.
Checking connectivity... done.
The command above clones my S.coelicolor-local-GC-content.git
repository.
You will need to replace tjelvar-olsson
with your user name. Alternatively,
have a look in the web interface for a string that can be used to clone the
repository.
Now that your friend has cloned your repository it is time for him/her to
add something to it. Create the file README.md
and add the markdown
text below to it.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Now let your friend add the README.md
file to the repository and commit a
snapshot of the repository.
$ git add README.md
$ git commit -m "Added readme file"
[master a531ea4] Added readme file
1 file changed, 4 insertions(+)
create mode 100644 README.md
Finally, your friend can push the changes to the remote repository.
$ git push
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 384 bytes | 0 bytes/s, done.
Total 3 (delta 1), reused 0 (delta 0)
To https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
cba7277..a531ea4 master -> master
Have a look at the repository in the using the GitHub/BitBucket web interface. You should be able to see the changes reflected there.
Now look at your local repository. You will not see your friends changes reflected
there yet, because you have not yet pulled from the remote. It is time for you
to do that now. Run the git pull
command in your local repository.
$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From github.com:tjelvar-olsson/S.coelicolor-local-GC-content
cba7277..a531ea4 master -> origin/master
Updating cba7277..a531ea4
Fast-forward
README.md | 4 ++++
1 file changed, 4 insertions(+)
create mode 100644 README.md
You have now successfully pulled in the contributions from your friend into your local repository. Very cool indeed!
Let’s go over what happened again.
The workflow described above was linear. You did some work, you friend cloned your work, your friend did some work, you pulled your friends work. What if you both worked on the project in parallel in your local repositories, how would that work?
Let’s try it out. In your local repository add some information on how
to download the genome to the README.md
file.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Download the genome using ``curl``.
```
$ curl --location --output Sco.dna http://bit.ly/1Q8eKWT
```
Now commit these changes.
$ git add README.md
$ git commit -m "Added info on how to download genome"
[master 442c433] Added info on how to download genome
1 file changed, 6 insertions(+)
Now your friend tries to work out what the gc_content.py
and
the local_gc_content_figure.R
scripts do. The names are not
particularly descriptive so he/she looks at the code and works out
that the gc_content.py
script produces a local GC content CSV
file from the genome and that the local_gc_content_figure.R
script produces a local GC plot from the CSV file. Your friend
therefore decides to rename these scripts to dna2csv.py
and
csv2png.R
.
$ git mv gc_content.py dna2csv.py
$ git mv local_gc_content_figure.R csv2png.R
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
renamed: local_gc_content_figure.R -> csv2png.R
renamed: gc_content.py -> dna2csv.py
$ git commit -m "Improved script file names"
[master 1f868ad] Improved script file names
2 files changed, 0 insertions(+), 0 deletions(-)
rename local_gc_content_figure.R => csv2png.R (100%)
rename gc_content.py => dna2csv.py (100%)
At this point your friend pushes their changes to the GitHub remote.
$ git push
Counting objects: 2, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 351 bytes | 0 bytes/s, done.
Total 2 (delta 0), reused 0 (delta 0)
To git@github.com:tjelvar-olsson/S.coelicolor-local-GC-content.git
a531ea4..1f868ad master -> master
Now you realise that you have not pushed the changes that you made to the
README.md
file to the GitHub remote. You therefore try to do so.
$ git push
To https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'https://github.com/tjelvar-olsson/...'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Rejected? What is going on? Well, it turns out that the “hints” give us some useful
information. They inform us that the update was rejected because the remote
contained work that we did not have in our local repository. It also suggests
that we can overcome this by using git pull
, which would pull in your
friends updates from the GitHub remote and merge them with your local updates.
$ git pull
Now unless you have defined an alternative editor as your default (using the
$EDITOR
environment variable) you will be dumped into a vim
session
with the text below in the editor.
Merge branch 'master' of https://github.com/tjelvar-olsson/...
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
To accept these changes you need to save the file and exit the editor (in Vim
Esc
followed by :wq
). Once this is done you will be dumped back into
your shell.
$ git pull
remote: Counting objects: 2, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 2 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), done.
From https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content
a531ea4..1f868ad master -> origin/master
Merge made by the 'recursive' strategy.
local_gc_content_figure.R => csv2png.R | 0
gc_content.py => dna2csv.py | 0
2 files changed, 0 insertions(+), 0 deletions(-)
rename local_gc_content_figure.R => csv2png.R (100%)
rename gc_content.py => dna2csv.py (100%)
Note that Git managed to work out how to merge these changes together automatically. Let’s have a look at the history.
$ git log --oneline
a5779d6 Merge branch 'master' of https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content
1f868ad Improved script file names
442c433 Added info on how to download genome
a531ea4 Added readme file
cba7277 Added R script for generating local GC plot
6d8e0cf Initial file import
Now we should now be able to push to the GitHub remote.
$ git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 765 bytes | 0 bytes/s, done.
Total 5 (delta 2), reused 0 (delta 0)
To https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
1f868ad..a5779d6 master -> master
Collaboratively working on projects in parallel, really cool stuff!
Let’s go over what happened.
So far so good, but what happens if both you and your friend edit the same part of the same file in your local repositories? How does Git deal with this? Let’s try it out.
First of all your friend pulls in your changes from the GitHub remote. By pulling your friend ensures that he/she is working on the latest version of the code.
$ git pull
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 2), reused 5 (delta 2), pack-reused 0
Unpacking objects: 100% (5/5), done.
From github.com:tjelvar-olsson/S.coelicolor-local-GC-content
1f868ad..a5779d6 master -> origin/master
Updating 1f868ad..a5779d6
Fast-forward
README.md | 6 ++++++
1 file changed, 6 insertions(+)
Your friend then edits the README.md
file.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Download the genome using ``curl``.
```
$ curl --location --output Sco.dna http://bit.ly/1Q8eKWT
```
Generate local GC content csv file from the genome.
```
$ python dna2csv.py
```
Your friend then commits and pushes these changes.
$ git add README.md
$ git commit -m "Added info on how to generate local GC content CSV"
[master 9f41c21] Added info on how to generate local GC content CSV
1 file changed, 6 insertions(+)
$ git push
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 399 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:tjelvar-olsson/S.coelicolor-local-GC-content.git
a5779d6..9f41c21 master -> master
Now you work on your local repository. You too are concerned with giving
more detail about how to make use of the scripts so you edit your
local copy of the README.md
file.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Download the genome using ``curl``.
```
$ curl --location --output Sco.dna http://bit.ly/1Q8eKWT
```
Data processing.
```
$ python dna2csv.py
$ Rscript csv2png.R
```
And you commit the changes to your local repository.
$ git add README.md
[-- olssont@ exit=0 S.coelicolor-local-GC-content --]
$ git commit -m "Added more info on how to process data"
[master 2559b5d] Added more info on how to process data
1 file changed, 7 insertions(+)
However, when you try to push you realise that your friend has pushed changes to the remote.
$ git push
Username for 'https://github.com': tjelvar-olsson
Password for 'https://tjelvar-olsson@github.com':
To https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
! [rejected] master -> master (fetch first)
error: failed to push some refs to 'https://github.com/tjelvar-olsson/...'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
So you pull.
$ git pull
remote: Counting objects: 3, done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content
a5779d6..9f41c21 master -> origin/master
Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.
The automatic merge failed! The horror!
Let’s find out what the status of the repository is.
$ git status
On branch master
Your branch and 'origin/master' have diverged,
and have 1 and 1 different commit each, respectively.
(use "git pull" to merge the remote branch into yours)
You have unmerged paths.
(fix conflicts and run "git commit")
Unmerged paths:
(use "git add <file>..." to mark resolution)
both modified: README.md
no changes added to commit (use "git add" and/or "git commit -a")
Okay, so both you and your friend have been editing the README.md
file.
Let’s have a look at it.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Download the genome using ``curl``.
```
$ curl --location --output Sco.dna http://bit.ly/1Q8eKWT
```
<<<<<<< HEAD
Data processing.
```
$ python dna2csv.py
$ Rscript csv2png.R
=======
Generate local GC content csv file from the genome.
```
$ python dna2csv.py
>>>>>>> 9f41c215cce3500e80c747426d1897f93389200c
```
So Git has created a “merged” file for us and highlighted the section that
is conflicting. In the above the first highlighted region contains the changes
from the HEAD
in your local repository and the second highlighted region
shows the changes from your friend’s commit 9f41c21
.
Now you need to edit the file so that you are happy with it. Your
friend’s idea of documenting what the command does is a good one so you could
edit the README.md
file to look like the below.
# Local GC content variation in *S. coelicolor*
Project investigating the local GC content of the
*Streptomyces coelicolor* A3(2) genome.
Download the genome using ``curl``.
```
$ curl --location --output Sco.dna http://bit.ly/1Q8eKWT
```
Generate ``local_gc_content.csv`` file from ``Sco.dna`` file.
```
$ python dna2csv.py
```
Generate ``local_gc_content.png`` file from ``local_gc_content.csv`` file.
```
$ Rscript csv2png.R
```
Now you need to add and commit these changes.
$ git add README.md
[-- olssont@ exit=0 S.coelicolor-local-GC-content --]
$ git commit -m "Manual merge of readme file"
[master 857b470] Manual merge of readme file
Now that you have merged the changes you can push to the remote.
$ git push
Counting objects: 6, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 708 bytes | 0 bytes/s, done.
Total 6 (delta 4), reused 0 (delta 0)
To https://github.com/tjelvar-olsson/S.coelicolor-local-GC-content.git
9f41c21..857b470 master -> master
Let’s go over what just happened.
README.md
fileREADME.md
file in your local repositoryREADME.md
fileREADME.md
file to the staging area and committed itThat’s it, you now have all the tools you need to start collaborating with your colleagues using Git! May the force be with you.