Sometimes you need to work on a computer other than the one in front of you. This may be because you are working from home and need to log into your work computer. Alternatively, it may be because you need to do some work on a high-performance computing cluster.
In this chapter we will cover tools for logging into remote machines using the SSH (Secure SHell) protocol. SSH is an open standard that is available on virtually all Unix-based systems. In fact, it has gained so much traction that Microsoft now ships an OpenSSH client with Windows.
Although it is relatively simple to log into a remote machine using SSH, explaining it requires some networking terminology (as we are logging into the remote machine over a network). Let us therefore start with some jargon busting before we get into the practical details.
A computer can be connected to a network using physical Ethernet cables or wirelessly using Wi-Fi. Computers connected to the same network can communicate with each other. This communication takes place over so-called “ports”, which are identified by numbers. A famous example is port 80, which is used for HTTP (the HyperText Transfer Protocol), a.k.a. web browsing. There are many other ports. For example, port 443 is used for HTTPS and port 22 is used for SSH.
Machines on a network can be identified and accessed via their IP (Internet Protocol) address. An IP address that you may have come across is 127.0.0.1, which identifies the local machine that you are working on.
Although very useful, IP addresses can be difficult to remember. To overcome this problem, internet service providers, as well as the people responsible for looking after your organisation's network, run Domain Name System (DNS) servers. The purpose of a DNS server is to translate simpler, more memorable host names into IP addresses. To illustrate this we can find the IP address(es) of Google’s web servers using the command host.
$ host www.google.com
www.google.com has address 62.164.169.148
www.google.com has address 62.164.169.185
www.google.com has address 62.164.169.163
www.google.com has address 62.164.169.177
www.google.com has address 62.164.169.166
www.google.com has address 62.164.169.152
www.google.com has address 62.164.169.155
www.google.com has address 62.164.169.154
www.google.com has address 62.164.169.176
www.google.com has address 62.164.169.159
www.google.com has address 62.164.169.165
www.google.com has address 62.164.169.181
www.google.com has address 62.164.169.187
www.google.com has address 62.164.169.170
www.google.com has address 62.164.169.144
www.google.com has address 62.164.169.174
www.google.com has IPv6 address 2a00:1450:4009:810::2004
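If you copy and paste one of the IP addresses above into a web browser you should see the Google home page. The addresses you get will probably differ from those shown here, as they depend on where you are connecting from.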
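The host command queries DNS servers directly. On Linux systems the getent command offers a complementary view: it asks the system's own resolver, which consults local files such as /etc/hosts as well as DNS. That is why the minimal sketch below can resolve the name localhost to the loopback address 127.0.0.1 (or its IPv6 counterpart ::1) without any network access. It assumes a Linux machine with getent available; the exact output depends on how your machine is configured.

```shell
# Look up a name via the system resolver (assumes a Linux system with getent).
# localhost is normally listed in /etc/hosts, so no DNS server is contacted.
getent hosts localhost
```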
Throughout the remainder of this chapter we will connect to remote machines using their hostnames. In practice, your institute's DNS server translates the hostname to an IP address, and your communications are sent over the network to the machine identified by that IP address. Furthermore, when you communicate with the remote machine you will be using a specific port. The default port for the SSH protocol is port 22.
In its most basic form the ssh command takes the form ssh hostname, where hostname is the name of the remote machine that you want to log in to.
This assumes that you want to log in to the remote machine using the same user name as on the local machine. This is often not the case, and it is common to see the ssh command in the form ssh user@hostname, where user is the Unix user name you want to log in as on the remote machine.
Let us illustrate this with an example. Suppose that you wanted to log in to a remote computer named hpc; this could, for example, be the head node on your institute's high-performance computing cluster. Assuming that your user name on the head node was olssont, you could log in using the command below.
$ ssh olssont@hpc
Warning
All of the remote machines in this chapter are fictional. This means that if you try to run the commands verbatim you will see errors along the lines of the one below.
ssh: Could not resolve hostname ...
When trying these examples make sure that you are trying to connect to a machine that exists on your network.
If the machine that you are trying to log in to has port 22 (the default SSH port) open, you will be prompted for your password.
The above assumes that your DNS server can resolve the name hpc into an IP address. Depending on how things are set up in your organisation, you may need to use a more explicit name, for example hpc.awesomeuni.ac.uk.
$ ssh olssont@hpc.awesomeuni.ac.uk
If your user name and password are accepted, the shell in your terminal will now be running on the remote machine. To find out the name of the machine that you are logged into you can run the command hostname.
$ hostname
hpc
To disconnect from the remote host you can press Ctrl-D or use the exit command.
$ exit
$ hostname
olssont-laptop
In the above, the second hostname command prints out the hostname of the local machine, showing that we have disconnected from the remote host.
By default, port 22 is used for the SSH protocol. However, sometimes a machine may expose its SSH server on a different port. For example, if we had a machine called bishop that exposed its SSH server on port 2222, we could log in to it using the command below.
$ ssh -p 2222 olssont@bishop
In the above, the -p flag is used to specify the port to connect to.
Sometimes you want to be able to run software that makes use of a windowing system (i.e. any software with a graphical user interface). For example, the statistical software package R has built-in functionality for displaying plots in a graphical window, which means it requires a windowing system. Many Unix-based systems use X11 as their windowing system. We therefore need to enable X11 forwarding in SSH to be able to run programs that require graphics. This is achieved using the -X flag.
$ ssh -X olssont@pawn
In the above we are connecting to a machine named pawn with X11 forwarding enabled.
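When X11 forwarding is active, ssh sets the DISPLAY environment variable in the remote shell, typically to a value such as localhost:10.0. The sketch below is one way to check, once logged in, whether graphical programs have a display to draw on. The function name check_display is hypothetical and not a standard command.

```shell
# Hypothetical helper: report whether an X11 display is available in this shell.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 display available: $DISPLAY"
    else
        echo "No X11 display: graphical programs will not be able to open windows"
    fi
}

check_display
```

Running check_display on pawn after logging in with ssh -X should report the forwarded display; without -X it will usually report that no display is available.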
Now that we know how to log in to a remote machine, we need to work out how to copy data to and from it. This is achieved using the scp (secure copy) command.
Suppose that we wanted to copy the file mydata.csv over to olssont's home directory on the hpc head node. We could achieve this using the command below.
$ scp mydata.csv olssont@hpc:
Note the colon (:) after the host name. It marks the end of the host name and the beginning of the location to copy the file to on the remote machine. In this instance the latter is left empty, so the original file name is used and the location defaults to olssont's home directory on the remote machine. The command above is equivalent to the one below, which explicitly specifies the home directory using the tilde shorthand (~/).
$ scp mydata.csv olssont@hpc:~/
It is also possible to specify the location using an absolute path. For example, if we wanted to save the file in the /tmp directory, this could be achieved using the command below.
$ scp mydata.csv olssont@hpc:/tmp/
Just like with the cp command, it is possible to give the copied file a different name. For example, to name it data.csv (and place it in the /tmp directory) one could use the command below.
$ scp mydata.csv olssont@hpc:/tmp/data.csv
If the SSH server is listening on a port other than 22, you need to specify the port explicitly. Confusingly, the option for this is not the same as for the ssh command: scp uses -P, i.e. an upper rather than a lower case letter. So if we wanted to copy the data to bishop, where the SSH server is listening on port 2222, we could use the command below.
$ scp -P 2222 mydata.csv olssont@bishop:
Sometimes you want to copy the entire contents of a directory. In this case you can use the -r option to recursively copy all the contents of the specified directory. For example, if we had a directory named data and we wanted to copy it to pawn, we could use the command below.
$ scp -r data/ olssont@pawn:
All of the commands above will prompt you for your password. This can get tedious. In the next section we will look at a more secure and less annoying way of managing the authentication step when working with remote machines.
An alternative, and more secure, method to password-based authentication is to use public-key cryptography. Public-key cryptography, also known as asymmetric cryptography, uses a pair of so-called “keys”. One of these keys is public and one is private. The public key is one that you can distribute freely, in this case to all the remote machines that you want to be able to log in to. However, the private key must never be compromised, as it is what grants you access to all the remote machines. One way to think about this system is to view the public key as a lock that only the private key can open. You can fit all the machines that you want secure access to with copies of the same public key, as long as you keep the private key safe.
Enough theory, let’s try it out.
The first step is to generate a public/private key pair. This is achieved using the command ssh-keygen. This will prompt you for the file to save the key as; the default ~/.ssh/id_rsa file is a good option if you have not yet set up any key pairs. You will then be prompted to (optionally) enter a passphrase. This provides another layer of protection in case someone gets hold of your private key. However, it does mean that you will need to type the passphrase to unlock the key; if your system runs an SSH agent, as many desktop systems do, this is typically only needed the first time you use the key after logging in or rebooting. Personally, I am paranoid so I make use of a passphrase, and I suggest that you do too.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/olssont/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
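ssh-keygen can also be driven entirely from the command line, which is handy in scripts. In the sketch below, -t selects the key type (ed25519 is a modern alternative to RSA), -C attaches a comment, -f sets the output file and -q suppresses progress messages. The throwaway directory, the file name id_demo and the comment are purely illustrative, and -N '' sets an empty passphrase only so that the example can run unattended; for a real key, leave out -N so that ssh-keygen prompts you for a passphrase.

```shell
# Create a throwaway directory so that no existing keys are touched
keydir=$(mktemp -d)

# Generate an ed25519 key pair non-interactively (empty passphrase for demo only)
ssh-keygen -q -t ed25519 -C "olssont@laptop" -N '' -f "$keydir/id_demo"

# The private key is id_demo and the public key is id_demo.pub
ls -l "$keydir"
```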
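If you used the default naming scheme for your key pair you should now have two files in your .ssh directory: id_rsa (your private key) and id_rsa.pub (your public key). (Recent versions of OpenSSH generate Ed25519 keys by default, in which case the files are named id_ed25519 and id_ed25519.pub instead.)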
$ ls -l .ssh/
-rw------- 1 olssont NR4\Domain Users 1679 23 Feb 2015 id_rsa
-rw-r--r-- 1 olssont NR4\Domain Users 407 23 Feb 2015 id_rsa.pub
Note that only the user has read/write permissions on the private key, whereas the public key is readable by everyone.
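ssh-keygen sets these permissions for you, but they can be loosened by accident, for example when keys are copied between machines. If a private key is readable by other users, ssh warns about an unprotected private key file and refuses to use it. The sketch below restores sensible permissions. It works on a scratch directory standing in for ~/.ssh so that it is safe to try; on a real system you would run the same chmod commands against ~/.ssh and the key files in it.

```shell
# A scratch directory standing in for ~/.ssh
demo=$(mktemp -d)
touch "$demo/id_rsa" "$demo/id_rsa.pub"

chmod 700 "$demo"             # only the owner may list or enter the directory
chmod 600 "$demo/id_rsa"      # private key: owner read/write only
chmod 644 "$demo/id_rsa.pub"  # public key: readable by everyone

ls -la "$demo"
```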
Now let us set up password-less login to the cluster head node. First of all, let us copy the public key to the remote machine using scp.
$ scp ~/.ssh/id_rsa.pub olssont@hpc:
Now we need to log in to the head node to configure it. At this point we will still need to use our password. Once logged into the head node, we need to create a .ssh directory in the user’s home directory (if it does not already exist). We then need to append the public key to a file named authorized_keys in the .ssh directory. Finally, we log out of the head node.
$ ssh olssont@hpc
$ hostname
hpc
$ mkdir -p .ssh
$ cat id_rsa.pub >> .ssh/authorized_keys
$ exit
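The manual steps above can be wrapped in a small shell function that is safe to run more than once. It creates the directory if needed, tightens permissions (the SSH server may ignore authorized_keys if the directory or file is writable by other users), and only appends the key if that exact line is not already present. The function name install_key and the placeholder key below are hypothetical, the function assumes the public key file holds a single key, and the demonstration runs against a scratch directory rather than your real ~/.ssh.

```shell
# Hypothetical helper: install_key PUBKEY_FILE [SSH_DIR]
# Appends the (single) public key to SSH_DIR/authorized_keys unless already there.
install_key() {
    pubkey=$1
    sshdir=${2:-$HOME/.ssh}

    mkdir -p "$sshdir"
    chmod 700 "$sshdir"
    touch "$sshdir/authorized_keys"
    chmod 600 "$sshdir/authorized_keys"

    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    if ! grep -qxF -f "$pubkey" "$sshdir/authorized_keys"; then
        cat "$pubkey" >> "$sshdir/authorized_keys"
    fi
}

# Demonstration in a scratch directory with a placeholder (not a real) key
demo=$(mktemp -d)
echo "ssh-ed25519 AAAAplaceholder olssont@laptop" > "$demo/id_demo.pub"
install_key "$demo/id_demo.pub" "$demo/.ssh"
install_key "$demo/id_demo.pub" "$demo/.ssh"   # second call changes nothing
cat "$demo/.ssh/authorized_keys"
```

On the head node you would run install_key id_rsa.pub, relying on the default ~/.ssh directory.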
Now we should be able to ssh and scp to the head node in a password-less fashion.
If you set up your key pair with a passphrase, you will be prompted for it the first time you use the key pair.
Great, that’s really cool! However, it was quite a lot of work to get the public key onto the remote machine. There is a better way to do this using the program ssh-copy-id.
Depending on the operating system that you are using, you may need to install this program; see Managing your system for details on how to install software.
Once you have ssh-copy-id on your system you can provision a remote machine with your public key using a single command. Below we use it to add our public key to bishop.
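$ ssh-copy-id -i ~/.ssh/id_rsa.pub -p 2222 olssont@bishop
The optional -i flag is used to specify which public key should be copied to the remote machine, and the -p flag specifies the port, which is needed here because the SSH server on bishop listens on port 2222.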
Suppose that access to your institute's cluster was set up in a way that required you to use the full hpc.awesomeuni.ac.uk host name, but that you wanted to be able to log in using the shorter name hpc. You can configure your machine to set up access in this fashion by creating the file .ssh/config and adding the lines below to it.
Host hpc
    HostName hpc.awesomeuni.ac.uk
    User olssont
The SSH configuration above also specifies the user name. This means that you can login to the head node using the command below (note the lack of a user name).
$ ssh hpc
As you start using SSH keys to manage access to various machines, you are likely to find yourself using multiple key pairs. In this case you will want to be able to specify the name of the private key, also known as an identity file, in the .ssh/config file.
Host hpc
    HostName hpc.awesomeuni.ac.uk
    User olssont
    IdentityFile ~/.ssh/id_rsa
Finally, in the examples described earlier, access to bishop had been configured to use port 2222. To configure access to this remote machine we could use the specification below.
Host bishop
    HostName bishop
    User olssont
    Port 2222
    IdentityFile ~/.ssh/id_rsa
Again, using the .ssh/config file in this way means that we do not need to remember port numbers or which options to invoke the scp and ssh commands with.
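To check how ssh will interpret a configuration without actually connecting, recent versions of OpenSSH provide the -G flag, which prints the settings that apply to a given host after the configuration has been evaluated, and then exits. The sketch below writes the examples from this section to a scratch file and points ssh at it with -F, so your real ~/.ssh/config is left untouched and no network connection is made.

```shell
# Write the example configuration to a scratch file instead of ~/.ssh/config
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host hpc
    HostName hpc.awesomeuni.ac.uk
    User olssont
    IdentityFile ~/.ssh/id_rsa

Host bishop
    HostName bishop
    User olssont
    Port 2222
    IdentityFile ~/.ssh/id_rsa
EOF

# -F: use this configuration file; -G: print the resolved settings and exit
ssh -F "$cfg" -G bishop | grep -E '^(hostname|user|port) '
ssh -F "$cfg" -G hpc | grep -E '^(hostname|user|port) '
```

The filtered output shows the hostname, user and port that ssh would use for each alias.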
Copying a file can then be achieved using the concise syntax below.
$ scp mydata.csv bishop:
Logging in to the machine becomes similarly trivial.
$ ssh bishop
One problem that you may encounter when working on a remote machine is that if the connection is broken whilst a program is running, the program may be terminated.
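Luckily, it is quite easy to work around this. All you need to do is prefix the command that runs the program of interest with nohup. The nohup command makes the program immune to hangups, i.e. to the hangup signal (SIGHUP) that is sent to programs when the terminal they are running in is closed or the connection is lost.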
To see this in action, open up two terminals on your computer. In one of them we will monitor the running processes using the command top.
$ top
This should display a lot of information about the currently running processes. To make things a little easier to digest we can limit the output to the processes owned by you. Press U, which will prompt you for a user name. Enter your user name and press Enter. You should now only see the processes owned by you.
In the second terminal we will simulate a long-running program using the command sleep, which simply pauses execution for a specified number of seconds.
$ sleep 3600
In the first terminal, running top, you should now see the sleep program running.
Now close the second terminal, the one in which you are running the sleep command. Note that the sleep program disappears from the top display. This is because closing the terminal sent the program a hangup signal, which terminated it.
Open a new terminal. This time we will prefix the sleep command with nohup.
$ nohup sleep 3600
Now close the terminal running the sleep command again. Note that the sleep command is still present in the top display. It will keep running until it finishes in an hour’s time.
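In practice you will usually want your prompt back while the program runs, and you will want to keep its output. When standard output is a terminal, nohup appends output to a file named nohup.out; the sketch below instead redirects output to a log file of your choosing and uses & to run the command in the background. Here sleep stands in for a real long-running program, and the log file location is arbitrary.

```shell
# A log file to capture the program's output and error messages
log=$(mktemp)

# Run the command immune to hangups, in the background, with output in $log
nohup sleep 3600 > "$log" 2>&1 &

# $! holds the process ID of the most recent background job
pid=$!
echo "Started process $pid, logging to $log"
```

Make a note of the process ID: if you later need to stop the program early, you can pass it to the kill command.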
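- Using the ssh command to log in to remote machines
- Copying data to and from remote machines using the scp command
- Configuring access to remote machines using the .ssh/config file
- Using nohup to ensure that long-running processes are not killed by losing the connection to the remote machine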