Welcome to STA 250!
On the menu for today...
Coding warmups
Working remotely: ssh, sftp, scp etc.
Filesystem basics
Git + GitHub
Gauss: The Stat Cluster
Python basics
R basics
Notetakers for today: Christopher Aden, Xiongtao Dai, Shan-Yu Liu
Paul D. Baines
Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".
(From: http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html)
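Since we will be living at the command line today, here is one possible solution as a minimal bash sketch. The function name fizzbuzz and its argument (the upper limit) are my own choices for illustration; try your own version in R or Python before reading it.

```shell
#!/bin/bash
# FizzBuzz: print 1..n, replacing multiples of 3 with "Fizz",
# multiples of 5 with "Buzz", and multiples of both with "FizzBuzz".
fizzbuzz() {
  local n=$1 i out
  for ((i = 1; i <= n; i++)); do
    out=""
    (( i % 3 == 0 )) && out+="Fizz"   # multiples of three
    (( i % 5 == 0 )) && out+="Buzz"   # multiples of five (appends after "Fizz")
    echo "${out:-$i}"                 # fall back to the number if out is empty
  done
}

fizzbuzz 100
```

Building the output string in two steps avoids a separate test for multiples of fifteen.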
SSH stands for "Secure Shell", and is a protocol that allows users to log in to remote machines. For example, you can log in to the Stat Dept cluster ("Gauss") while having a cup of coffee in Australia.
We will use ssh a lot in this course.
Poll: Who has used ssh before?
SSH is great for logging into a remote machine, but you will need a mechanism to transfer files between your laptop/desktop and, say, Gauss.
Enter scp and sftp. This is where using Windows becomes painful.
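Using ssh to log in to a remote machine is nice and easy:
ssh pdbaines@gauss.ucdavis.edu
To enable X11 forwarding (so that remote graphical programs can display on your machine):
ssh -X pdbaines@gauss.ucdavis.edu
For verbose output, which helps when debugging connection problems:
ssh -vX pdbaines@gauss.ucdavis.edu
Obviously, log in to your own account on Gauss, not mine.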
For Windows users:
Dual-boot with Linux.
Only joking (sort of).
You can use a GUI tool such as WinSCP (http://winscp.net/eng/download.php) or SSH Secure Shell Client.
scp stands for secure copy, and allows you to copy files between your local machine and a remote machine (e.g., Gauss). For example, to copy foo.txt from my Desktop to my "Research" folder on Gauss:
scp ~/Desktop/foo.txt pdbaines@gauss.cse.ucdavis.edu:~/Research/
To copy directories you need to pass -r for recursive copying. For example, to copy your Desktop directory to Gauss:
scp -r ~/Desktop pdbaines@gauss.cse.ucdavis.edu:~/
For more, try man scp or http://linux.die.net/man/1/scp.
rsync is a useful tool for more complicated file transfers. For example, suppose you are copying 100,000 files from your laptop to Gauss and the file transfer fails midway (after 2 hours). With rsync it is trivial to resume the transfer and avoid recopying already copied files.
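For example, to copy my "libraries" folder on my laptop to Gauss, skipping any files that were already uploaded the first time around:
rsync -av --ignore-existing libraries pdbaines@gauss.cse.ucdavis.edu:~/
The -a (archive) flag already implies recursive copying. Note that a trailing slash on the source (libraries/) would copy the contents of the folder into ~/ rather than the folder itself.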
You can also add --dry-run if you want to see what files will be copied but not actually do the copy.
For more, try man rsync or http://linux.die.net/man/1/rsync.
By now you should all have set up your account on Gauss.
What is Gauss? http://wiki.cse.ucdavis.edu/support:systems:gauss#hardware
Let's use Gauss to explore ssh, scp and GitHub.
To log in to Gauss you need to create a public/private key pair. On Mac/Linux this is trivial:
# Make the key:
ssh-keygen -t rsa
This creates a public key (id_rsa.pub) and a private key (id_rsa) that reside in the ~/.ssh directory. You will need to email the public key (id_rsa.pub, never the private key) to help@cse.ucdavis.edu to get access to Gauss. Full instructions:
http://wiki.cse.ucdavis.edu/support:general:security:ssh#setup.
Once your account is ready, copy the public key to Gauss e.g.,
scp ~/.ssh/id_rsa.pub yourusername@gauss.cse.ucdavis.edu:~/
Now you should be able to login:
ssh yourusername@gauss.cse.ucdavis.edu
Since you will be logging in to Gauss many times, it is advisable to create a script to save you some typing. Create a file, say, gauss_ssh and type:
#!/bin/bash
ssh -vX myusername@gauss.cse.ucdavis.edu
You may also need to change the permissions to make the script executable:
chmod u+x gauss_ssh
To run the script (and login to Gauss), just type:
./gauss_ssh
Nice. That saved a few keystrokes. Now you are a pro. If you are not already a proficient script writer, then I highly recommend becoming one!
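If you want to practice the create/chmod/run cycle without touching Gauss, the sketch below does the same thing with a harmless script in a scratch directory. The script name hello_script and its contents are made up for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
script="$workdir/hello_script"   # hypothetical script name

# Write a two-line script: a shebang line, then the command to run.
printf '%s\n' '#!/bin/bash' 'echo "hello from $(basename "$0")"' > "$script"

# A newly created file is normally not executable.
if [ ! -x "$script" ]; then
  echo "not executable yet"
fi

chmod u+x "$script"    # give the owner (u) execute (x) permission
output=$("$script")    # now the script can be run by its path
echo "$output"
```

The u+x form only changes the owner's permissions, which is usually what you want for personal helper scripts.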
It is frequently necessary to compress large files into smaller ones using standard tools. While .zip archives are familiar to Windows users, the recommended [un]compression tool for this class is tar. Other standards such as .bz2 are also fine, and you may use your preferred approach. Again, on Windows it is usually easier to handle archives via a GUI such as WinZIP.
To create a .tar.gz archive from a single file on Mac/Linux:
tar -cvzf myarchive.tar.gz big_file.txt
To create an archive containing a whole directory:
tar -cvzf myarchive.tar.gz dir_to_compress/
To uncompress a .tar.gz archive into the current directory:
tar -xvzf myarchive.tar.gz
To extract somewhere else, add -C followed by the target directory.
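To see the whole cycle in one place, here is a small sketch that creates an archive, lists its contents, extracts it into a separate directory and checks that the file survived the round trip. Everything happens in a scratch directory, and the file name results.txt is made up for illustration.

```shell
#!/bin/bash
# Scratch area with one small directory to archive.
workdir=$(mktemp -d)
mkdir "$workdir/dir_to_compress" "$workdir/extracted"
echo "some results" > "$workdir/dir_to_compress/results.txt"

# c = create, z = gzip, f = archive file name.
# -C switches to $workdir first, so paths in the archive are relative.
tar -czf "$workdir/myarchive.tar.gz" -C "$workdir" dir_to_compress

# t = list the contents without extracting anything.
tar -tzf "$workdir/myarchive.tar.gz"

# x = extract, here into a separate directory via -C.
tar -xzf "$workdir/myarchive.tar.gz" -C "$workdir/extracted"

# cmp -s is silent and exits with status 0 when the two files are identical.
if cmp -s "$workdir/dir_to_compress/results.txt" \
          "$workdir/extracted/dir_to_compress/results.txt"; then
  echo "round trip OK"
fi
```

Listing with -t before extracting is a good habit: it shows whether the archive will unpack into its own directory or spill files into the current one.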
When logged into Gauss you will frequently need to edit files.
Since Gauss provides no GUI, this can be tricky for first-timers.
You have two main options: nano and vi.
To open a file (e.g., foo.txt) with nano:
nano foo.txt
Note: If foo.txt does not exist, nano opens an empty buffer, and the file is only created when you save.
Just type/edit the file as you see it.
To exit, press CTRL+X (it will prompt you to save).
Nano is nice and easy to use (but not very powerful).
Vi is an old but popular text editor originally written for Unix in the 1970s.
It is fundamentally different from most text editors in that it has two distinct modes: normal mode (for commands) and insert mode (for typing text).
To enter insert mode press i, to exit press Esc.
When in "Insert mode", the bottom of the screen should read:
-- INSERT --
All of the following commands must be typed from normal mode, not insert mode:
:w                 Save (i.e., write) to file
:w foobar.txt      Save to file with filename foobar.txt
:q                 Quit the program
:q!                Quit without saving
/foo               Search for the text 'foo' within the document
n                  Repeat the search (jump to next occurrence)
G                  Jump to last line of file
:10                Jump to line ten of the file
yy                 Copy the current line to the clipboard
10yy               Copy ten lines, starting at the current line, to the clipboard
y$                 Copy from the cursor to the end of the line
p                  Paste the contents of the clipboard
dd                 Delete the current line
10dd               Delete ten lines, starting at the current line
u                  Undo the last command
:%s/foo/bar/cg     Search and replace the text "foo" with "bar" globally, confirming each replacement first (hence the cg)
See: Vi Cheat Sheet for more.
What is Git?
Why use git?
Nice intro videos: http://git-scm.com/videos
References: http://git-scm.com, http://git-ref.org
What is GitHub?
Why use GitHub?
Register for free use of 5 private repositories via an educational account at http://github.com/edu.
Pretty much everything you need to know:
git clone
git add
git commit -a
git push
git pull
git status
git diff
git merge
git mv
git rm
Useful references: http://gitref.org/basic/, https://help.github.com/articles/fork-a-repo
Let's see this in action...
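If you want to try the commands without a GitHub account, the sketch below runs most of them against a throwaway local repository. It assumes git is installed; the file names, commit messages and identity values are placeholders. git push and git pull need a remote such as GitHub, so they are left out, and git clone is shown against the local repository instead.

```shell
#!/bin/bash
# Create a throwaway repository in a scratch directory.
repo=$(mktemp -d)
git init -q "$repo"

# Identity for this repository only (placeholder values).
git -C "$repo" config user.name "Your Name"
git -C "$repo" config user.email "you@example.com"

# First commit: create a file, stage it, commit it.
echo "first line" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "Add notes"

# Edit the file, inspect the change, then commit all tracked changes (-a).
echo "second line" >> "$repo/notes.txt"
git -C "$repo" status --short      # lists notes.txt as modified
git -C "$repo" --no-pager diff     # shows the added line
git -C "$repo" commit -q -a -m "Add a second line"

# Rename a tracked file with git mv so the rename is staged for you.
git -C "$repo" mv notes.txt notes_v2.txt
git -C "$repo" commit -q -m "Rename notes file"

# Clone the repository (a local path works just like a URL).
clone_dir="$(mktemp -d)/clone"
git clone -q "$repo" "$clone_dir"

git -C "$clone_dir" --no-pager log --oneline
```

The -C option tells git which directory to operate in, so the script never has to cd anywhere.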
When you log in to Gauss, you are logging in to the "Head node". The head node is designed to manage the system, not do major calculations: that is what the compute nodes are for.
If you just type R (or python) at the command line, you will be running R/python on the head node. Generally speaking, never do this.
To use the compute nodes on Gauss you need to submit a batch, or array, job.
The compute nodes on Gauss cannot be used interactively: you need to provide a script that will run, and make sure your script saves the output/writes to file to store the results.
In light of this non-interactive nature, it is helpful to develop code locally and make sure it is running properly, before running batch jobs on Gauss.
Please read the Wiki: http://wiki.cse.ucdavis.edu/support:systems:gauss
Check out: http://wiki.cse.ucdavis.edu/_media/support:systems:intro_to_gauss_slides.pdf
The Boot_Camp folder of the course GitHub repo has an example of a python batch job on Gauss.
sarray boot_camp_demo.sh
squeue
scancel
Be sure to read the .err and .out files to check everything ran smoothly.
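To make the "write your results to file" point concrete, here is a rough sketch of what a batch script can look like. This is not the boot_camp_demo.sh from the course repo. It assumes SLURM-style #SBATCH directives (squeue and scancel are SLURM commands), and the job name, file names and the OUTDIR variable are hypothetical. SLURM_ARRAY_TASK_ID is the standard SLURM variable for array jobs; the sarray wrapper on Gauss may expect something different, so check the wiki before adapting this. The #SBATCH lines are ordinary comments to bash, so the script also runs locally for testing.

```shell
#!/bin/bash
#SBATCH --job-name=demo          # hypothetical job name
#SBATCH --output=demo_%j.out     # stdout goes here (%j is the job ID)
#SBATCH --error=demo_%j.err      # stderr goes here

# Index of this task within an array job; default to 0 so the
# script can also be run locally while developing.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Where to store results (hypothetical OUTDIR variable; scratch dir by default).
outdir="${OUTDIR:-$(mktemp -d)}"
outfile="${outdir}/result_${task_id}.txt"

# Stand-in for the real computation: sum of squares 1..10.
total=0
for ((i = 1; i <= 10; i++)); do
  total=$(( total + i * i ))
done

# Nothing is kept from a non-interactive job unless it is written to a file.
echo "task=${task_id} sum_of_squares=${total}" > "$outfile"
echo "Wrote ${outfile}"
```

Giving each task its own output file, named by task index, keeps array jobs from overwriting each other's results.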
Finally: Copy everything from Gauss to your laptop.
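Copying back is the same scp command with source and destination swapped. The sketch below builds the command and, by default, only prints it so you can check it before running. The username, the remote folder ~/Boot_Camp and the local folder name are placeholders, and the DRY_RUN switch is my own convention, not an scp feature.

```shell
#!/bin/bash
# Placeholder values: replace with your own username and folders.
REMOTE_USER="yourusername"
REMOTE_HOST="gauss.cse.ucdavis.edu"
REMOTE_DIR="~/Boot_Camp"        # quoted on purpose: the remote side expands ~
LOCAL_DIR="./gauss_results"     # hypothetical local destination

# Remote source first, local destination second; -r copies directories.
cmd=(scp -r "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_DIR}" "${LOCAL_DIR}")

# By default just print the command; set DRY_RUN=0 to actually run it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s ' "${cmd[@]}"
  echo
else
  "${cmd[@]}"
fi
```

Run it from your laptop, not from Gauss: the local path is relative to wherever the command is executed.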
Who is new to python? Who is planning to use it?
First, check you have python installed:
python --version
If not, then install it. To develop in python, there are a number of IDEs.
We'll spend more time on Python in future weeks.
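On some systems the interpreter is installed as python3 rather than python, so a "command not found" from the version check does not necessarily mean Python is missing. A small sketch that checks both names:

```shell
#!/bin/bash
# Report which Python executables are on the PATH, and their versions.
for exe in python python3; do
  if command -v "$exe" >/dev/null 2>&1; then
    # Some old versions print the version to stderr, hence 2>&1.
    echo "$exe: $("$exe" --version 2>&1)"
  else
    echo "$exe: not found"
  fi
done
```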
To develop on your laptop in R I recommend using RStudio (http://rstudio.org). It provides a comprehensive IDE, and makes it easy to debug.
Who is new to R? Who is planning to use R?
Some resources for learning R: http://www.ats.ucla.edu/stat/r/
You could try: http://tryr.codeschool.com/
Or Google "R tutorial" or "R introduction" and pick your favourite guide.
We'll spend more time on R in future weeks.
http://cheezburger.com
Speaking of Twitter... http://what-if.xkcd.com/65/
Next Week: More Boot Camp + Bayes!