Computing clusters: Local Guide


Configuration  
Quotas
Support
Access from VSCHT
  — Shell
  — File transfer
  — X-windows
Access from outside
  — Shell
  — File transfer
  — X-windows
Getting started
  — Managing files
  — Editing text files
Managing jobs
  — Submit jobs
  — List jobs
  — Remove jobs
Compiling and debugging
  — PGI (obsolete)
  — g77
  — The GDB debugger
Backups
  — Backup server

Configuration

We have two separate clusters: the old one, a324-2.vscht.cz (static IP 147.33.103.153), and the new one, as67-1.vscht.cz (dynamic IP, currently 147.33.79.103). The user home directory is /home/USER and the user data directory is /data/USER.

On the nodes (clients), a user can use scratch disk space in the directory /scratch/USER/, where USER is the user name. Older files are cleared from the scratch disks at irregular intervals; each cleanup is announced in advance.
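
A minimal sketch of how a job might use the scratch space, assuming bash. The names SCRATCH_ROOT and run_in_scratch and the job.PID subdirectory are illustrative assumptions, not part of the cluster setup: the command runs inside a private scratch subdirectory and the results are copied to a destination directory before the scratch copy is removed.

```shell
#!/bin/bash
# Sketch: run a command in node-local scratch space and copy the results back.
# SCRATCH_ROOT and run_in_scratch are hypothetical names used for illustration only.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

run_in_scratch() {
    # $1 = directory that receives the results; remaining arguments = command to run
    local dest="$1"; shift
    local work="$SCRATCH_ROOT/${USER:-$(id -un)}/job.$$"
    mkdir -p "$work" "$dest" || return 1
    # Run the command inside the scratch directory (subshell keeps the caller's cwd)
    ( cd "$work" && "$@" )
    local rc=$?
    # Copy everything that was produced, then free the scratch space
    cp -r "$work"/. "$dest"/ && rm -rf "$work"
    return $rc
}
```

For example, run_in_scratch /home/USER/results ./myprog would leave the output files of myprog in /home/USER/results.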

Configuration as of May 2009:

a324-2.vscht.cz (147.33.103.153) contains two filesystems:
  • /home = 246 GB of user space, accessible (via NFS) from all clients, backed up periodically
  • /data = 158 GB of space, not available on the clients and not backed up

computer      processor  cores  mem [GiB]  scratch [kiB]  normal queue  nice queue
a00 = a324-2  Athlon     2      4          -              -             -
a03           Athlon     1      1          71 259 640     aqa-1-1       nqa-1-1
a04           Athlon     1      1          71 259 640     aqa-1-1       nqa-1-1
a08           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a09           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a10           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a11           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a12           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a13           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a14           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a15           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a16           Athlon     2      4          184 401 136    aqa-2-4       nqa-2-4
a20           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a21           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a22           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a23           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a24           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a25           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a26           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a27           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a28           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a29           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a30           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a31           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a32           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a33           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2
a34           Athlon     2      2          107 486 652    aqa-2-2       nqa-2-2

System administrator: Jiri Kolafa

as67-1.vscht.cz contains one filesystem:

  • /home = 4.5 TiB of user space, accessible (via NFS) from all clients. No backups are performed!

 

computer  processor  cores  mem [GiB]  scratch [kiB]  normal queue  nice queue
s01       Opteron    8      16         544 089 632    sq-8-16       mq-8-16
s02       Opteron    8      16         544 089 632    sq-8-16       mq-8-16
s03       Opteron    8      16         544 089 632    sq-8-16       mq-8-16
s04       Opteron    8      16         544 089 632    sq-8-16       mq-8-16
s05       Opteron    8      16         544 089 632    sq-8-16       mq-8-16
s48       Athlon     2      4          184 401 136    sq-2-4        mq-2-4
s49       Athlon     2      4          184 401 136    sq-2-4        mq-2-4
s50       Athlon     2      4          184 401 136    sq-2-4        mq-2-4
s51       Athlon     2      4          (problem)      sq-2-4        mq-2-4
s52       Athlon     2      4          184 417 200    sq-2-4        mq-2-4
s53       Athlon     2      4          184 417 200    sq-2-4        mq-2-4
s54       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s55       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s56       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s57       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s58       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s59       Athlon     2      4          232 473 724    sq-2-4        mq-2-4
s60       Opteron    4      8          368 836 168    aqo-4-8       nqo-4-8
s61       Opteron    4      8          368 836 168    aqo-4-8       nqo-4-8
s62       Opteron    4      8          368 836 168    aqo-4-8       nqo-4-8

System administrator: Dr. Polach (jiri.polach(at)marge.uochb.cas.cz)

Quotas

User disk space (/home) is limited by quotas. To check your quota status, use the command:
quota -s
It is wise to put this command into your .login (if you use tcsh) or .profile (if you use bash). The reported quota may be exceeded up to the limit, but for no longer than the grace period of 7 days.
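
As a sketch of the advice above, the following bash helper appends the quota check to a startup file unless the line is already there. The function name add_quota_check is a hypothetical choice for illustration.

```shell
# Append "quota -s" to a startup file unless the line is already present.
# add_quota_check is a hypothetical helper name, not a system command.
add_quota_check() {
    local rc_file="$1"
    # grep -x matches whole lines only; a missing file counts as "not present"
    grep -qx 'quota -s' "$rc_file" 2>/dev/null || echo 'quota -s' >> "$rc_file"
}
```

Call it as add_quota_check ~/.profile for bash or add_quota_check ~/.login for tcsh; the line quota -s is valid in both shells.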

Support

This manual: http://www.vscht.cz/fch/en/research/cluster.html

Access from the VSCHT domain

Shell access

Access to the clusters is possible only via Secure Shell (ssh). Direct access, including X11 forwarding, is possible only from computers inside the VSCHT domain. Users log in to the server a324-2.vscht.cz (147.33.103.153).

From Linux

ssh USER@a324-2.vscht.cz
(answer yes when asked whether to add a324-2.vscht.cz to your list of trusted hosts)

From Windows

There are several implementations of ssh for Windows. We recommend PuTTY. Start PuTTY, enter a324-2.vscht.cz as the host, and select SSH as the service (the TCP port should be 22).

File transfer

From Linux

One option is scp.
To copy the file /home/USER/MYFILE from the cluster to your current local directory:
scp USER@a324-2.vscht.cz:MYFILE .
To copy your local file ./MYFILE to /home/USER/MYFILE on the cluster:
scp MYFILE USER@a324-2.vscht.cz:MYFILE
Another option is sftp.

From Windows

One option is WinSCP, which provides a Windows Commander-like or Explorer-like interface for transferring files. Start WinSCP.exe, enter the host a324-2.vscht.cz, then your user name and password. The TCP port should be 22.

X-windows (X11)

To use graphical applications like gnuplot, xxgdb, etc., on the cluster, you need an X11 server running on your machine.
Note: People are often confused by the client/server model of X11. A client (running on a remote machine, in our case a324-2.vscht.cz) asks the server (running, e.g., on your M$ Windoze PC) to display graphics (e.g., to draw a rectangle).

From Linux

Normally an X11 server is running and ssh with option -X provides transparent X forwarding:
ssh -X USER@a324-2.vscht.cz

In case of problems: You may need to add the client computer to the list of allowed hosts on your computer. Thus, on your computer, run:

xhost +a324-2.vscht.cz
Sometimes you may also need to set DISPLAY on a324-2.vscht.cz (see below).

From Windows

The recommended X server is XMing. To establish a connection, XMing must be running; a running XMing is indicated by a small X on the right side of your taskbar.
If anything goes wrong, try setting the DISPLAY environment variable in your PuTTY shell:
setenv DISPLAY NAME:0.0     # in csh, tcsh
export DISPLAY=NAME:0.0     # in sh, bash
where NAME is the name of your computer, e.g., mycomp.vscht.cz or 147.33.103.16.
Note: An X11 session cannot be started automatically from Windows because rexec and rsh are disabled on the server (for safety reasons).
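
Before starting a graphical program on the cluster, it can help to check whether DISPLAY is set at all. A minimal sketch for sh or bash follows; the function name check_display is a hypothetical choice.

```shell
# Report whether an X11 display is configured in the current shell (sh, bash).
# check_display is a hypothetical helper name used for illustration.
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set; graphical programs will not start" >&2
        return 1
    fi
}
```

After a successful ssh -X login, DISPLAY is normally set by ssh itself, so the manual setenv or export above is needed only if this check fails.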

Access from outside

Shell access

The simplest option is to use the SSH gateway: open any ssh connection to ftpin.vscht.cz and log in as sshgw (mnemonic: SSH GateWay) with the password sshgw. Then type the name of the target computer (a324-2), your user ID, and your password. X11 forwarding is not supported; for a connection including graphics, see below.

From Linux

ssh sshgw@ftpin.vscht.cz
sshgw@ftpin.vscht.cz's password: sshgw

Zadejte adresu systemu ke kteremu se chcete pripojit (= enter the address of the system you want to connect to)
> a324-2
Zadejte jmeno uzivatele, pod kterym se chcete pripojit (= enter the user name to connect as)
> USER
Probiha pripojovani... (= connecting) USER@a324-2's password: PASSWORD

From Windows

Use PuTTY as above with host=ftpin.vscht.cz, user=sshgw, and password=sshgw. Then proceed as in the Linux session above.

File transfer from outside

One option is to use the ftp server ftpin.vscht.cz; otherwise see below. You need a (temporary) account at ftpin.vscht.cz. To create one, run from a shell on the cluster:
telnet ftpin
login: ftpman
Password: ftpman
Zadejte nove uzivatelske jmeno: USER (= new user name)
Zadejte uzivatelske heslo: PASSWORD (= password)
Zadejte uzivatelske heslo znovu(kontrola): PASSWORD (= password once more, to check)
Kolik dni chcete ponechat ucet aktivni(maximum 7)?[1] DAYS_ACTIVE (= number of days the account stays active, maximum 7)
Zmacknete ENTER pro zalozeni uzivatele nebo ukoncete spojeni. ENTER (= press ENTER to create the user, or close the connection)
Then you can access USER@ftpin.vscht.cz from both the VSCHT domain and outside with your favorite ftp client.
Hint: put your login data into file .netrc, both on the cluster and on your Linux machine, e.g.
machine ftpin.vscht.cz login USER
password PASSWORD
macdef init
binary
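
Because .netrc stores the password in plain text, it should be readable only by its owner; common ftp clients refuse to use a .netrc that contains a password and is readable by others. A minimal sketch for bash (the function name protect_netrc is a hypothetical choice):

```shell
# Restrict a .netrc file to its owner (read and write for the user only).
# protect_netrc is a hypothetical helper name; the default path is ~/.netrc.
protect_netrc() {
    local netrc="${1:-$HOME/.netrc}"
    chmod 600 "$netrc"
}
```

Run protect_netrc once on every machine where you keep a .netrc.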
 

VPN and X-windows (X11)

As soon as you have established a VPN session to VSCHT, you may connect (incl. X forwarding) and transfer files directly from your remote PC. You need to install a VPN client, though, and this approach to some extent compromises the security of your PC. For more info, consult the official manual.

From Linux

Installation of a VPN client

Before first use on your home Linux computer, get snx_install.sh and install it by running sh snx_install.sh as root.

Unfortunately, snx requires an old version of the C++ library (libstdc++2.10-glibc2.2). If this is not your default library, install it and apply the following hack (as root):

cd /usr/bin
mv snx snx-bin
echo LD_PRELOAD\=libstdc++-libc6.2-2.so.3 /usr/bin/snx-bin \"\$\@\" > snx
chmod u+rsx snx
chmod go+x snx
Connect to VPN

Connect to VPN by command (as root):

snx -s 147.33.1.3 -u VSCHTUSER
and enter your VSCHT password when requested. (VSCHTUSER is your short login name in the VSCHT domain.) You should now receive an "Office Mode IP". The new network interface is called tunsnx (check this with ifconfig).

To disconnect the VPN session, run as root:

snx -d

Alternatively, you can use a browser-based method (Java RTE is needed), similar to the one for Windows (see below) [not tested].

Connect to cluster
As soon as you are connected to VSCHT by VPN, the usage is the same as if you were in the domain (ssh -X USER@a324-2.vscht.cz).
Back connection
You can access your home computer via the OFFICE_MODE_IP from computers in the VSCHT domain; e.g., you can transfer files between a computer of "Kategorie 2" and your home computer like this:
scp REMOTEFILE LOCALUSER@OFFICE_MODE_IP:LOCALFILE
scp LOCALUSER@OFFICE_MODE_IP:LOCALFILE REMOTEFILE
(At the moment, the cluster is not in "Kategorie 2", so this back connection does not work from the cluster.)

From Windows

  1. In a browser, open https://147.33.1.3
  2. Check that your browser allows pop-up windows
  3. Accept the certificate and ignore warnings about non-matching names
  4. If a pop-up window appears, log in as you would to the VSCHT network (use your short username, not the full name with a dot, and your e-mail password)
  5. You should receive an IP address (field "Office Mode IP")

Details may differ according to your browser.

As soon as you are connected to VSCHT by VPN, the usage is the same as if you were in the domain (PuTTY and XMing).

Getting started

A user is normally logged in to the server (a324-2.vscht.cz), where (s)he can manage and edit files, compile and debug programs, submit jobs to be run on the client nodes, and analyse the results (incl. X11 graphics). No lengthy calculations are allowed directly on the server!

If necessary (e.g., for lengthy interactive debugging), you may also use ssh to connect directly to machines inside the cluster. Jumping the queue of submitted jobs in this way is not allowed!

The most important commands for survival are:

passwd          Change your password
man COMMAND     Get the manual page of COMMAND
xman            Manual pages browsing tool, requires X11
info            Comprehensive manual of GNU software
info COMMAND    Info on COMMAND (often more up-to-date than the man-page)

Managing files

Your shell is tcsh or bash. (To figure out which one, execute command ps.) To get help, use
man tcsh     or     man bash

It's pretty long, isn't it? A few basic commands, common for both shells, are listed below.

ls                       List files
ls DIRECTORY/
ls -l                    List files with verbose info
ls -l DIRECTORY/
less FILE                View a text file; use arrows, PgUp/PgDn or u/space, quit by q
cp -i FILE1 FILE2        Copy files. Asks for confirmation if the destination file is to be overwritten
cp FILE(s) DIRECTORY/
mv -i FILE1 FILE2        Rename or move files. Asks for confirmation if the destination file is to be overwritten
mv FILE(s) DIRECTORY/
rm -i FILE(s)            Remove files. Asks for confirmation

(Option -i in the above commands ensures confirmation if a file is about to be overwritten or erased; based on your environment, you may have aliased the above commands so that -i does not have to be used.)
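
A sketch of such aliases for bash; where to put them (for example your shell startup file) is up to you, and nothing here is confirmed to be preset on the cluster.

```shell
# Make the confirming (-i) variants the default for interactive use (bash syntax).
# In tcsh the equivalent form is:  alias rm 'rm -i'
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'
```

To bypass an alias once, prefix the command with a backslash, e.g. \rm FILE.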

Another possibility is to replace this shell by the Midnight Commander, a clone of the popular Norton Commander. It is started by:

mc

Editing text files

mc [F4]       Internal editor of the Midnight Commander is invoked by typing [F4]
joe FILE      Simple text editor of the WordStar/Turbo family
emacs FILE    Powerful but complicated text editor
vi FILE       UNIX classical text editor (incomprehensible for Windows users)

Another possibility is to edit files locally (on your Windows machine) and to transfer them using WinSCP.

Managing jobs

A user batch job (binary executable or a script) is submitted on the server to a queue. As soon as there are resources available, the job is started on a client. There are two instances of queues: In addition, there is the local rule:

Submit jobs

Examples:

List jobs

Remove jobs

Compiling and debugging

PGI (obsolete)

The PGI optimizing and parallelizing FORTRAN 77/90 and C/C++ compilers with a debugger have been installed on the old cluster. The manual is invoked by the command
netscape $PGI/doc/index.htm
provided that you have an X11 server running, or in the text mode (with much less user comfort) by
lynx $PGI/doc/index.htm

To compile and debug a simple (one-module) FORTRAN program for use on one processor, use:

pgf77 -g myprog.f              Compile FORTRAN 77 program for debugging. Output executable is a.out
pgdbg a.out                    Debug a.out
Xpgdbg a.out                   Debug a.out using a graphical interface to gdb -- X11 is needed
pgf90 -O2 -o myprog myprog.f   Compile FORTRAN 90 program for final run (maximum optimization). Output executable is myprog

g77

Note: The PGI compiler usually gives faster code (by 20-30%) and is therefore recommended.

To compile and debug a simple (one-module) FORTRAN program, use:

g77 -g myprog.f                          Compile FORTRAN program for debugging. Output executable is a.out
gdb a.out                                Debug a.out -- see below
xxgdb a.out                              Debug a.out using a graphical interface to gdb -- X11 is needed
g77 -O3 -ffast-math -o myprog myprog.f   Compile FORTRAN program for final run (maximum optimization). Output executable is myprog

For programs consisting of several modules, read the manual pages for g77 and make.

The GDB debugger

Call as
gdb PROGRAM
If your program has crashed and dumped the core (file core), you can perform a post-mortem analysis by
gdb PROGRAM core
To debug an already started program, use
gdb PROGRAM PID
where PID is the process ID, which can be obtained with top or ps.
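
If no core file appears after a crash, core dumps may be disabled by the shell's resource limit. The following lines are a sketch for bash or sh, under the assumption that this limit is the cause; they raise the soft limit to whatever the hard limit allows.

```shell
# Raise the soft core-file size limit to the hard limit for this shell and its children.
# In tcsh the corresponding command is:  limit coredumpsize unlimited
ulimit -S -c "$(ulimit -H -c)"
# Show the resulting soft limit; 0 means that no core file will be written.
ulimit -S -c
```

The setting applies only to the current shell session, so it has to be issued before the program is started.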

A few survival commands of gdb follow:

Synopsis             Abbr  Explanation
break FILE:LINE      b     Set breakpoint (at the beginning of the line)
break FUNCTION
cont                 c     Continue execution
del [NUMBER]         d     Delete breakpoint(s)
help [COMMAND]       h     Get help (on COMMAND)
list                 l     List several lines above and below in the source
next                 n     Execute next line incl. all functions
print [EXPRESSION]   p     Print the value of variable or expression
run [ARGS]           r     Start program (with command line ARGS)
step                 s     Execute next line; if there is a function or procedure call, step into it
tbreak FILE:LINE     tb    Set temporary breakpoint (deleted after use)
tbreak FUNCTION
watch [EXPRESSION]   wa    Breakpoint if expression changes. Very slow!
what [EXPRESSION]    wha   Type of EXPRESSION
where                whe   To see where in the program you are
Ctrl-C                     Interrupt running program

Note: EXPRESSION is either one variable or a valid expression in the syntax of the language being debugged.

Backups

Backup server

Backups of a324-2 are performed every night on a325-11 ("the backup server") using rsync. All user home directories (/home/*/) are included. Four daily backups are kept, together with weekly backups (made in the night from Saturday to Sunday) as far as the capacity of the backup server allows. Older backups are lost.

Currently only root can access a325-11; if you need to recover a damaged or lost file, please let me know.

There is no backup service for as67-1. Any removed file is lost.