Kim Lab of Computational Evolutionary Biology
Biology Department
School of Arts and Sciences
University of Pennsylvania
103I Lynch Laboratory
433 S University Avenue
Philadelphia, PA 19104 USA

off: (215) 746-5187
lab: (215) 898-8395
fax: (215) 898-8780

email: junhyong@sas.upenn.edu


UsagePolicy


Computer Access

In order to best accommodate all users in an equitable fashion, we've devised the following computer access policy. Please address questions or comments on our policy to StephenFisher.


Application Restrictions

  • VNC: should only be run on kimclust15.
  • Mathematica: is only licensed to run on kimclust40.


Machines

  • kimclust11: MySQL and file server - not to be used for user jobs.
  • kimclust12: file server - not to be used for user jobs.
  • kimclust14: file server - not to be used for user jobs.
  • kimclust15: file server, NIS server, VNC server - not to be used for user jobs.
  • kimclust40: Mathematica and MATLAB server - avoid using all CPUs for jobs other than Mathematica or MATLAB.

  • kimclust[32 - 40]: workstations - use these machines for all jobs.


Disk Partitions

Contact Stephen if you need access to any specific data partition(s). You can use 'df -h' to view the amount of storage on all disk partitions.

  • /home (aka /home15): Resides on kimclust15 and should not contain large data sets.
  • /data15: Resides on kimclust15 and should be used to store large data sets.
  • /data14: Resides on kimclust14 and should be used to store large data sets.
  • /unsafe14: Resides on kimclust14 and should be used for temporary storage. This drive is configured to be faster than the other data drives, but it has less redundancy and may lose data if the disk is being accessed when a network outage or computer reboot occurs. The OS will not update the access time on files accessed via NFS.
  • /data12: Resides on kimclust12 and should be used to store large data sets.
  • /lab: Resides on kimclust12 and is the location of the lab repository. The OS will not update the access time on files accessed via NFS.
  • /genomics: Resides on the PGFI SAN. The same files can be reached on the PGFI SAN at /gpfs/fs0/l/kim/data.
  • /genomics.unsafe: The same as '/genomics' but uses a faster network connection. However, if there is a network outage while you are writing data to this partition, you may lose data.
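As a minimal sketch of the 'df -h' check mentioned above, the following hypothetical helper (show_space is not an existing lab script) reports the space on each partition you name and skips any that are not mounted on the current machine.

```shell
# Hypothetical helper: print 'df -h' output for each partition given,
# warning on stderr about any path that does not exist on this machine.
show_space() {
    for part in "$@"; do
        if [ -d "$part" ]; then
            df -h "$part"
        else
            echo "$part: not available on this machine" >&2
        fi
    done
}

# Check a few of the partitions listed above, plus the local /tmp.
show_space /home /data12 /data14 /data15 /tmp
```

Pass only the partitions you have been given access to; the list in the last line is just an illustration.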


64bit Computing

All workstations are set up for 64bit computing. These machines are capable of running both 32bit and 64bit applications. To make use of the 64bit processors, make sure your application is compiled with a 64bit compiler, which is the default on these machines. If you require 32bit applications, the 32bit version of /usr/local is located at /usr/local32. To run these 32bit programs on the 64bit machines, you must reference them via /usr/local32/bin/<...>.
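If you are unsure whether a particular binary is 32bit or 64bit, one way to check is to read the class byte of its ELF header. The sketch below assumes Linux ELF binaries and a POSIX 'od'; elf_bits is a hypothetical helper, not an installed lab command.

```shell
# Hypothetical helper: report whether a file is a 32-bit or 64-bit ELF binary.
# The first four bytes of an ELF file are 7f 45 4c 46 ("\177ELF"); the fifth
# byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit.
elf_bits() {
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 0
    fi
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Example: inspect the system shell.
elf_bits /bin/sh
```

Scripts and other non-ELF files are reported as "not ELF", so the check is only meaningful for compiled programs.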


Load Balancing

Currently the best way to run jobs across the cluster is to configure SSH to allow passwordless logins (see ConfiguringSSH) and then to use ssh to launch jobs remotely. Since this does not balance the load across the cluster automatically, check the machine loads first with the "cuptime.pl" or "cup.pl" commands, as described in CurrentUsage.

In the following example, the user is logged into kimclust15 and uses ssh to launch the command 'date' on kimclust38. The output from the command will be displayed in the current terminal window.

    [fisher@kimclust15]$ ssh kimclust38 date
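To launch the same command on several workstations, the ssh call can be wrapped in a loop. This is only a sketch: run_on_hosts and the REMOTE variable are hypothetical names, and it assumes passwordless logins are already configured. Setting REMOTE=echo previews the commands without contacting any machine.

```shell
# Hypothetical helper: run one command on each host in a space-separated list.
# REMOTE defaults to ssh; REMOTE=echo prints "host command" lines instead.
run_on_hosts() {
    hosts=$1
    shift
    for host in $hosts; do
        ${REMOTE:-ssh} "$host" "$@"
    done
}

# Preview only: prints one "host command" line per machine.
REMOTE=echo run_on_hosts "kimclust38 kimclust39" date
```

Dropping the REMOTE=echo prefix runs 'date' on each host in turn through ssh, with the output shown in the current terminal window.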


Nice

The 'nice' command can be used to lower the priority at which a command runs. The larger the nice level, the fewer CPU cycles are allocated to the job, which gives more priority to other jobs with lower nice levels. 19 is the maximum nice level. The syntax of the nice command is as follows.

    [fisher@kimclust15]$ nice -n 19 <your command here>

The following example uses the nice command to run 'date' on kimclust38 with a nice level of 19.

    [fisher@kimclust15]$ ssh kimclust38 nice -n 19 date
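To confirm that a nice level took effect, note that with GNU coreutils 'nice' run without arguments prints the current niceness. The sketch below assumes that behavior; the variable names are illustrative. The value given to -n is added to the current niceness and capped at 19.

```shell
# Niceness of the current shell (usually 0 for an interactive login).
base=$(nice)

# Niceness seen by a command launched through 'nice -n 10'.
lowered=$(nice -n 10 nice)

echo "shell niceness: $base, child niceness: $lowered"

# For a job that is already running, the standard tool is renice, e.g.:
#   renice -n 19 -p <pid>
```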


What machine should I use to run my job?

First off, never run jobs on the servers unless the job is only doing file management. For example, if you have a large directory of files on /data14 that you want to tar, then log into kimclust14 to do the tar'ing. Otherwise you will be pulling the files across the network to your current machine, tar'ing them, and then sending them back across the network to kimclust14.

If you are doing anything else, pick one of the cluster workstations (see above). Run cup.pl to see who is using which machines and ideally pick a machine that is not currently being used (i.e. unloaded). If all machines are being used, look at the number of CPUs on a machine and its load. Each CPU can handle a load of 1, and each job running on a machine adds 1 to the load (i.e. a 4-CPU machine can handle 4 jobs, or a load of 4). While you can run more jobs on a machine, they will hinder the other jobs running on it unless you set the 'nice' level significantly higher than that of the existing jobs, which gives your job a lower priority (see above). A higher nice level also means your job will run more slowly.
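The CPU and load rule of thumb above can be written as simple arithmetic: free job slots = number of CPUs minus the current load, rounded down and never below zero. The sketch below uses a hypothetical helper, free_slots, and assumes a Linux machine where /proc/loadavg is readable; cup.pl remains the lab's tool for viewing the whole cluster.

```shell
# Hypothetical helper: given a CPU count and a load average, print how many
# additional single-CPU jobs the machine can take without being overloaded.
free_slots() {
    awk -v cpus="$1" -v load="$2" 'BEGIN {
        slots = int(cpus - load)
        if (slots < 0) slots = 0
        print slots
    }'
}

# Apply it to the current machine (Linux only): the first field of
# /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    free_slots "$(getconf _NPROCESSORS_ONLN)" "$(cut -d ' ' -f 1 /proc/loadavg)"
fi
```

For example, a 4-CPU machine with a load of 2.5 has room for one more job by this measure.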

Check the amount of RAM on the machines (also shown in cup.pl). If your job is RAM intensive, then RAM may be a more important metric than the load, and you need to pick a machine accordingly. Most machines have a swap drive that is equal in size to the amount of RAM shown by cup.pl. So if cup.pl shows a machine with 16 GB of RAM, then it's likely the machine has another 16 GB of swap space (you can view the amount of swap using 'top'). However, if a program exceeds the built-in RAM and uses swap space, it is going to run much, much slower, unless the program was specifically designed to use swap space. If the program tries to use more memory than RAM + swap space, the program will be killed by the kernel.
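As a rough check of the RAM + swap ceiling described above, the totals can be read from /proc/meminfo on Linux. This is a sketch only: ram_plus_swap_gb is a hypothetical helper, and it reports the hard upper limit, not the amount of memory a job can use at full speed.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# MemTotal + SwapTotal in GB (the values in the file are in kB).
ram_plus_swap_gb() {
    awk '/^MemTotal:/  { mem = $2 }
         /^SwapTotal:/ { swap = $2 }
         END { printf "%.1f\n", (mem + swap) / 1048576 }'
}

# Apply it to the current machine (Linux only).
if [ -r /proc/meminfo ]; then
    ram_plus_swap_gb < /proc/meminfo
fi
```

A job that stays below the RAM figure alone avoids swapping; one that approaches the combined figure risks being killed by the kernel.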

Lastly, if your program repeatedly reads the same files, or saves a lot of temporary data to disk that it is going to read back in and process, consider using the local drive on the machine. Most machines have large local hard drives. Use these for temporary files that you can delete once your program finishes running. This will allow your program to run faster and reduce the load on the rest of the network. You can view the amount of local disk space using cup.pl and access the disk space by creating a directory in /tmp. Note that you cannot access files in /tmp from any other computer. Also, all files in /tmp are automatically deleted every time the computer reboots, so plan accordingly.
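One way to follow this advice is to give each job its own directory under /tmp and remove it when the job ends. The sketch below uses a hypothetical helper, run_in_scratch, and assumes 'mktemp -d' is available; copy any results you want to keep to a data partition before the command returns.

```shell
# Hypothetical helper: create a private scratch directory under /tmp, run the
# given command with that directory as its working directory, then delete it.
run_in_scratch() {
    scratch=$(mktemp -d "/tmp/${USER:-job}.XXXXXX") || return 1
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Example: show the scratch directory a job would run in.
run_in_scratch pwd
```

Because the directory is removed as soon as the command finishes, nothing is left behind to fill the local drive for other users.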