Cockcroft Institute Condor Pool

This static information was updated 3/3/2009. The pool is currently maintained by Jonny Smith.

We would like to invite you to join the Cockcroft Condor system, which makes the unused computational power of your desktop available to fellow scientists and engineers in Cockcroft doing computationally intensive calculations.

Background

Within the Cockcroft Institute there are a number of activities which are computationally very demanding. These include simulations that track particles around the NLS, microbunching studies, calculations of RF modes in cavities, and wakefield calculations. We determined that in order to achieve our goals we needed to expand the computing hardware available to us, especially as we foresee our computational requirements growing. We have negotiated access to a number of large clusters of machines in CSE, Lancaster, Liverpool and Manchester, but we have also been encouraged to make full use of the resources already available to us. With help from the e-science department of STFC we identified Condor as the best way to do so.

What is Condor?

Quite often our desktop computers sit idle when we are not using them, with no computations running on them. Condor is an application that takes these spare computing cycles and makes them appear as a cluster for our own use. It is configured so that if a Condor-managed task is running and you return to your computer, the task will completely vacate your computer and find somewhere else to run.

I have successfully run ABCI, a wakefield calculation program, and various other test programs on a collection of computers using Condor as a test, and others are looking at using it to manage a queue of Opera jobs, particle tracking simulations, etc. We are now upgrading to a 'production environment'. By this I mean encouraging everyone with a computer in the CI on the DL internal network to join, although I should add that you will need to be an Administrator on the computer for the setup to work properly. Unfortunately, computers on the visitor network do not have the right access through the firewall.

condor_status -pool ci-condor.dl.ac.uk

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

ci-condor.dl. LINUX       INTEL  Unclaimed  Idle       0.000  1009  0+01:25:04
slot1@dlccrof LINUX       INTEL  Owner      Idle       1.000   500  0+22:47:47
slot2@dlccrof LINUX       INTEL  Owner      Idle       2.440   500  0+10:06:17
slot1@dlccrof LINUX       INTEL  Owner      Idle       1.000  2024  4+22:12:35
slot2@dlccrof LINUX       INTEL  Unclaimed  Idle       0.000  2024  0+06:25:09
dlccroft3.dl. LINUX       INTEL  Unclaimed  Idle       0.000  1002  0+05:28:22
slot1@dlccrof LINUX       INTEL  Unclaimed  Idle       0.190   504  0+08:00:08
slot2@dlccrof LINUX       INTEL  Unclaimed  Idle       0.000   504 102+22:16:02
slot1@kvg9122 LINUX       INTEL  Owner      Idle       0.400  1961  0+05:05:08
slot2@kvg9122 LINUX       INTEL  Unclaimed  Idle       0.000  1961  0+12:05:16
slot1@ycg3488 LINUX       INTEL  Unclaimed  Idle       0.030   492  0+08:20:09
slot2@ycg3488 LINUX       INTEL  Unclaimed  Idle       0.000   492  0+23:15:28
jda23eve1.dl. LINUX       X86_64 Owner      Idle       1.000  3017  6+18:22:13
slot1@DLCCROF WINNT51     INTEL  Owner      Idle       0.550  1022  0+05:00:07
slot2@DLCCROF WINNT51     INTEL  Owner      Idle       0.000  1022  0+05:10:08
apws16.dl.ac. WINNT51     INTEL  Unclaimed  Idle       0.000  1023  0+05:00:03
slot1@apws24. WINNT51     INTEL  Owner      Idle       1.000  1663  1+00:30:21
slot2@apws24. WINNT51     INTEL  Owner      Idle       1.000  1663  1+00:30:22
slot1@djd63vi WINNT51     INTEL  Owner      Idle       0.570  1023  0+06:10:08
slot2@djd63vi WINNT51     INTEL  Owner      Idle       0.000  1023  0+06:10:09
slot1@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000  1790  0+06:20:08
slot2@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000  1790  0+06:40:10
slot1@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000  1790  0+11:25:16
slot2@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.010  1790  0+08:50:12
slot1@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000  1658  0+19:30:25
slot2@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000  1658  0+07:30:09
slot2@dlccrof WINNT51     INTEL  Unclaimed  Idle       0.000   510  0+05:35:09
slot1@dlccrof WINNT51     INTEL  Owner      Idle       0.000  1663  0+18:40:11
slot2@dlccrof WINNT51     INTEL  Owner      Idle       0.210  1663  0+19:10:12
slot1@gcb53vi WINNT51     INTEL  Unclaimed  Idle       0.010  1535  0+05:20:08
slot2@gcb53vi WINNT51     INTEL  Unclaimed  Idle       0.010  1535  0+05:40:13
slot1@jac93vi WINNT51     INTEL  Owner      Idle       0.000  1010  0+05:25:08
slot2@jac93vi WINNT51     INTEL  Owner      Idle       0.030  1010  0+05:35:09
slot2@lbj37vi WINNT51     INTEL  Owner      Idle       0.000  1662  0+05:15:08
slot1@pg45vig WINNT51     INTEL  Owner      Idle       0.310   511  0+05:05:08
slot2@pg45vig WINNT51     INTEL  Owner      Idle       0.000   511  0+05:05:09
slot3@pg45vig WINNT51     INTEL  Owner      Idle       0.000   511  0+05:05:10
slot4@pg45vig WINNT51     INTEL  Owner      Idle       0.000   511  0+05:05:11
slot1@rfsim1. WINNT51     INTEL  Unclaimed  Idle       0.000   383  0+06:55:08
slot2@rfsim1. WINNT51     INTEL  Unclaimed  Idle       0.000   383  0+17:00:27
slot3@rfsim1. WINNT51     INTEL  Unclaimed  Idle       0.000  2302  0+22:15:34
slot1@rfsim2. WINNT51     INTEL  Unclaimed  Idle       0.000  3070  0+05:45:07
slot1@rfsim4. WINNT51     INTEL  Claimed    Busy       0.010  1534  5+00:06:01
slot2@rfsim4. WINNT51     INTEL  Unclaimed  Idle       0.280   511  0+05:09:52
slot1@rf64sim WINNT52     INTEL  Unclaimed  Idle       0.020  65533  0+08:40:07

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX    12     4       0         8       0          0        0
       INTEL/WINNT51    31    15       1        15       0          0        0
       INTEL/WINNT52     1     0       0         1       0          0        0
        X86_64/LINUX     1     1       0         0       0          0        0

               Total    45    20       1        24       0          0        0

Up to 60 cores may be available at times.
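
The totals at the bottom of the listing can also be recomputed from the per-slot lines, which is handy if the output has been filtered or saved to a file. The following is only a sketch: the function name count_states is made up for this page, and it assumes the column layout shown above (State in the fourth column, Activity in the fifth).

```shell
# Count slots per State from condor_status-style text read on stdin.
# count_states is a hypothetical helper name, not part of Condor.
count_states() {
    awk '
        # Keep only per-slot lines: 4th field is a State, 5th an Activity.
        $4 ~ /^(Owner|Unclaimed|Claimed|Matched|Preempting|Backfill)$/ &&
        $5 ~ /^(Idle|Busy|Suspended|Vacating|Killing|Benchmarking|Retiring)$/ {
            n[$4]++
        }
        END {
            for (s in n) print s, n[s]
        }
    ' | sort
}
```

It could be used as "condor_status -pool ci-condor.dl.ac.uk | count_states". The header and totals lines are skipped because their fifth field is not an activity name.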

ASTeC Orion-Galaxy Cluster

A 96-core x86 cluster, mainly 1GHz processors, running Linux. For access, contact Jonny Smith, e-mail: jonathan.da.smith@stfc.ac.uk

Linux Condor Instructions (Scientific Linux preliminary)

Get the RPM. There are RHEL5 ones at astecnas/jda23/escience/condor-7.0.5-linux-x86-rhel5-1.i386.rpm (32-bit) and astecnas/jda23/escience/condor-7.0.5-linux-x86_64-rhel5-1.x86_64.rpm (64-bit); pick the one that matches your flavour.

As root do "yum localinstall condor...rpm"

Set up the firewall. Our Condor configuration uses ports 9614, 9618 and 65000-65255, all on both TCP and UDP. This can be done through the GUI. Some users had firewalls on SL, others did not.
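
For those who prefer the command line to the GUI, the ports above could be opened with iptables. This is a sketch only: the function name is made up, it prints the commands rather than running them so they can be reviewed first, and the chain name is an assumption (plain INPUT by default; a stock Red Hat style firewall may use a chain such as RH-Firewall-1-INPUT instead).

```shell
# Print (do not run) iptables commands that accept the Condor ports
# 9614, 9618 and 65000-65255 on both TCP and UDP.
# print_condor_firewall_rules is a hypothetical helper name.
print_condor_firewall_rules() {
    chain=${1:-INPUT}   # assumption: pass your firewall's input chain if it differs
    for proto in tcp udp; do
        for ports in 9614 9618 65000:65255; do
            # -I inserts at the top of the chain, ahead of any final REJECT rule
            echo "iptables -I $chain -p $proto --dport $ports -j ACCEPT"
        done
    done
}
```

Running "print_condor_firewall_rules" lists six commands to run as root. On Red Hat style systems "service iptables save" should then keep them across reboots.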

Set up the condor user and group. This can again be done using the GUI, or just run the following as root:

/usr/sbin/groupadd -g 14168 condor
/usr/sbin/useradd -g 14168 -u 14168 condor
#You may have a better way.
# There is a sample condor_config and condor_config.local, already set up, which should work, so ...
scp youruser@apsv3.dl.ac.uk:/scratch/jda23/escience/condor_config1 /opt/condor-7.0.5/etc/condor_config
# In the commands below, replace $HOSTNAME with whatever directory name the rpm has set up.
scp youruser@apsv3.dl.ac.uk:/scratch/jda23/escience/condor_config.local1 /opt/condor-7.0.5/local.$HOSTNAME/condor_config.local
#deal with permissions in these directories by going
cd /opt/condor-7.0.5/local.$HOSTNAME
chown -R condor:condor  *

#This should basically do it, but to be complete we'd like to have condor start as a service. 
# There is an example init.d file in /opt/condor-7.0.5/etc/examples/condor.init

cp /opt/condor-7.0.5/etc/examples/condor.init /etc/rc.d/init.d/condor
chmod a+x /etc/rc.d/init.d/condor

# put the executable somewhere where it might be expected.
ln -s /opt/condor-7.0.5/sbin/condor_master /usr/sbin/condor_master

# and the configuration
ln -s /opt/condor-7.0.5/condor.sh /etc/sysconfig/condor

There's almost certainly a better way of adding it to the appropriate runlevels than this, but it's what I've done so far...
ln -s /etc/rc.d/init.d/condor /etc/rc.d/rc5.d/S96condor

(There's a tool to add an S link to levels 2, 3, 4 and 5 and a kill link to 0, 1 and 6, no? "service add condor" or something?)
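
On Red Hat style systems such as Scientific Linux the tool in question is most likely chkconfig. The sketch below is untested on this pool and its function names are made up. It assumes the init script copied to /etc/rc.d/init.d/condor carries a "# chkconfig:" header line, which chkconfig needs in order to work out the start and kill priorities; if the header is missing, the manual symlink remains the fallback.

```shell
# Return success if an init script has the "# chkconfig:" header line.
has_chkconfig_header() {
    grep -Eq '^#[[:space:]]*chkconfig:' "$1"
}

# Register condor as a service; run as root.
# register_condor_service is a hypothetical helper name.
register_condor_service() {
    script=${1:-/etc/rc.d/init.d/condor}
    if ! has_chkconfig_header "$script"; then
        echo "no chkconfig header in $script; keep the manual symlink instead" >&2
        return 1
    fi
    /sbin/chkconfig --add condor &&              # create the rc links from the header
    /sbin/chkconfig --level 2345 condor on &&    # start in runlevels 2, 3, 4 and 5
    /sbin/chkconfig --list condor                # show the resulting state per runlevel
}
```

Calling "register_condor_service" as root would replace the hand-made S96condor link with links managed by chkconfig.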

At this point you probably want to ensure that users and root have the Condor environment file sourced in .bashrc (or .cshrc), so add the line "source /opt/condor-7.0.5/condor.sh" to .bashrc.
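
A small sketch of doing this without adding the line twice if the step is repeated. The function name add_condor_env is made up; the path is the one given above.

```shell
# Append the Condor environment line to a bash startup file, once only.
# add_condor_env is a hypothetical helper name.
add_condor_env() {
    rc=${1:-$HOME/.bashrc}
    line='source /opt/condor-7.0.5/condor.sh'
    touch "$rc"                                   # create the file if it is missing
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}
```

Run "add_condor_env" as each user who needs the Condor commands, and as root. For .cshrc the line would need csh syntax instead, which this sketch does not cover.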

system-config-services should list condor and be able to start it, although it reports it as dead even when it is running.

Watch out for the following:

Condor is particular about the /etc/hosts file and requires a more conventional layout than the SL default. I think the default puts the system name against 127.0.0.1 rather than the system's network IP address, so the entry may need changing to the static IP of the host. Otherwise the client will tell the central host to send its responses to adverts to 127.0.0.1 (which, from the central host's point of view, is the central server itself) rather than to the client that needs the information.
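
A quick way to spot the problem is to look for the machine's own name on a loopback line. The following is a sketch under assumptions: the function name check_hosts_entry is made up, and it only inspects /etc/hosts, not DNS.

```shell
# Fail (non-zero exit) if the hosts file maps the given name to a 127.x.x.x address.
# check_hosts_entry is a hypothetical helper name.
check_hosts_entry() {
    hosts_file=${1:-/etc/hosts}
    name=${2:-$(hostname)}
    awk -v name="$name" '
        { sub(/#.*/, "") }        # drop comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++)
                if ($i == name && $1 ~ /^127\./) {
                    bad = 1
                    print "loopback entry: " $0
                }
        }
        END { exit (bad ? 1 : 0) }
    ' "$hosts_file"
}
```

If "check_hosts_entry" reports a loopback entry, move the host name onto a line of its own beginning with the machine's static IP address, and leave only the localhost names on the 127.0.0.1 line.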

Windows XP Installation

Installation should take about 5 minutes on an average PC. Any user comfortable with running things from a command line should be able to do this themselves, but if not please get in touch with Jonny Smith, e-mail: j.d.smith@lancs.ac.uk, and I'll see if I can do the install myself. The more people donate their unused computer cycles, the better the resource is for others. More details on the Condor system can be found here: http://www.cs.wisc.edu/condor/

Instructions for setting up Condor on Windows XP for users on the Daresbury campus network - Linux and Mac users please email me as alternative solutions exist.

Click on the Start button, select "Run" from the menu, and type "cmd" in the box. Then copy

\\astecnas\users\jda23\escience\condor-7.0.5-winnt50-x86.msi

to the clipboard (ctrl-C).

Right-click on the command prompt window and select Paste.

A graphical installer should start up.

On the first screen, after accepting the terms and conditions, choose "join existing condor pool" and enter ci-condor.dl.ac.uk as the hostname. On the subsequent pages of the installer it shouldn't matter which options you select, as we will overwrite the configuration file with one that has all the right settings. The one exception is 'start condor service after installation', which is about the last option and which you need to set to "NO".

You'll be presented with a choice of Custom or Install; select Install to install Condor in its default location.

When it's finished putting files where they need to go, click finish.

Copy the following line and paste into the command prompt window.

copy \\astecnas\users\jda23\escience\condor_etc\condor_config c:\condor\condor_config

This updates the configuration settings to those we are using for the Cockcroft Condor pool.

Then type this line on your command prompt.

net start condor

You have now installed condor and started the service. You should close the window.

If you wish to test it, I would encourage you to add the executable files to your system PATH variable. Right-click on 'My Computer' -> Properties -> Advanced (tab) -> Environment Variables, scroll the system variables window down to PATH, click on it, then click Edit. At the end of the text in the box add ;c:\condor\bin

Now open another command prompt as you did before. Typing "condor_status" should tell you about the status of all nodes currently in the pool (as listed above). You can also see which ones would be available for running jobs at the moment with "condor_status -avail". You can sort these by those with the fastest processors: "condor_status -avail -sort Mips".

Various guides to submission resources are available on the Web
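
As a first taste of what those guides cover, here is a minimal sketch of a test job submitted from one of the Linux machines. It is an illustration, not the pool's recommended setup: the function name and file names are made up, and it assumes the job should run /bin/hostname as already installed on the execute machine and that no shared file system exists between desktops.

```shell
# Write a minimal Condor submit description file for a test job.
# write_test_submit_file and hostname.sub are hypothetical names.
write_test_submit_file() {
    sub=${1:-hostname.sub}
    cat > "$sub" <<'EOF'
# Run /bin/hostname three times, wherever Condor finds matching idle slots.
universe                = vanilla
executable              = /bin/hostname
transfer_executable     = false
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = hostname.$(Cluster).$(Process).out
error                   = hostname.$(Cluster).$(Process).err
log                     = hostname.$(Cluster).log
queue 3
EOF
}
```

After "write_test_submit_file", the job would be submitted with "condor_submit hostname.sub" and watched with "condor_q". Each .out file should then contain the name of the machine that ran that copy of the job.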

There are also some links on the NW-GRID portal here: http://rhine.dl.ac.uk:8080/portal Please contact Rob Allan to get an account, e-mail: robert.allan@stfc.ac.uk

Topic revision: r3 - 05 Mar 2009 - 07:53:45 - RobAllan
 