RMCS Input File Directives

In deploying version 1.3 we discovered a small number of bugs that were related to the way input file parsing is performed. I suspect these will continue to crop up unless and until we write a simplified parsing layer. This should also simplify the input file format and remove some of the worst inconsistencies. This page is intended to describe exactly what the input file should look like - and how the parser should deal with it. In this case, if we get the documentation right then the coding should be easy. And we will get up-to-date documentation for free!

Input file format

The format of the my_condor_submit input file is heavily based on the condor input file format and in fact, many of the input lines come directly from condor_submit input files. The only difference between a condor_submit input file and a my_condor_submit input file is that the my_condor_submit input file can take a few extra input lines. All lines recognised by my_condor_submit are listed below and the context in which they can be used described in the following sections. Any line specified that is not recognised by my_condor_submit will cause a warning to be given.
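As a rough illustration of the behaviour described above, the following Python sketch reads `name = value` lines and warns about names it does not recognise. It is not the actual my_condor_submit parser: the function name `parse_input`, the small `RECOGNISED` set, the case-insensitive matching, the skipping of blank and `#` lines, and the choice to drop unrecognised lines after warning are all assumptions made for the example.

```python
import warnings

# Hypothetical subset of recognised directive names, lower-cased for comparison.
RECOGNISED = {"executable", "arguments", "sdir", "sget", "sput", "queue"}


def parse_input(text):
    """Return (name, value) pairs in file order, warning on unrecognised names."""
    directives = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if name.lower() not in RECOGNISED:
            warnings.warn(f"unrecognised input line: {name}")
            continue  # assumption: the line is dropped after the warning
        directives.append((name, value))
    return directives
```

A line with no `=` (such as `queue`) is returned with an empty value.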

Job definition commands

ExecutableType

Specifies whether the job is an MPI or serial executable. Note that some resource job managers are likely to contain bugs when used for single processor MPI jobs or multi-processor "serial" jobs (to maximise memory or for executables that manage their own inter-thread communication).

  • Number allowed: One
  • Required?: No
  • Default: Depends on NumberOfProcessors: if this is set to 1 the default is "serial"; if it is greater than 1 the default is "MPI".
  • Line format:

ExecutableType = serial | MPI
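The default rule above can be sketched in Python as follows. The function name `resolve_executable_type` and its arguments are hypothetical, and rejecting processor counts below 1 is an assumption not stated on this page.

```python
def resolve_executable_type(number_of_processors, executable_type=None):
    """Return the executable type, applying the documented default if unset."""
    if number_of_processors < 1:
        # Assumption: a processor count below 1 is treated as an error.
        raise ValueError("number of processors must be at least 1")
    if executable_type is not None:
        return executable_type  # an explicit ExecutableType line wins
    return "serial" if number_of_processors == 1 else "MPI"
```

An explicit value is returned unchanged, so a single processor MPI job remains expressible.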

Arguments

This specifies any command line arguments needed to run the job.

Notes:

  • Arguments input by this method replace any arguments from the globusRSL line.
  • This is parsed as a string.
  • This should be fixed when we rework our parser.

  • Number allowed: One
  • Required?: No
  • Default: No arguments passed to job
  • Line format:

Arguments =

ExtraPreScript

Runs a user specified command or script at the end of the preScript.

  • Number allowed: One
  • Required?: No
  • Default: No extra pre script
  • Line format:

ExtraPreScript = pwd > extraout.txt

or

ExtraPreScript = perl -pi -e 's/\r\n/\n/;' dosfile1 dosfile2 ; echo "done dos2Unix"

ExtraPostScript

Runs a user specified command or script at the start of the postScript.

  • Number allowed: One
  • Required?: No
  • Default: No extra post script
  • Line format:

ExtraPostScript = ./debugAgentXPostAgent superdebug

Resource selection commands

These four tags allow the user to select subsets of the machines in Seagul to submit to. Three of the tags are new in version 1.4. The interaction is possibly non-obvious: the four tags are combined with a logical AND, with the proviso that the defaults exclude nothing and include all machines and all grids. Note that, in addition to these commands, grid administrators can disable machines for all users (for example, to prevent failing machines from being used). MCS will exit with an error if no machines can be found according to the preferences expressed using the tags below.
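The logical AND described above can be sketched in Python as below. This is only an illustration: the function `select_machines`, the `machines` dictionary (machine name mapped to grid name) and the use of `RuntimeError` are hypothetical, and the real MCS data structures may differ. A value of `None` for a preferred list stands for the default of "all".

```python
def select_machines(machines, preferred_machines=None, preferred_grids=None,
                    excluded_machines=(), excluded_grids=()):
    """Return the machine names that satisfy all four selection tags.

    machines maps machine name -> grid name (hypothetical structure).
    """
    selected = [
        name for name, grid in machines.items()
        if (preferred_machines is None or name in preferred_machines)
        and (preferred_grids is None or grid in preferred_grids)
        and name not in excluded_machines
        and grid not in excluded_grids
    ]
    if not selected:
        # Mirrors the documented behaviour of exiting with an error.
        raise RuntimeError("no machines match the selection tags")
    return selected
```

Because the tags are ANDed, a machine named in preferredMachineList is still dropped if its grid appears in excludedGridList.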

preferredMachineList

This line is used to specify a list of resources to metaschedule to.

  • Number allowed: One
  • Required?: No
  • Default: All known active machines
  • Line format:

preferredMachineList = machineName machineName ...

preferredGridList

This selects a subset of machines to schedule to.

  • Number allowed: One
  • Required?: No
  • Default: All known grids
  • Line format:

preferredGridList = nw-grid, ngs, camgrid

excludedMachineList

Lists machines not to be submitted to.

  • Number allowed: One
  • Required?: No
  • Default: Exclude no machines
  • Line format:

excludedMachineList = ,

excludedGridList

This excludes a subset of machines from consideration for scheduling.

  • Number allowed: One
  • Required?: No
  • Default: Exclude no grids
  • Line format:

excludedGridList = ,

Data Staging and Retrieval

The following lines all relate to staging of data and executables to the execute machine or retrieval of program output from the execute machine.

pathToExe

This specifies the SRB path to the 'Executable'. This path is not the SRB full path to the executable. The 'architecture' string of the machine the job runs on is appended to the pathToExe at job run time. For example if

Executable = ossia.x
pathToExe = /ngs/home/joe-bloggs.ngs/test
preferredMachineList = vidar.ngs.manchester.ac.uk-serial ngs.rl.ac.uk-serial

and the job runs on ngs.rl.ac.uk-serial, then RMCS will look to upload and run the executable in SRB at

/ngs/home/joe-bloggs.ngs/test/linux-64-serial/ossia.x

The mapping from machinename (eg ngs.rl.ac.uk-serial) to architecture can be found in the 'architecture' column of the MCS Grid Hosts table.
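The path construction in the example above can be sketched in Python as follows. The `ARCHITECTURES` dictionary is a hypothetical stand-in for the 'architecture' column of the MCS Grid Hosts table, with its single entry taken from the example, and the function name `srb_executable_path` is an assumption.

```python
import posixpath

# Hypothetical excerpt of the machine -> architecture mapping; the single
# entry is the one implied by the example on this page.
ARCHITECTURES = {"ngs.rl.ac.uk-serial": "linux-64-serial"}


def srb_executable_path(path_to_exe, executable, machine):
    """Join pathToExe, the machine's architecture string and the Executable."""
    return posixpath.join(path_to_exe, ARCHITECTURES[machine], executable)


print(srb_executable_path("/ngs/home/joe-bloggs.ngs/test", "ossia.x",
                          "ngs.rl.ac.uk-serial"))
# prints /ngs/home/joe-bloggs.ngs/test/linux-64-serial/ossia.x
```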

  • Number allowed: 1
  • Required?: Yes
  • Default: N/a
  • Line format:

pathToExe =

Sdir

Suggested change

  • Remove "S" from command name
  • Sdir to remain as a synonym with warning.

This specifies a collection to get files from or to upload files to within the SRB as part of the job submission.

  • Number allowed: As many Sdir lines as wanted
  • Required?: Yes if you want to transfer any data
  • Default: N/a
  • Line format:

Sdir =

Sdirect

This specifies whether data transfer to / from the SRB should be directly between the execute machine and the SRB vault. Direct transfer leads to much improved performance but requires extra firewall holes. Set this to false if you are unable to transfer directly between your chosen execute resource and all of the SRB vaults. All eMinerals vaults and execution resources allow direct transfers.

  • Number allowed: One
  • Required?: No
  • Default: true
  • Line format:

Sdirect = true|false

Sforce

This line specifies whether to overwrite local / SRB files when getting / putting files. A value of `true' will allow overwriting and `false' will not allow overwriting. Note a value of `false' will cause my_condor_submit to fail with an error if files being retrieved / uploaded already exist.

  • Number allowed: One
  • Required?: No
  • Default: false
  • Line format:

Sforce = true|false

Sget

Suggested change

  • Allow any number per Dir block.
  • Remove "S" from command name
  • Sget to remain as a synonym with warning.
  • Better specification of wildcard arguments
  • Expansion of wildcards at submit time

This specifies a list of files to retrieve from the previously specified collection within the SRB at the start of the submitted job. Note that wildcards (*) are now properly supported and can be used as they would be with any Linux etc. command line command. Recursion is also allowed (i.e. subdirectories are downloaded) if a related SRecurse line is specified for the containing Sdir line.

  • Number allowed: Any number per Sdir line
  • Required?: Only if you want to download files from the SRB at the start of the run
  • Default: N/a
  • Line format:

Sget = file1,file2,*.psf,...

Files will only be retrieved recursively if used in conjunction with the SRecurse line described below.

Shome

This line is used to specify the location of the Scommands on the machine on which the Sput / Sget commands are called. You need not specify this line if the Scommands are in /home/srbusr/SRB3_3_1/utilities/bin.

  • Number allowed: One
  • Required?: Only if the Scommands are not installed in the expected location on the remote machine
  • Default: /home/srbusr/SRB3_3_1/utilities/bin
  • Line format:

Shome =

Sput

Suggested change

  • Allow any number per Dir block.
  • Remove "S" from command name
  • Sput to remain as a synonym with warning.
  • Better specification of wildcard arguments

This specifies a list of files to put into the previously specified collection within the SRB at the end of the submitted job. Wildcards (*) are now properly supported and can be used in the same manner as with normal Linux etc. command line commands. Directories can be uploaded recursively when used in conjunction with the SRecurse line described below.

  • Number allowed: Any number per Sdir line
  • Required?: Only if you want to upload files back to the SRB at the end of the run
  • Default: N/a
  • Line format:

Sput = file1,file2,*.extension,...

SRecurse

This line specifies whether to recursively upload / download files to / from the SRB. Used in conjunction with wildcards in Sget / Sput commands.

  • Number allowed: One per Sdir line
  • Required?: No
  • Default: false
  • Line format:

SRecurse = true|false

PerArch

Turns on architecture-specific download / upload for this Dir block.

  • Number allowed: One per Sdir line
  • Required?: No
  • Default: false
  • Line format:

PerArch = true|false
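Several of the lines above are scoped "per Sdir line", meaning they apply to the most recently specified Sdir. One way a parser might group them into Dir blocks is sketched below in Python. The function `group_dir_blocks`, the `PER_DIR` set, the block layout and the decision to raise an error for a per-Sdir line that precedes any Sdir are all assumptions for illustration; the collection paths used in any call are placeholders.

```python
# Hypothetical set of data staging lines scoped to the preceding Sdir line.
PER_DIR = {"sget", "sput", "srecurse", "perarch"}


def group_dir_blocks(directives):
    """Group (name, value) pairs, given in file order, into Dir blocks."""
    blocks = []
    for name, value in directives:
        key = name.lower()
        if key == "sdir":
            blocks.append({"sdir": value, "lines": []})  # start a new block
        elif key in PER_DIR:
            if not blocks:
                # Assumption: a per-Sdir line with no Sdir before it is an error.
                raise ValueError(f"{name} appears before any Sdir line")
            blocks[-1]["lines"].append((name, value))
        # Any other line is not part of a Dir block and is skipped here.
    return blocks
```

Each Sget or Sput therefore acts on the collection named by the nearest Sdir line above it.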

Metadata management

The following lines all relate to obtaining and uploading of metadata to the eMinerals metadata database. It is worth noting that metadata parameters are limited in length (currently to 50 characters for the value and 30 for the name). MCS will detect cases where this limit will be exceeded and attempt to warn the user to minimise the risk of loss of data integrity. This warning is achieved by inserting "**TRUNCATED DATA**" at the end of the stored string in the database and writing a warning to out.err which includes the original (un-truncated) string.
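The truncation behaviour described above might look roughly like the Python sketch below. The names `store_value`, `MAX_NAME` and `MAX_VALUE` are hypothetical. It is also an assumption that the marker is counted within the length limit, so that the stored string never exceeds it; this page does not say how MCS handles that detail.

```python
MAX_NAME = 30   # documented limit for a metadata name
MAX_VALUE = 50  # documented limit for a metadata value
MARKER = "**TRUNCATED DATA**"


def store_value(value, limit=MAX_VALUE):
    """Return (stored_string, warning) where warning is None if nothing was cut."""
    if len(value) <= limit:
        return value, None
    # Assumption: the marker must fit inside the limit, so keep fewer characters.
    kept = value[: limit - len(MARKER)]
    warning = f"metadata truncated; original string was: {value}"
    return kept + MARKER, warning
```

The warning text carries the original string, which corresponds to what would be written to out.err.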

AgentX

This line is used to instruct my_condor_submit to collect other data values from within a CML file and store them as metadata. The annotation will be created with the name given in the first part of the line, and its value will be retrieved from the file named in the second part of the line. The value will then be selected by evaluating the path specified by the rest of the line. A full description of this evaluation is given below.

  • Number allowed: As many as desired per Sdir line
  • Required?: No
  • Default: N/a
  • Line format:

AgentX = , :

AgentXDefault

This line is used to instruct my_condor_submit to extract metadata from a specified CML file. The metadata extracted will consist of all of the parameters within the first parameterList element within the CML file - this will typically consist of simulation input parameters. Also all of the metadata elements within the first metadataList will be extracted. All metadata extracted will be stored as annotations on the created data object. In addition an attempt is made to locate a UUID stored in the file. If this is found and it passes (partial) validation then this is stored in the database. Otherwise a null UUID (00000000-0000-0000-0000-000000000000) is stored.

  • Number allowed: One per Sdir line
  • Required?: No
  • Default: N/a
  • Line format:

AgentXDefault =

AgentXHome

This line is used to instruct my_condor_submit as to where AgentX should look for its mappings and ontology if the default location is not to be used. This location must have the same directory structure as that seen at the default location.

AgentXHome =

AgentXLibs

This line is used to instruct my_condor_submit as to where AgentX is installed on the execute machine if not in the default location or in a location that my_condor_submit does not know about.

  • Number allowed: One
  • Required?: No
  • Default: /usr/share/AgentX/perl
  • Line format:

AgentXLibs =

GetEnvMetadata

This line is used to instruct my_condor_submit as to whether or not it should collect metadata regarding the submission and execution environments of the jobs which will then be stored within the metadata database. All metadata collected will be stored as annotations on the created data object.

  • Number allowed: One per Sdir line
  • Required?: No
  • Default: false
  • Line format:

GetEnvMetadata = true|false

MetadataString

This line is used to instruct my_condor_submit to store a specified string of metadata with a specified name within the created metadata data object. The first part of the line gives the name of the string and the second part, after the comma, gives its value.

  • Number allowed: As many as desired per Sdir line
  • Required?: No
  • Default: N/a
  • Line format:

MetadataString = ,

RDatasetID

This line is used to specify the ID of a dataset to contain the created data object which will in turn contain all of the collected metadata. This line must be used instead of the RStudyID and RDatasetName lines.

  • Number allowed: One (If no RStudyID or RDatasetName lines specified)
  • Required?: Yes, if RStudyID line not specified and metadata to be collected
  • Default: N/a
  • Line format:

RDatasetID =

RDatasetName

This line is used to specify a string to be used as the name of a created dataset to contain the created data object which will in turn contain all of the collected metadata. This line must be used in conjunction with the RStudyID line instead of the RDatasetID line.

  • Number allowed: One (If no RDatasetID line specified)
  • Required?: Yes, if RDatasetID line not specified and metadata to be collected
  • Default: N/a
  • Line format:

RDatasetName = "name for the created dataset"

RDesc

This line is used to specify the name to be given to the created data object within the metadata database. A data object with name equal to this line and URL equal to the preceding Sdir line will be created to contain all harvested metadata.

  • Number allowed: One per Sdir line
  • Required?: Yes, if metadata to be collected
  • Default: N/a
  • Line format:

RDesc = "String to store as a name for a created data object"

RHome

This line is used to instruct my_condor_submit as to where the RCommand binaries are installed if they are not in the default location or a location that my_condor_submit already knows about.

  • Number allowed: One
  • Required?: No
  • Default: /usr/bin
  • Line format:

RHome =

RStudyID

This line is used to specify the ID of a study in which to create a dataset to contain the created data object which will in turn contain all of the collected metadata. This line must be used in conjunction with the RDatasetName line instead of the RDatasetID line.

  • Number allowed: One (If no RDatasetID line specified)
  • Required?: Yes, if RDatasetID line not specified and metadata to be collected
  • Default: N/a
  • Line format:

RStudyID =
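The relationship between RDatasetID, RStudyID and RDatasetName can be expressed as a small consistency check. The Python sketch below is illustrative only: the function `check_dataset_lines`, its arguments and the wording of the messages are hypothetical, and `None` stands for a line that was not specified.

```python
def check_dataset_lines(dataset_id=None, study_id=None, dataset_name=None,
                        collecting_metadata=True):
    """Return a list of problems with the dataset lines (empty if consistent)."""
    errors = []
    pair_given = study_id is not None or dataset_name is not None
    if dataset_id is not None and pair_given:
        errors.append("RDatasetID cannot be combined with RStudyID or RDatasetName")
    if (study_id is None) != (dataset_name is None):
        errors.append("RStudyID and RDatasetName must be given together")
    if collecting_metadata and dataset_id is None and not pair_given:
        errors.append("metadata requires RDatasetID, or RStudyID with RDatasetName")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every inconsistency in one pass.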

Metascheduling

The following lines all relate to metascheduling across the eMinerals minigrid resources.

jobType

This line is used to specify the type of job being submitted which must be either `performance' or `throughput'. Choosing `performance' results in the job being submitted to a cluster machine while choosing `throughput' will submit to a condor pool.

  • Number allowed: One
  • Required?: No
  • Default: performance
  • Line format:

jobType = performance|throughput

numOfProcs

This line is used to specify the number of processors to be used on the remote machine.

  • Number allowed: One
  • Required?: No
  • Default: 1
  • Line format:

numOfProcs = Number

pathToExe

See pathToExe under Data Staging and Retrieval above.

Standard Condor Tags

The following lines are all standard condor input file tags that my_condor_submit understands and will accept as part of its input file.

Error

This line is used to specify the name of the file to which stderr should be redirected for the main part of the submitted job i.e. the stderr from the actual job execution rather than data-staging sections of the submission.

  • Number allowed: One
  • Required?: No
  • Default: job.err
  • Line format:

Error = filename

Executable

This line is used to specify the name of the executable to be run for the main part of the submitted job, i.e. the actual job execution rather than data-staging sections of the submission.

  • Number allowed: One
  • Required?: Yes
  • Default: N/a
  • Line format:

Executable = filename

GlobusRSL

This line is used to specify additional arguments etc. to the main part of the submitted job. It can be used to specify stdin, stdout and stderr for the main section of the job if desired.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

GlobusRSL =

  • Example:

GlobusRSL = (stdin=file.in)(stdout=file.out)(arguments=-f example_argument)

GlobusScheduler

This line is used to specify a particular machine and jobmanager to submit to and can only be used when not meta-scheduling. This line can be used to submit to a machine that my_condor_submit does not know about as long as the specified jobmanager is one which my_condor_submit supports.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

globusScheduler = /

Input

This line is used to specify the name of a file to be used for stdin for the main part of the submitted job. Can be used instead of the (stdin) section of the globusRSL line.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

input = filename

Log

This line if specified will be ignored by my_condor_submit which will instead use the default value.

  • Number allowed: One
  • Required?: No
  • Default: job.log
  • Line format:

Log = filename

Notification

This line is not currently supported by the NGS RMCS server pending debug.

This line is used to specify whether you want condor to notify you of the status of the main part of the submitted job once it finishes by email. Possible values are 'always', 'complete', 'error' or 'never'

  • Number allowed: One
  • Required?: No
  • Default: never
  • Line format:

Notification =

Output

This line is used to specify a name for the file to be used for stdout for the main part of the submitted job. This file will be left on the remote machine to be uploaded using a relevant Sput line if desired.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

Output = filename

Queue

This line is used within condor to tell it to submit the job. my_condor_submit accepts it for the same purpose, but it is not needed by my_condor_submit and will simply be ignored if specified.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

queue

Transfer_Error

This line is used to specify whether to return the stderr from the execution machine to the local machine (a value of `true') or leave it on the execution machine (a value of `false') to be uploaded using an appropriate Sput line.

  • Number allowed: One
  • Required?: No
  • Default: true
  • Line format:

Transfer_error = true | false

Transfer_Executable

This line is used to specify whether my_condor_submit should transfer the executable from the local machine to the execute machine rather than using the SRB. This doesn't make sense for meta-scheduled jobs. A value of `true' will transfer the file from the local machine while `false' will not.

  • Number allowed: One
  • Required?: No
  • Default: false
  • Line format:

Transfer_Executable = true | false

Transfer_Input_Files

This line is used to specify a set of files that should be sent with the executable to the execution node within a condor pool. This line does not make sense when submitting to anything other than a condor jobmanager and so will be ignored by my_condor_submit in this case. The files will be transferred from the condor pool's submit node (to which my_condor_submit submits its job) to the relevant execution machine after they have been downloaded using the pre stage of the my_condor_submit job.

  • Number allowed: One
  • Required?: No
  • Default: Empty list (i.e. no files)
  • Line format:

Transfer_Input_Files = filename, filename, filename ...

Transfer_Output

This line would be used to specify whether to return the main job's stdout file to the submission machine. However this does not make sense within the my_condor_submit context and so is ignored and the output file is always left on the remote machine to be uploaded with a relevant Sput line.

  • Number allowed: One
  • Required?: No
  • Default: false
  • Line format:

Transfer_Output = false

Universe

This line is included to provide backward compatibility with older versions of my_condor_submit and is used to tell condor that it should use Globus to submit to the remote execution machine. The only permissible value is `Globus'

  • Number allowed: One
  • Required?: No
  • Default: globus
  • Line format:

Universe = Globus

x509_user_proxy

This line is used to specify the location of the user's x509 certificate proxy should it not be in the location specified by the X509_USER_PROXY environment variable. The value specified here will override the value retrieved from grid-proxy-info and the environment variable. This line is designed to allow my_condor_submit to be used when the user has gsissh'd into a submit machine.

  • Number allowed: One
  • Required?: No
  • Default: N/a
  • Line format:

x509_user_proxy =

Topic revision: r1 - 19 Jan 2009 - RobAllan
 