The RCommands framework provides a set of scriptable commands to associate metadata with files stored within a distributed file system such as the Storage Resource Broker (SRB), a set of FTP servers, or a set of files available over HTTP. They enable the creation of metadata to be semi-automated. The RCommands insert and modify metadata held within a central metadata server.
The RCommands have been developed by Rik Tyer (http://www.e-science.clrc.ac.uk/web/staff/richard_tyer) of the CCLRC eScience Centre (http://www.e-science.clrc.ac.uk/web/), Daresbury Laboratory UK (http://www.cclrc.ac.uk/Activity/DL).
Data organisation
The RCommands assume a three-layer hierarchy for the data:
The study level. This is the over-arching level under which you will group all files concerned with one particular piece of work. An example might be a study of sea surface temperatures in the North Atlantic Ocean. If you use the PDF publication files as your data, all together they might represent a single study called "escience".
The dataset level. This grouping will consist of a set of files associated with one aspect of the study. For example, in a study of sea surface temperatures, it might be one season or one region. If you use the PDF publication files above, we have already separated these into possible datasets ("grid computing", "data management", "collaborative tools" and "applications").
The data object level. This will consist of a single file or a natural collection of files (such as the complete set of files produced by a single computation). If you use the PDF publication files, each file will be a data object.
One important point should be noted: the study and dataset levels are completely abstract. In contrast, the data objects correspond to URIs that point to real objects, including (but not exclusively so) files or collections of files in the SRB.
Users should not feel constrained by this hierarchy. For example, you may feel that your whole life's work is one study, so that this level has little meaning. On the other hand, you may feel that any one study should only have data objects. This hierarchy has many interpretations and should be used in the way that best suits the investigator.
It is possible to add metadata to each of these levels. Within the framework of the RCommands, each level will have an ID number that is used in the scriptable RCommands.
The commands
There are only ten RCommands, with detailed descriptions provided below.
Rinit: starts an RCommand session, and is needed in order to read information from configuration files.
Rpasswd: changes your password that is associated with your access to the metadata server.
Rcreate: creates a metadata object, i.e. any of the study, dataset and data object levels of metadata.
Rannotate: adds a description or a metadata parameter name/value pair to a study or dataset.
Rls: lists the different entities within the metadata database.
Rget: displays the metadata associated with a particular entity.
Rrm: removes entities from the metadata database.
Rchmod: adds or removes investigators to or from a study.
Rsearch: searches the metadata associated with studies and datasets for name/value pairs or keyword descriptions.
Rexit: ends an RCommand session and has the primary effect of cleaning away hidden files created during the session.
Usage
Username
You will need a username to provide you with access to the RCommands database: this will be provided by the database manager.
Create the configuration files
You need to create a file of the name ~/.rcommands/rcommand.config, which has the form
username =
password =
cacertdir = /etc/grid-security/certificates
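Because this file holds a password, it is worth creating it with permissions that keep it private. The following is a minimal sketch, assuming each of the three keys sits on its own line in "key = value" form; the function name write_rcommand_config and the example credentials are hypothetical, and your real username and password come from the database manager.

```shell
#!/bin/sh
# write_rcommand_config BASE_DIR USERNAME PASSWORD
# Hypothetical helper: writes BASE_DIR/.rcommands/rcommand.config with the
# username, password and cacertdir keys, readable only by the owner.
write_rcommand_config() {
    conf_dir="$1/.rcommands"
    mkdir -p "$conf_dir" || return 1
    (
        # umask 077 means the new file is created with owner-only access.
        umask 077
        printf 'username = %s\npassword = %s\ncacertdir = %s\n' \
            "$2" "$3" "/etc/grid-security/certificates" \
            > "$conf_dir/rcommand.config"
    )
}

# Example call (placeholder credentials, not real ones):
# write_rcommand_config "$HOME" "your_username" "your_password"
```

Passing the base directory as an argument makes it easy to try the helper in a scratch directory before writing to your home directory.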
Initiating an RCommand session
You initiate an RCommand session using the Rinit command. You can test that all is well by typing the Rls command: it will return a message telling you about any studies you have. To get information about other commands, you can type the command name with no arguments, use the Unix man command, or look at the information below.
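In a script it can be convenient to bracket the work between Rinit and Rexit so that the session is always closed. The sketch below is one way to do this, assuming the three commands are on your PATH and that Rinit needs no arguments here. The rc helper and the RC_EXECUTE variable are hypothetical names introduced for this example; unless RC_EXECUTE=1 is set, the script only prints the commands it would run.

```shell
#!/bin/sh
# rc: hypothetical helper. Runs the command only when RC_EXECUTE=1 is set;
# otherwise it prints the command, so the script can be tried safely.
rc() {
    if [ "${RC_EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# session: start a session, list the studies, and always end the session.
session() {
    rc Rinit || return 1
    rc Rls
    status=$?
    # Rexit runs whether or not Rls succeeded, so hidden files are cleaned up.
    rc Rexit
    return "$status"
}

session
```

Any other RCommands would go between the Rinit and Rexit calls in place of Rls.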
Creating a study
First use the Rcreate command to create a study level. To use Rcreate you will need to give the study a name, add a description, and assign it to a topic, via:
Rcreate -n -k -t
First you should think about the topic. You can list all topics by the command
Rls -t
Choose a topic and note the number; this will be the topicID label. Run the Rcreate command to create a study. The name and description labels can contain more than one word within quotes. For example, if we want to create a database containing a set of workshop papers, we might set this up by:
Rcreate -n "Workshop papers" -k "Papers for workshop" -t 4
We can check that this has worked by running the Rls command. This will return information like
StudyID: 1026 Name: Workshop papers
where the StudyID number will differ for different people. Now we can look at this in more detail using the Rget command:
Rget -s studyID
where you add your StudyID number. For the example above:
Rget -s 1026
gives
StudyID: 1026 Name: Workshop papers Description: Papers for workshop Created by: martin dove Status: In Progress Start_date: 07-01-2006
Adding datasets with metadata
Now we want to add some data sets to the study. Following the example of pdf publications, we could create some datasets by
Rcreate -s 1026 -n "Papers on grid computing"
Rcreate -s 1026 -n "Papers on data management"
Rcreate -s 1026 -n "Papers on collaborative tools"
Rcreate -s 1026 -n "Papers on escience applications"
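Since the RCommands are scriptable, the same four datasets could also be created in a loop. This is a minimal sketch that uses only the -s and -n options shown above; the rc helper, the RC_EXECUTE variable and the create_datasets function are hypothetical names for this example, and by default the commands are printed rather than run.

```shell
#!/bin/sh
# rc: hypothetical helper. Runs the command only when RC_EXECUTE=1 is set;
# otherwise it prints it, quoting any argument that contains a space.
rc() {
    if [ "${RC_EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        line=
        for arg in "$@"; do
            case "$arg" in
                *' '*) line="$line \"$arg\"" ;;
                *)     line="$line $arg" ;;
            esac
        done
        printf '%s\n' "${line# }"
    fi
}

# create_datasets STUDY_ID: create one dataset per topic under the study.
create_datasets() {
    for topic in "grid computing" "data management" \
                 "collaborative tools" "escience applications"; do
        rc Rcreate -s "$1" -n "Papers on $topic" || return 1
    done
}

# 1026 is the StudyID from the example; yours will differ.
create_datasets 1026
```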
Each invocation will create a DatasetID, which will be echoed to the screen. Now check on the results of these commands by
Rls -s 1026
This will show you the DatasetID for each dataset (again, different users will get different numbers). You can look at any one dataset by using the command
Rget -d DatasetID
where you use the appropriate number for each DatasetID.
Now we will add some metadata to each dataset. For this we use the Rannotate command. The first step will be to add a brief description to a dataset. In my example, running Rls -s 1026 gives
Dataset ID: 26 Dataset Name: Papers on grid computing Parent StudyID: 1026
Dataset ID: 27 Dataset Name: Papers on data management Parent StudyID: 1026
Dataset ID: 28 Dataset Name: Papers on collaborative tools Parent StudyID: 1026
Dataset ID: 29 Dataset Name: Papers on escience applications Parent StudyID: 1026
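If you want to feed these DatasetID numbers into later commands from a script, they can be pulled out of the listing with sed. This is a sketch only, assuming Rls prints exactly the layout shown above with one dataset per line; the function name dataset_ids is hypothetical.

```shell
#!/bin/sh
# dataset_ids: read a dataset listing on standard input and print the
# number that follows "Dataset ID:" on each matching line.
dataset_ids() {
    sed -n 's/^Dataset ID: \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration using two lines of the sample output from this page.
sample='Dataset ID: 26 Dataset Name: Papers on grid computing Parent StudyID: 1026
Dataset ID: 29 Dataset Name: Papers on escience applications Parent StudyID: 1026'
printf '%s\n' "$sample" | dataset_ids
```

In a real session the input would come from the command itself, for example Rls -s 1026 | dataset_ids.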
We can use the Rannotate command in two ways. First we can add a description to the dataset. My example is
Rannotate -d 29 -k "Collection of papers on escience applications"
Second we can add some name/value pairs. My example is
Rannotate -d 29 -p topic=escience
Rannotate -d 29 -p topicarea=applications
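When a dataset needs several name/value pairs, a small wrapper can issue one Rannotate call per pair and reject anything that is not in name=value form. This is a minimal sketch using only the -d and -p options shown above; the rc helper, the RC_EXECUTE variable and the annotate_dataset function are hypothetical names, and by default the commands are printed rather than run.

```shell
#!/bin/sh
# rc: hypothetical helper. Runs the command only when RC_EXECUTE=1 is set;
# otherwise it prints the command.
rc() {
    if [ "${RC_EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# annotate_dataset DATASET_ID NAME=VALUE...
# Adds each pair to the dataset; stops at the first malformed pair.
annotate_dataset() {
    dataset_id="$1"
    shift
    for pair in "$@"; do
        case "$pair" in
            ?*=?*) rc Rannotate -d "$dataset_id" -p "$pair" || return 1 ;;
            *)     printf 'malformed pair: %s\n' "$pair" >&2
                   return 1 ;;
        esac
    done
}

# 29 is the DatasetID from the example; yours will differ.
annotate_dataset 29 topic=escience topicarea=applications
```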
Running the Rget -d 29 command to view the metadata gives
DatasetID: 29 Name: Papers on escience applications Parent StudyID: 1026 Created by: martin dove Creation_date: 07-01-2006 Description: Collection of papers on escience applications
Note that this shows the description but not the name/value pairs. To see the name/value pairs I need to use the command Rget -d 29 -p, which yields:
Parameter Name: topic Parameter Value: escience
Parameter Name: topicarea Parameter Value: applications
You can repeat this for other datasets, and you can add whatever name/value pairs you like.
Adding data objects with metadata
Finally we reach the point where we can add metadata to the data objects. You need to first have data somewhere, and in our case our data are in the SRB. The data object can either be a file or a collection of files within the SRB. The command for adding metadata to a data object is
Rcreate -u -d -n
The -u argument specifies where the file is and has the form
srb:////