Agave apps
- Registering an App
- Publishing an App
- Running an App from command line
To perform the following steps you need to sign up for a CyVerse account here.
See the FAQ.
Agave is a RESTful API, meaning that we interact with it using HTTP POST and GET requests. The cyverse-cli tools are essentially wrappers around these types of requests, making them shorter to type.
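To make the wrapping concrete, here is a sketch of the raw request behind a CLI call such as systems-list. The tenant URL and token below are assumptions; the real CLI reads them from its stored configuration.

```shell
# What a CLI call does under the hood (sketch only: the tenant URL and
# token below are assumptions; the real CLI reads them from its config).
TOKEN="your-access-token"
BASEURL="https://agave.iplantc.org"
# systems-list boils down to an authenticated GET like this one:
CMD="curl -sk -H \"Authorization: Bearer $TOKEN\" $BASEURL/systems/v2/"
echo "$CMD"
```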
Agave tracks two kinds of resources: Systems and Apps. There are two types of Systems: Storage and Execution. Apps run on Execution Systems, using data from Storage Systems, to produce the desired results. So to run an app, we first need an Execution System. Systems are described using JSON files, which are then posted to the API. An Execution System JSON consists of four parts, which are described below.
The first part consists of system basics: id, type, etc. See the example below:
"id" : "myTutorialMachine",
"name" : "A machine for the EI Agave tutorial",
"type" : "EXECUTION",
"executionType" : "CLI",
"scheduler" : "FORK",
The variables mostly speak for themselves. The executionType variable can be CLI, CONDOR or HPC, depending on the type of scheduler running on the system. In this case we assume there is no scheduler running, so we choose CLI, with FORK as the scheduler. See the Agave docs for more details on the scheduler variables.
All Execution Systems need to define storage as scratch space. For this example, we'll assume you have a scratch directory mounted somewhere on /mnt/ (an SSHFS mount, for example).
"storage": {
"host" : "yourhost.example.org",
"port" : 22,
"protocol" : "SFTP",
"homedir" : "/mnt/scratch/username",
"rootdir" : "/mnt/scratch",
"auth" : {
"type": "PASSWORD",
"username" : "username",
"password" : "password"
}
}
If you are uncomfortable with putting your password in plaintext, see below for specifying an auth object with SSHKEYS.
All execution systems need a default Queue to which jobs are submitted. In our example, we are using a simple CLI system, so there are no scheduler queues that we need to deal with. This means we can get away with a simple specification like this:
"queues": [ {
"name": "normal",
"default": true,
"maxRequestedTime": "24:00:00",
"maxJobs": 10,
"maxUserJobs": 5,
"maxNodes": 1,
"maxMemoryPerNode": "4GB",
"maxProcessorsPerNode": 12,
"customDirectives": null
} ]
You'll want to change the variables to suit your system.
Lastly, Agave will need login information for the Execution System. This can be specified using a Login object, as follows:
"login": {
"host" : "yourhost.example.org",
"port" : "22",
"protocol": "SSH",
"auth" : {
"type" : "PASSWORD",
"username" : "username",
"password" : "changethis"
}
}
As mentioned before, posting your password in plaintext is usually a bad idea. We can specify a login object using public and private keys as well. To do this, we'll change the auth part of the object as follows:
"auth" : {
"type" : "SSHKEYS",
"username" : "username",
"publicKey" : "ssh-rsa AAAA...your public key... username@yourhost.example.org",
"privateKey" : "-----BEGIN RSA PRIVATE KEY-----*private key here*-----END RSA PRIVATE KEY-----"
}
An important thing to note when using keypairs is that your private key should be JSON encoded before pasting it into the JSON file, using the jsonpki command:
jsonpki --private /path/to/private/id_rsa
If necessary, a password for the file can be specified using --password.
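For illustration only, this is roughly what "JSON encoded" means for a multi-line key: the newlines have to become literal \n escapes so the key fits on one JSON string line. jsonpki does this for you; the "key" below is just a throwaway two-line file.

```shell
# Illustration of what "JSON encoded" means for a multi-line key
# (jsonpki does this for you; the key here is a throwaway two-line file).
printf 'line1\nline2' > /tmp/demo_key
ENCODED=$(awk 'NR>1 {printf "\\n"} {printf "%s", $0}' /tmp/demo_key)
echo "$ENCODED"   # line1\nline2  (newlines become literal \n)
```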
Now that we have defined our system, you can find the completed JSON file below:
{
"id" : "myTutorialMachine",
"name" : "A machine for the EI Agave tutorial",
"type" : "EXECUTION",
"executionType" : "CLI",
"scheduler" : "FORK",
"storage": {
"host" : "yourhost.example.org",
"port" : 22,
"protocol" : "SFTP",
"homedir" : "/mnt/scratch/username",
"rootdir" : "/mnt/scratch",
"auth" : {
"type": "PASSWORD",
"username": "username",
"password": "changethis"
}
},
"queues": [ {
"name": "normal",
"default": true,
"maxRequestedTime": "24:00:00",
"maxJobs": 10,
"maxUserJobs": 5,
"maxNodes": 1,
"maxMemoryPerNode": "4GB",
"maxProcessorsPerNode": 12,
"customDirectives": null
} ],
"login": {
"host" : "yourhost.example.org",
"port" : "22",
"protocol": "SSH",
"auth" : {
"type" : "PASSWORD",
"username" : "username",
"password" : "changethis"
}
}
}
Let's use it to register the system on Agave:
systems-addupdate -v -F TutSystem.json
A large amount of JSON describing our new system will be returned to confirm the registration. Now that we have an execution system, let's move on to registering our workflow as an App in the next part.
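Before posting, it can save a round trip to check that the file is valid JSON at all. A minimal local check (python3 is assumed to be available; the file written here is only a stand-in for your real TutSystem.json):

```shell
# Quick local validity check before posting (python3 is assumed to be
# available; the file written here stands in for your real TutSystem.json).
cat > /tmp/TutSystem.json <<'EOF'
{ "id": "myTutorialMachine", "type": "EXECUTION" }
EOF
python3 -m json.tool /tmp/TutSystem.json > /dev/null && echo "JSON OK"
```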
An App in the Agave API means a workflow that is wrapped into a single unit which can be executed by a user. It is described in the same way a system is described: JSON. In this part we'll register a test app that runs a simple BLAST job.
The first thing we'll need to describe are some basic parameters of our app:
"name" : "blastapp-tutorial",
"label" : "EI tutorial BLAST app",
"version" : "0.0.1",
"executionType" : "CLI",
The App ID will be generated from the name and version number and this combination must be unique. You can delete the previous one (if there was an error), or increase the version number (if you need to make an updated version).
Next, we'll specify where and how the app will run:
"executionSystem" : "myTutorialMachine",
"deploymentPath" : "username/apps/EI_tutorial",
"templatePath" : "wrapper.sh",
"testPath" : "test.sh",
"parallelism" : "SERIAL",
When specifying only an executionSystem as above, you must make sure your app assets are already present on that system! This means that you need admin access to your execution system, which is often not the case. To remedy this, we can store our app's assets on the CyVerse Data Store and specify a deploymentSystem parameter like so:
"deploymentSystem" : "data.iplantcollaborative.org",
If you are planning to publish your app with CyVerseUK, we ask that you add the "ontology" field with a list of EDAM URIs and "tags" : [ "CyverseUK" ]. You can see some complete JSON examples in this organization's repositories.
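For orientation, such fields might look like the fragment below. The EDAM IRI shown (topic_0080, Sequence analysis) is only an illustrative example; substitute the topic and operation terms that fit your app.

```json
"ontology": [ "http://edamontology.org/topic_0080" ],
"tags": [ "CyverseUK" ]
```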
Finally, we'll specify our app's inputs:
"inputs" : [ {
"id": "query",
"details" : {
"label": "Query" ,
"description": "FASTA file with query sequence(s)"
},
"value": { "required" : "true" }
},
{
"id": "database",
"details" : {
"label": "Database" ,
"description": "FASTA file with sequences to search (database)"
},
"value": {"required" : "true"}
}
],
"parameters" : [ ]
We're leaving parameters empty, but we could add any BLAST command line parameters here. Now that we have specified this, we'll have to actually upload our app's assets to CyVerse.
We'll upload data to the Data Store using the Discovery Environment (DE); however, the CyVerse Data Store uses iRODS under the hood, so you could use icommands as well. For more details, see the CyVerse wiki.
First, log in to the DE at https://de.iplantcollaborative.org/. You'll be presented with a desktop-like environment. Click on the "Data" button. This will open a file manager window, with a file tree on the left-hand side. Here, click on the folder with your username (at the top). First, we'll create a new folder to hold our apps. Go to "File" and select "New Folder...". Name the new folder "EI_tutorial" and click "OK" to confirm. Navigate to the newly created folder by clicking on it. This is where our app's assets will live; we'll create them in the next sections.
When developing an app on the CyVerseUK system, it is a good idea to keep the assets on our own systems.
For our minimal BLAST app, we'll need three files: a wrapper script, a test script and an executable. Because of the way BLAST works, we'll actually need two executables for this app. First we'll create the wrapper script:
#!/bin/bash
QUERY="${query}"
DATABASE="${database}"
#These two lines are necessary because permissions get lost in the Agave transfer
chmod u+x lib/makeblastdb
chmod u+x lib/blastn
lib/makeblastdb -dbtype nucl -in $DATABASE -out db
lib/blastn -query $QUERY -db db
exit $?
As you can see from the first line, this is a plain bash script that runs our pipeline. The next two lines set up our two main parameters: the query and the database. The ${query} directive will be replaced by Agave BEFORE execution of the script with the input we have given. Note that the word query is the id we specified in our JSON file earlier. The next line does the same for the database file.
The next two lines run our actual BLAST 'pipeline': first we create our database with makeblastdb, and then we execute the BLAST search with blastn. The lib/ part of the command line is there because of the way we will set up our app assets; Agave convention requires that all of our App's executables are stored in a separate lib directory. We output the database in the first line under the simple name db, and we reference that database again in the next line.
The last line propagates the current exit status, which will be that of BLAST; this means that the script passes on BLAST's exit value as its own.
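To see what the template substitution amounts to, here is a sketch that imitates it with sed. Agave performs the replacement itself before the job runs; sed is only used here for illustration, and "testquery.fa" stands in for the job's real input path.

```shell
# Imitate Agave's template substitution with sed (Agave itself performs
# the replacement before the job runs; sed here is illustration only).
cat > /tmp/wrapper_template.sh <<'EOF'
QUERY="${query}"
echo "Running BLAST on $QUERY"
EOF
sed 's|\${query}|testquery.fa|' /tmp/wrapper_template.sh > /tmp/wrapper_filled.sh
grep 'QUERY=' /tmp/wrapper_filled.sh   # QUERY="testquery.fa"
```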
Next, we'd need a test script that tests our app with some default data. This is useful, but we'll skip it for now as it is a bit outside this tutorial's scope. Instead, we'll just write a script that exits successfully and call it done:
#!/bin/bash
exit 0
Finally we'll need to provide the BLAST executables. These can be obtained from the NCBI ftp server.
Now that we have everything, let's get our assets set up in the Data Store. Go back to your DE window, and go to the EI_tutorial folder under your username (if you weren't already there). Create a folder called lib, and navigate to it. We'll put our BLAST executables here. Go to the "Upload" menu in the top left-hand corner of the file navigation window. The easiest way is to upload the executables from this repo directly, so choose "Import from URL...".
The wrapper script should perform all the checks that the Agave API doesn't support (mutually inclusive or exclusive parameters, for example), and ideally return the proper error before running the Docker container. It may also be useful to use the wrapper script to delete any new files that are not needed from the working directory, to prevent them from being archived.
In our case there is some additional logic in the wrapper scripts that allows some automatic tasks in the virtual machines to run as expected and integrates the system with the CyVerseUK web app. You don't usually have to worry about this.
Now that our assets are in place, we can register our app in Agave using the JSON file we wrote earlier. (If needed, refresh your access tokens with auth-tokens-refresh.) Navigate to where the file is stored (we'll assume you've named it TutApp.json) and run the apps-addupdate command:
apps-addupdate -v -F TutApp.json
A bunch of JSON describing your app will be returned, confirming the registration of our app.
Following the introductory part, the JSON file lists inputs and parameters. Good documentation on the available fields and their usage can be found here.
For the application (if you wish to publish it) to display a proper information window in the Discovery Environment, the following fields need to be present in the JSON file: help_URI, datePublished, author, longDescription.
In the ontology field, a list of IRIs from the topic and operation branches of the EDAM ontology has to be specified to properly categorize the App.
If you encounter problems registering your application, first check that the JSON file is valid. A good way to do this is to copy-paste it into AgaveToGo.
If details.showArgument (boolean) is set to true, details.argument will be passed before the value (e.g. if we want to pass --kmer 31 on the command line). Note that the argument is put before the value without a space, so usually we want to include one in the argument string!
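A small illustration of the concatenation in plain bash; the names are just stand-ins for details.argument and the submitted value:

```shell
# How argument and value are concatenated (names are stand-ins for
# details.argument and the submitted value).
ARGUMENT="--kmer "   # note the trailing space inside the string
VALUE="31"
echo "${ARGUMENT}${VALUE}"   # --kmer 31
```

Without that trailing space the command line would read --kmer31, which is why the space usually belongs inside the argument string.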
value.validator can supply a check on the format of the submitted value, as a Perl-formatted regular expression (pay particular attention to the escapes).
Example case: the JSON value.type doesn't distinguish between integers and floating-point numbers; it only has number. To check that the input is an integer we may use "validator": "^\\d*$" (or "^[0-9]+$" to avoid the escapes). The same field also allows accepting only even/odd numbers, setting a maximum value, etc.
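You can reproduce the integer check locally to convince yourself of what the validator accepts; grep -E understands the same character-class syntax as the second pattern above, and the values below are just examples.

```shell
# Reproduce the validator's integer check locally (grep -E understands
# the same character class; the values are examples).
is_integer() {
  echo "$1" | grep -Eq '^[0-9]+$' && echo "integer" || echo "not an integer"
}
is_integer 31     # integer
is_integer 3.5    # not an integer
```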
Also note that it may be useful to define numerical variables as strings with an appropriate validator if we don't want to define a default value, because both the Discovery Environment and the CyVerseUK web interface will pass 0 otherwise.
We usually don't want the user to work in a folder outside the set working directory, so if the program run by the App has an --output_directory option (or similar), we may want to add a validator to make sure the string doesn't start with '/', or just hide the option and give it a default name (e.g. output; this will also make the wrapper script easier to write and maintain).
IMPORTANT:
"value": {
...
"visible": false,
"default": "default_value",
...
}
is NOT supported. The default value must be provided in the wrapper script if we don't want the user to be able to change it.
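A sketch of that approach in the wrapper script follows. Here "output_directory" is a hypothetical parameter id; run outside Agave the template variable simply expands to empty, so the fallback applies.

```shell
# Sketch: provide the default in the wrapper rather than in the JSON.
# "output_directory" is a hypothetical parameter id; run outside Agave
# the variable is simply empty, so the fallback applies.
OUTDIR="${output_directory}"
OUTDIR="${OUTDIR:-output}"   # default to "output" when nothing was passed
echo "$OUTDIR"
```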
It's not possible to run an App on CyVerse interactively. Therefore, to run multiple commands in a Docker container we need the following syntax in the wrapper.sh script:
docker run <image_name[:tag]> /bin/bash -c "command1;command2...;"
/bin/bash is not strictly necessary but, depending on the base image, bash may not be the default shell; adding it to the command line takes care of this problem.
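The quoting pattern can be tried without Docker: the semicolon-separated commands are handed to a single bash -c invocation, exactly as in the docker run line.

```shell
# The same quoting pattern without Docker: semicolon-separated commands
# handed to a single bash -c invocation.
OUT=$(bash -c "echo step1; echo step2")
echo "$OUT"
```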
IMPORTANT UPDATE: Docker version 1.12 added the SHELL instruction. This allows the default shell used for the shell form of commands to be overridden (at build time too, so it may make the build a bit slower). Use it as follows:
SHELL ["/bin/bash", "-c"]
The HPC on the CyVerseUK infrastructure uses the HTCondor scheduler, so the wrapper.sh alone is not enough to run the app; an HTCondorSubmit.htc script is needed as well.
The HTCondorSubmit.htc file will be in the following form:
universe = docker
docker_image = <image_name[:tag]>
executable =
should_transfer_files = YES
arguments =
transfer_input_files =
transfer_output_files =
when_to_transfer_output = ON_EXIT
request_memory = 100G
output = out.$(Process)
error = err.$(Process)
log = log.$(Process)
queue 1
This HTCondor submit file has to be generated by the wrapper.sh, since we can't know the arguments and input files in advance.
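A hedged sketch of such a wrapper fragment: once the substituted input name is known, the wrapper writes the submit file. The file names and options here are placeholders, not the real CyVerseUK generator.

```shell
# Hedged sketch of a wrapper fragment generating the submit file once the
# substituted input is known ("testquery.fa" is a placeholder here).
QUERY="testquery.fa"
cat > /tmp/HTCondorSubmit.htc <<EOF
universe = docker
arguments = -query ${QUERY}
transfer_input_files = ${QUERY}
queue 1
EOF
grep '^arguments' /tmp/HTCondorSubmit.htc   # arguments = -query testquery.fa
```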
transfer_output_files is not needed if the output is in the working directory. A good idea is to create, when possible, all the output files in a subdirectory of the working directory (e.g. output), so that the transfer is easier.
If transferring executables in transfer_input_files, make sure to restore the right permissions in the wrapper script (e.g. chmod u+x <file_name>).
It's also possible that the Docker image has to be updated to give scripts 777 permissions, because of how Condor handles Docker.
The App can be made public with the following command (this step has to be performed by a tenant admin, so please contact them if you have a ready-to-publish application):
apps-pems-update -v -u <username> -p ALL <app_name>-<version>
After that, it can be found both in the DE, under Apps > High-Performance Computing, and in the CyVerseUK web interface. The App interface is automatically generated from the submitted JSON file.
Finally, we can run our App! We'll need one more (short) JSON file to run a new job:
{
"name" : "blasttest",
"appId" : "blastapp-tutorial-0.0.1",
"archive" : "true",
"inputs": {
"query" : "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testquery.fa",
"database": "https://github.com/erikvdbergh/cyverseuk-util/raw/master/testdb.fa"
}
}
We'll save this file as RunApp.json and submit it as a job with the jobs-submit command:
jobs-submit -v -W -F RunApp.json
The -W flag tells the command to keep watching the job in the current window, which can be stopped with Ctrl-C.
After your job has completed, your outputs, logs and error messages will be in a folder that is generated automatically on your app's storage system (the CyVerse Data Store in our case, but you can modify this at run time in a JSON field). To view them on the CyVerse Data Store, check the "archive" folder under your username. All of your job output will be in a separate subfolder under the "jobs" folder.
Alternatively you will be able to run your jobs through one of the available web interfaces.
Known problems with the DE: not all teams build apps the same way, which has led to some functionality not being available for Agave apps. In particular, you may define an input field as accepting multiple files, but the GUI will not allow multiple file selection. The same appears to happen with AgaveToGo. In this case you will have to submit a JSON via the command line or use http://cyverseuk.herokuapp.com/ (only for apps hosted on the EI system).
Known problem with AgaveToGo: some disabled apps keep showing up in the list (they can't be used, though).