Last modified: March 19, 2025
Below are instructions for submitting and monitoring jobs with Grid Engine.
To submit a script via Grid Engine, run qsub followed by the script name, e.g.:
qsub myscript.sh
To submit a script that requires more than 2GB of memory, run 'qsub -l mem_free=XG', where 'X' is the required amount of memory in gigabytes, e.g.:
qsub -l mem_free=4G myscript.sh
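Grid Engine can also read qsub options embedded in the script itself, on comment lines beginning with '#$' (the default directive prefix), so a script can carry its own memory request. A minimal sketch:

```shell
#!/bin/sh
## Grid Engine reads "#$" comment lines as qsub options, so the
## memory request below travels with the script itself:
#$ -l mem_free=4G
MSG="running with 4G reserved"
echo "$MSG"
```

With the option embedded, a plain `qsub myscript.sh` is enough; command-line options still override embedded ones.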
Note
To determine how much memory your job requires, see memory reservation.
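If you have no estimate yet, one rough approach (an addition, not from the memory reservation page) is to run the program once, note its peak memory use in kilobytes, and round up to whole gigabytes for mem_free. The peak of 3500000 kB below is a made-up example:

```shell
# Hypothetical peak memory from a trial run, in kB:
peak_kb=3500000
# Convert kB to GB (1048576 kB per GB) and round up with awk:
gigs=$(awk -v kb="$peak_kb" 'BEGIN { print int((kb/1048576) + 1) }')
echo "qsub -l mem_free=${gigs}G myscript.sh"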
To monitor the job, run qstat.
When complete, Grid Engine creates two files in your home directory: an error file (with '.e' in the name) and an output file (with '.o' in the name). The file names include the job ID, e.g. 96606:
test.sh.e96606
test.sh.o96606
This section describes how to run a MATLAB script called test.m with a 4GB memory reservation.
Verify that the MATLAB script (for example, test.m) is in your MATLABPATH. If you're not sure, move it to the directory $HOME/matlab/:
mkdir $HOME/matlab
mv test.m $HOME/matlab
In a new file, put the MATLAB command you want to run. For example, create a file in your home directory called runme.sh with the contents:
matlab -nosplash -nojvm -nodisplay -r test
From a terminal window, run qsub and reserve 4GB of memory for your script:
qsub -l mem_free=4G $HOME/runme.sh
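Two details of the MATLAB invocation are easy to get wrong: '-r' takes the script name without the .m extension, and appending 'exit' (an addition not in the runme.sh above) makes MATLAB quit when the script finishes instead of leaving the job idle. A sketch that assembles the command as a string:

```shell
# "test" corresponds to $HOME/matlab/test.m -- no .m extension with -r.
SCRIPT="test"
# Joining commands with a semicolon inside one quoted string; the
# trailing "exit" ends the MATLAB session so the job can complete.
CMD="matlab -nosplash -nojvm -nodisplay -r \"$SCRIPT; exit\""
echo "$CMD"
```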
This section describes how to analyze multiple, independent datasets in parallel.
To start, write a script that uses environment variables to define data sets, for example:
#!/bin/sh
## define data set location
DATADIR="$HOME/DATA"
DATASET="SUBJECT_101"
echo "my data is here: $DATADIR/$DATASET"
This script uses the $HOME environment variable, which is a shortcut to your home directory.
Save the text to a file, e.g. test.sh, and run it from the command line to verify it works:
sh $HOME/test.sh
Then use qsub to run it via Grid Engine:
qsub $HOME/test.sh
You can monitor the job status with qstat.
When complete, Grid Engine creates two files in $HOME: an error file (with '.e' in the name) and an output file (with '.o' in the name). The file names include the job ID, e.g. 96606:
test.sh.e96606
test.sh.o96606
The next step is to run commands on multiple data sets. There are two methods to define data sets:
- Numeric array: the data sets are named with a sequential, numeric component, and a script is executed once for each element of the numeric array.
- Directory or file: a script is executed once for each data set defined in a directory or file.
Both are described below.
To use an array, data sets must be named with a sequential, numeric value (e.g. sub1, sub2, sub3).
To define the array, use the qsub -t option, followed by the numeric value, e.g. ‘-t 1-3’.
Your script will be run once for each number in the array. The array value may be referenced in your script as the SGE_TASK_ID variable.
The ‘test.sh’ script evaluates SUBJECT_101. But there are actually 3 data sets I want to analyze: SUBJECT_101, SUBJECT_102, and SUBJECT_103. The data sets are named with a sequential, numeric component (101-103), so I define the data sets with a qsub numeric array: -t 101-103.
Grid Engine will run test.sh script 3 times, once for each value in the array. I use the SGE_TASK_ID variable to reference the value of the array for the analysis: during the first iteration, the value of SGE_TASK_ID is ‘101’; during the second iteration, SGE_TASK_ID is ‘102’; and in the third, SGE_TASK_ID is ‘103’.
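Outside Grid Engine, the same three iterations can be simulated with a plain shell loop (a sketch only; the real array job sets SGE_TASK_ID itself and runs the tasks on cluster nodes, possibly in parallel):

```shell
# Simulate "qsub -t 101-103": run the body once per task ID.
for SGE_TASK_ID in 101 102 103; do
  # Each iteration maps the numeric ID onto a data set name:
  DATASET="SUBJECT_${SGE_TASK_ID}"
  echo "iteration: $DATASET"
done
```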
Next, I modify the test.sh script to use the SGE_TASK_ID variable to analyze the data corresponding to each subject. Specifically, replace 'SUBJECT_101' with 'SUBJECT_${SGE_TASK_ID}'.
Below is the modified ‘test.sh’ script:
#!/bin/sh
## define data set location
DATADIR="$HOME/DATA"
DATASET="SUBJECT_${SGE_TASK_ID}"
## Run the following commands once per iteration
# If the output looks good, then remove the word 'echo' to run the recon-all command instead of printing it
date
echo recon-all -sd $DATADIR -s $DATASET
Run the 'test.sh' script with qsub:
qsub -t 101-103 $HOME/test.sh
When complete, six files are created in $HOME: one output file and one error file for each iteration.
Alternatively, you may use the ‘submit’ command to define the location of your data. The ‘submit’ command has the functionality of ‘qsub’, plus it allows you to define where your data is located.
The ‘submit’ command syntax is:
submit -s /path/to/SCRIPT [ -d /path/to/DIRECTORY | -f /path/to/FILE ] [ -o OPTIONS_FILE ]
The submit command runs the script specified by the -s option once for each data set specified by the -f or -d option. During each iteration, the name of the data set is stored in the SGE_TASK variable. (The qsub options file is specified by the -o option and is not required.)
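Conceptually, submit behaves roughly like the loop below: it takes each data set name in turn, exports it as SGE_TASK, and queues the script. This is only a sketch of the variable hand-off, not the site command's actual implementation:

```shell
# Example data set list, one name per line (mirrors test.subjects below):
printf '%s\n' SUBJECT_101 SUBJECT_202 SUBJECT_301 > test.subjects

# For each line, export the name as SGE_TASK and (here) just print what
# would be queued instead of calling qsub:
while read -r SGE_TASK; do
  export SGE_TASK
  echo "would queue: qsub \$HOME/test.sh  (SGE_TASK=$SGE_TASK)"
done < test.subjects
```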
As shown previously, the test.sh script evaluates data from SUBJECT_101. Now I want to edit the script to analyze 3 non-sequential data sets: SUBJECT_101, SUBJECT_202, and SUBJECT_301. These data sets aren’t named with a sequential numeric ID, so I use the ‘submit’ command to evaluate them.
I create a text file with a list of the data sets, called ‘test.subjects’:
SUBJECT_101
SUBJECT_202
SUBJECT_301
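If each subject is already a directory under $DATADIR (an assumption about your layout), the list can be generated rather than typed; the mkdir lines below only fabricate that layout for the example:

```shell
# Fabricate a DATA directory with one subdirectory per subject:
mkdir -p DATA/SUBJECT_101 DATA/SUBJECT_202 DATA/SUBJECT_301

# List the directory contents into the subjects file, one name per line:
ls DATA > test.subjects
cat test.subjects
```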
I edit 'test.sh' to use the SGE_TASK variable to analyze data from the test.subjects file above. Specifically, replace 'SUBJECT_101' with '${SGE_TASK}':
#!/bin/sh
## define data set location
DATADIR="$HOME/DATA"
DATASET="${SGE_TASK}"
## Run the following commands once per iteration
# If the output looks good, then remove the word 'echo' to run the recon-all command instead of printing it
date
echo recon-all -sd $DATADIR -s $DATASET
Warning
Be careful to reference your data sets correctly. The qsub numeric array uses the SGE_TASK_ID variable, and submit command uses the SGE_TASK variable. The primary difference is that SGE_TASK_ID is a number, whereas SGE_TASK is usually the full name of your data set.
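The difference between the two variables can be seen side by side. Grid Engine and submit set these values for you; they are assigned by hand here only for the comparison:

```shell
SGE_TASK_ID=101            # numeric, set by a "qsub -t" array job
SGE_TASK="SUBJECT_101"     # full data set name, set by the submit command

# An array-job script must build the name itself; a submit script gets
# the full name directly:
echo "array job:  SUBJECT_${SGE_TASK_ID}"
echo "submit job: ${SGE_TASK}"
```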
Run the 'test.sh' script with the submit command:
submit -s $HOME/test.sh -f $HOME/test.subjects
When complete, six files are created in my home directory: one output file and one error file for each subject.
This section is optional. The following qsub options may be useful:
-M emailaddress
Change 'emailaddress' to specify where you want to receive notifications. The default is the address specified in the .forward file in your home directory.

-m b|e|a|s|n
The frequency of e-mail notifications. The default is '-m as'. The arguments have the following meaning:
- b : mail is sent at the beginning of the job
- e : mail is sent at the end of the job
- a : mail is sent when the job is aborted or rescheduled
- s : mail is sent when the job is suspended
- n : no mail is sent

-e path
Change 'path' to the directory where Grid Engine saves error files. The default is your home directory. If you change the default, verify the directory exists.

-o path
Change 'path' to the directory where Grid Engine saves output files. The default is your home directory. If you change the default, verify the directory exists.

-N name
Change 'name' to the name of the job.

-j yes
Merge the Grid Engine output and error files.

-l mem_free=value
Change 'value' to the amount of memory required by your script, e.g. '-l mem_free=15G' reserves 15GB of RAM for the script.
You may specify qsub options immediately after the qsub command, e.g.:
qsub -N test /path/to/script
Alternatively, you may put options in a text file, one option per line, e.g.:
-o $HOME/sge/logs
-e $HOME/sge/logs
-N test
If you use an options text file, then define the options file with the ‘-@’ flag followed by the filename, e.g.:
qsub -@ /path/to/options /path/to/script
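Putting the pieces together, the options file can be generated from the shell; the paths are examples, and the submission command is echoed rather than run here:

```shell
# Make sure the log directory exists before pointing -o/-e at it:
mkdir -p "$HOME/sge/logs"

# Write the options file, one option per line:
printf '%s\n' "-o $HOME/sge/logs" "-e $HOME/sge/logs" "-N test" > options

# The submission itself (shown, not executed in this sketch):
echo "qsub -@ options /path/to/script"
```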