Last modified: July 29, 2022

Data management

The following topics relate to disk access.

data backups

Backups are available on Neuro cluster nodes nx10 through nx28. These nodes aren’t accessible from the internet, so you must first log in to an X2Go login server. For example, to log in to nx10, open a terminal window in your X2Go session and type:

ssh nx10

Your lab’s backups are in the directory /home/lab-backup (replace ‘lab’ with your lab’s name).

The following table displays the name and age of backups (replace ‘lab’ with your lab’s name):

Directory                 Backup established
/home/lab-backup/lab.1    less than 24 hours ago
/home/lab-backup/lab.2    1-2 days ago
/home/lab-backup/lab.3    2-3 days ago
/home/lab-backup/lab.4    3-4 days ago
/home/lab-backup/lab.5    4-5 days ago
/home/lab-backup/lab.6    5-6 days ago
/home/lab-backup/lab.7    6-7 days ago

When you’re finished with the backup, please log out so data will be refreshed.
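If you need to recover a file from one of these directories, a plain copy is usually enough. The following is a minimal sketch, not a site-provided tool: the function name ‘restore_from_backup’, the ‘.restored’ suffix and the example path are all hypothetical, and the layout inside your lab’s backup directory may differ.

```shell
# Hypothetical helper: copy one file out of a backup directory without
# overwriting the current working copy.
# Arguments: $1 = path of the file inside the backup
#            $2 = destination directory (default: current directory)
restore_from_backup() {
    src=$1
    dest_dir=${2:-.}
    if [ ! -f "$src" ]; then
        echo "not found in backup: $src" >&2
        return 1
    fi
    # -p preserves timestamps and permissions; the .restored suffix keeps
    # the copy from clobbering the version you are currently working on.
    cp -p "$src" "$dest_dir/$(basename "$src").restored"
}

# Example (placeholder lab name and file path; adjust to your own):
# restore_from_backup /home/lab-backup/lab.2/project/notes.txt ~/project
```

Copying to a new name lets you compare the restored file with the current one before deciding which to keep.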

user backups

I recommend that you back up important files to off-site storage. Berkeley Box is free to staff, faculty and students. It provides unlimited off-site storage, with an option to sync files. For more information, see:

disk quotas

Run the ‘quota’ command from the command line to view your disk quota and usage:

quota

Run ‘dsum’ to summarize your disk usage:

dsum

For more options, run:

dsum --help

It may take a while to complete, depending on your disk usage.

Review the output, and

  1. delete temporary data
  2. delete redundant data
  3. compress data that you no longer use
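To find candidates for the cleanup steps above, it can help to list your largest individual files. This is a rough sketch using standard tools rather than a replacement for ‘dsum’; the function name ‘largest_files’ is hypothetical, and the sizes shown are file lengths in bytes, which may not match how the quota system counts usage.

```shell
# Hypothetical helper: list the largest regular files you own under a directory.
# Arguments: $1 = directory to scan (default: current directory)
#            $2 = how many files to show (default: 10)
largest_files() {
    dir=${1:-.}
    count=${2:-10}
    # wc -c prints "<bytes> <path>" for each file; sort numerically,
    # largest first, and keep the top entries.
    # "$(id -u)" is your numeric user ID, used here instead of $USER in
    # case that variable is not set in your shell.
    find "$dir" -user "$(id -u)" -type f -exec wc -c {} \; | sort -rn | head -n "$count"
}

# Example: show your 20 largest files under a placeholder directory.
# largest_files /location/of/files 20
```

Like ‘dsum’, this may take a while on large directory trees.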

If you exceed your disk quota, you will receive an email with instructions every day during the 10-day grace period.

If the grace period expires, you can still log in via secure shell (SSH) if you have an SSH key installed, but remote desktop (X2Go) logins will fail.

If you want a quota increase, email support@cirl.berkeley.edu before the grace period expires. Please don’t wait until the last day of your grace period to request a quota increase.

data compression

I recommend compressing:

  1. files that you can analyze in a compressed format (e.g. AFNI BRIK)
  2. files you want to archive (files you don’t expect to use on the cluster)

Do not repeatedly compress and uncompress files to stay under quota. This causes lag and reduces the efficiency of the backups.
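If you only need to look inside a compressed text file, you can read it without uncompressing it on disk. The sketch below assumes a gzip-compressed text file; the function name ‘peek_gz’ is hypothetical.

```shell
# Hypothetical helper: print the first lines of a gzip-compressed text file
# without changing the file on disk.
# Arguments: $1 = compressed file
#            $2 = number of lines to show (default: 10)
peek_gz() {
    # gzip -dc writes the uncompressed data to standard output and leaves
    # the .gz file in place.
    gzip -dc "$1" | head -n "${2:-10}"
}

# Example: show the first 5 lines of a compressed file.
# peek_gz unused.txt.gz 5
```

Because nothing is written back to disk, this doesn’t affect your quota or the backups.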

Below are examples using ‘gzip’ from the command line:
  • Compress a file

    This example will compress a file named ‘unused.txt’, and rename it ‘unused.txt.gz’. Modify ‘unused.txt’ to specify the name of the file you want to compress:

    gzip unused.txt
  • Uncompress a file

    This example will uncompress a file named ‘unused.txt.gz’, and rename it ‘unused.txt’. Modify ‘unused.txt.gz’ to specify the name of the file you want to uncompress:

    gunzip unused.txt.gz
  • Compress files in a directory

    This example will compress all files in /location/of/files (and subdirectories). Modify ‘/location/of/files/’ to specify the directory containing the files you want to compress:

    gzip -r /location/of/files
    
  • Compress files you own in a directory (useful in shared directories)

    This example will compress files you own in /location/of/files (and its subdirectories). Modify ‘/location/of/files/’ to specify the directory containing the files you want to compress. Don’t modify the second line that begins with ‘find’ - it should be run as written:

    cd /location/of/files
    find . -user "$USER" -type f ! -name "*.gz" -exec gzip {} \;
  • Compress files you own based on the filename

    This example will compress files you own in /location/of/files that end with the string ‘.dcm’. Modify ‘/location/of/files/’ to specify the directory containing the files you want to compress. Modify ‘.dcm’ to specify the string in the filename that you want to match:

    cd /location/of/files
    find . -user $USER -name "*.dcm" -exec gzip {} \;
  • Compress big files you own

    This example will compress files you own that are larger than 500 KiB, skipping files whose names already end in ‘.zip’, ‘.gz’, ‘.tgz’ or ‘.bz2’. It runs at low I/O priority (via ‘ionice’), so it shouldn’t cause lag for other users. Modify ‘/location/of/files/’ to specify the directory containing the files you want to compress. Don’t modify the second line that begins with ‘find’ - it should be run as written:

    cd /location/of/files
    ionice -c2 -n7 find . -user $USER -size +500k ! \( -iname "*.zip" -o -iname "*.gz" -o -iname "*.tgz" -o -iname "*.bz2" \) -type f -exec gzip {} \;
  • Uncompress files based on the filename (recursively)

    This example will uncompress files you own in /location/of/dicoms that end with the string ‘.dcm.gz’. Modify ‘/location/of/dicoms/’ to specify the directory containing the files you want to uncompress. Modify ‘.dcm.gz’ to specify the string in the filename that you want to match:

    cd /location/of/dicoms
    find . -user $USER -name "*.dcm.gz" -exec gunzip {} \;
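Before running any of the ‘find’ examples above on a large directory, I suggest previewing which files they would match. The following sketch replaces the ‘-exec gzip’ action with ‘-print’, so nothing is changed on disk; the function name ‘preview_compress’ is hypothetical.

```shell
# Hypothetical helper: list the files a compress command would touch,
# without compressing anything.
# Arguments: $1 = directory to scan (default: current directory)
#            $2 = filename pattern, e.g. "*.dcm" (default: all files)
preview_compress() {
    # Same kind of tests as the examples above (owned by you, regular file,
    # not already ending in .gz), with -print in place of -exec gzip.
    find "${1:-.}" -user "$(id -u)" -type f -name "${2:-*}" ! -name "*.gz" -print
}

# Example: list the .dcm files you own under a placeholder directory.
# preview_compress /location/of/files "*.dcm"
```

If the list looks right, run the matching ‘find’ command with ‘-exec gzip {} \;’ as shown in the examples.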