3. Data Transfer

3.1. How can I share files with other users?

The main platform to share files in Hydra is a so-called Virtual Organization (VO). It provides extra disk space in Hydra that is shared with the other members of your research group, where you can easily share any data files with them.

If you want to set up a shared directory with another user in Hydra, but joining a common VO is not an option, please contact VUB-HPC Support. We will evaluate the best solution for your use case.

3.2. Can I copy files between Hydra and VUB’s OneDrive directly?

You can copy files between Hydra and the VUB OneDrive directly, using the third-party sync app OneDrive Client for Linux. This avoids copying the files to/from your local computer as an intermediate step.

Warning

Several restrictions and limitations apply to OneDrive:

  • OneDrive does not distinguish between upper and lower case in file names. Avoid having two files in the same folder that differ only in capitalization.

  • OneDrive does not allow file names that contain any of the characters \ / : * ? " < > |. Files whose names contain any of these characters will not be synced.

  • The following names aren’t allowed for files or folders: .lock, CON, PRN, AUX, NUL, COM0 - COM9, LPT0 - LPT9, _vti_, desktop.ini, any filename starting with ~$.

  • “_vti_” cannot appear anywhere in a file name.

  • “forms” isn’t supported when the folder is at the root level for a library.

  • You can’t create a folder name in SharePoint that begins with a tilde (~).
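
Some of these restrictions can be checked before the first sync. The following is a minimal sketch, not part of the OneDrive client: the function name check_onedrive_names is hypothetical, and it covers only the forbidden characters, the _vti_ and ~$ rules, and names that collide when case is ignored. Reserved names such as CON or desktop.ini are not checked.

```shell
# Sketch: list entries under a directory that OneDrive would likely refuse to sync.
# check_onedrive_names is a hypothetical helper, not an onedrive subcommand.
check_onedrive_names() {
    local dir="${1:-.}"
    # Names containing any of \ : * ? " < > |  (a slash cannot occur in a Unix file name)
    find "$dir" -mindepth 1 -name '*[\\:*?"<>|]*' -print
    # Names containing _vti_ or starting with ~$
    find "$dir" -mindepth 1 \( -name '*_vti_*' -o -name '~$*' \) -print
    # Lower-cased paths that occur more than once, i.e. entries colliding when case is ignored
    find "$dir" -mindepth 1 -print | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Calling check_onedrive_names on the directory you intend to sync prints one line per problematic entry and nothing if no problem was found.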

3.2.1. Client Authorization

  1. Run the onedrive command for the first time.

    onedrive
    

    Upon execution, a URL starting with https://login.microsoftonline.com is shown to authorize the client to access your VUB Office 365 account. The URL contains the client_id of the sync app, which should be exactly ‘d50ca740-c83f-4d1b-b616-12c519384f0c’:

    Output:
    $ onedrive
    Configuring Global Azure AD Endpoints
    Authorize this app visiting:
    
    https://login.microsoftonline.com/[...]
    
    Enter the response uri:
    
  2. Copy/paste the full URL into your browser.

  3. Log in with your credentials if necessary. You should be redirected to a blank page in your browser.

  4. Copy/paste the URL of the blank page into the prompt of onedrive in Hydra.

At this point, if there is no error, your client should have access to your account. By default, the access token to Office 365 is stored in the file ~/.config/onedrive/refresh_token.
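
The client_id check in step 1 can be scripted instead of compared by eye. This is a minimal sketch that assumes the authorization URL carries the ID as a regular query parameter named client_id; the function name check_client_id is hypothetical.

```shell
# Sketch: compare the client_id found in an authorization URL with the expected one.
# Assumes the URL contains "client_id=<value>" as a query parameter.
EXPECTED_CLIENT_ID='d50ca740-c83f-4d1b-b616-12c519384f0c'

check_client_id() {
    local url="$1" id
    # Extract the value between "client_id=" and the next "&" (or the end of the URL)
    id=$(printf '%s\n' "$url" | sed -n 's/.*[?&]client_id=\([^&]*\).*/\1/p')
    if [ "$id" = "$EXPECTED_CLIENT_ID" ]; then
        echo "client_id matches"
    else
        echo "unexpected client_id: ${id:-<none found>}" >&2
        return 1
    fi
}
```

Pass the URL printed by onedrive as a single-quoted argument, so that the shell does not interpret the & characters in it.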

3.2.2. Synchronize with personal OneDrive

  1. Create a directory that will be synced with your OneDrive.

    The following command creates the sync directory hydra-sync inside $VSC_DATA/onedrive (avoid using $HOME as it is small).

    mkdir -p $VSC_DATA/onedrive/hydra-sync
    
  2. Create the configuration file ~/.config/onedrive/config.

    The following commands generate the config file. The entry sync_dir is mandatory and points to the parent directory of the sync directory. We also recommend skipping symlinks and dotfiles (files that start with a dot) by default, to avoid unexpected data transfers unless you know that you need them.

    config=~/.config/onedrive/config
    echo "sync_dir = \"$VSC_DATA/onedrive\"" > "$config"
    echo 'skip_symlinks = "true"' >> "$config"
    echo 'skip_dotfiles = "true"' >> "$config"
    
  3. Create the sync_list file ~/.config/onedrive/sync_list.

    The following command adds the sync directory hydra-sync to the sync_list file. This ensures that only data inside the sync directory is synced.

    echo hydra-sync > ~/.config/onedrive/sync_list
    
  4. Check if the OneDrive client has been configured correctly.

    onedrive --resync --synchronize --verbose --dry-run
    
  5. If the dry-run succeeded, re-run the above command but remove the --dry-run option to do the real sync.

    onedrive --resync --synchronize --verbose
    

    If the sync is successful, the sync directory (here: hydra-sync) should show up under My files in your VUB OneDrive.

  6. For subsequent synchronizations, also remove the --resync option to avoid another full synchronization. A resync is only needed after modifying the configuration or sync_list file.

    onedrive --synchronize --verbose
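
Steps 1 to 3 above can also be collected in one re-runnable function. This is only a sketch under the same assumptions as those steps; the function name setup_onedrive_sync and its arguments are hypothetical, and the optional third argument exists only so that the function can be tried out in a scratch location.

```shell
# Sketch: create the sync directory, the config file and the sync_list in one go.
# setup_onedrive_sync and its arguments are illustrative names.
setup_onedrive_sync() {
    local parent="$1"                              # parent of the sync directory
    local name="$2"                                # name of the sync directory
    local confdir="${3:-$HOME/.config/onedrive}"   # where onedrive looks for its config
    mkdir -p "$parent/$name" "$confdir"
    {
        printf 'sync_dir = "%s"\n' "$parent"
        printf 'skip_symlinks = "true"\n'
        printf 'skip_dotfiles = "true"\n'
    } > "$confdir/config"
    # Restrict syncing to the sync directory only
    printf '%s\n' "$name" > "$confdir/sync_list"
}
```

With the names used in this section, the call would be setup_onedrive_sync "$VSC_DATA/onedrive" hydra-sync, followed by the dry-run check of step 4. Note that the function overwrites an existing config and sync_list.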
    

3.3. Can I copy files between Hydra and Nextcloud/ownCloud services directly?

You can copy files directly between Hydra and cloud services that support the WebDAV protocol, such as Nextcloud/ownCloud. This avoids copying the files to/from your local computer as an intermediate step. The davix command-line tools are installed by default on the login nodes.

Example: using davix to copy files between Hydra and your cloud service

Copy the file myfile.txt from your cloud home directory to Hydra (replace <cloud_url> with the WebDAV URL of the cloud service)
davix-get <cloud_url>/myfile.txt myfile.txt --userlogin <login> --userpass <passwd>
Copy the file myfile.txt from Hydra to your cloud home directory
davix-put myfile.txt <cloud_url>/myfile.txt --userlogin <login> --userpass <passwd>
Copy directory mydir recursively from Hydra to your cloud home directory (using 4 concurrent threads with -r for increased speed)
davix-put mydir <cloud_url>/mydir --userlogin <login> --userpass <passwd> -r 4
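
To upload a handful of individual files, the davix-put call above can be wrapped in a loop. The following is a minimal sketch: upload_files, DAVIX_PUT, CLOUD_LOGIN and CLOUD_PASS are hypothetical names, and reading the credentials from environment variables is just one possible choice.

```shell
# Sketch: upload several files with davix-put, one call per file.
# DAVIX_PUT can be overridden, for example to try the loop without a real upload.
DAVIX_PUT="${DAVIX_PUT:-davix-put}"

upload_files() {
    local cloud_url="$1"
    shift
    local f
    for f in "$@"; do
        # Same argument order as in the davix-put example above;
        # stop at the first failed upload
        "$DAVIX_PUT" "$f" "$cloud_url/$(basename "$f")" \
            --userlogin "$CLOUD_LOGIN" --userpass "$CLOUD_PASS" || return 1
    done
}
```

After setting CLOUD_LOGIN and CLOUD_PASS, a call such as upload_files <cloud_url> results.csv plots/fig1.png places both files directly in your cloud home directory.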

See also

The davix documentation.

3.4. How can I transfer data to/from Hydra with Globus?

Hydra is already available in Globus with its own collection. The name of Hydra’s collection is VSC VUB Tier2. Please follow the steps below to add Hydra to your Globus account:

  1. Install and configure Globus Connect Personal on your local computer following VSC Docs: Globus

  2. Open Globus and select the File Manager in the left panel

  3. Type VSC VUB Tier2 in the Collections field and select it

  4. At this point, the storage of Hydra will open and you can navigate it within Globus. Only data in your $VSC_DATA and $VSC_SCRATCH will be accessible

    • Path to your VSC_SCRATCH: /~/scratch/brussel/<vsc_first_3_digits>/<vsc_username>/

    • Path to your VSC_DATA: /~/data/brussel/<vsc_first_3_digits>/<vsc_username>/
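
The two paths can be derived from your VSC username. This sketch assumes that usernames have the form vsc followed by digits and that <vsc_first_3_digits> means the first three of those digits; the function name globus_paths and the account vsc10001 are illustrative only.

```shell
# Sketch: print the Globus paths for a VSC account.
# Assumes usernames of the form vscNNNNN; globus_paths is a hypothetical helper.
globus_paths() {
    local user="${1:-$USER}"
    local digits
    # Drop the leading "vsc" and keep the first three digits
    digits=$(printf '%s' "${user#vsc}" | cut -c1-3)
    printf '/~/scratch/brussel/%s/%s/\n' "$digits" "$user"
    printf '/~/data/brussel/%s/%s/\n' "$digits" "$user"
}
```

For the made-up account vsc10001, globus_paths vsc10001 prints /~/scratch/brussel/100/vsc10001/ and /~/data/brussel/100/vsc10001/.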

Tip

Create bookmarks in Globus to easily access your data in Hydra

3.5. How can I automate the transfer of data to/from Hydra?

Automatic (scripted) data transfer between Hydra and external SSH servers can be done safely by running rsync in Hydra over a passwordless SSH connection. Authentication with the external server relies on a dedicated key pair that requires no password or passphrase from the user. Once the passwordless SSH connection between Hydra and the external server is configured, rsync can use it to transfer data between them.

Important

The only caveat of this method is that anybody gaining access to your Hydra account will automatically gain access to your account in the external server as well. Therefore, it is very important that you use a user account in the external server that is exclusively used for sending/receiving files to/from Hydra and that has limited user rights.

The following steps show the easiest way to set up a passwordless secure connection to an external server:

  1. Check the connection to the external server from Hydra: Log in to Hydra and try to connect to the external server with a regular SSH connection using a password. If this step does not work, your server may not be reachable from Hydra and you should contact the administrators of the external server to make it accessible:

    $ ssh <username>@<hostname.external.server>
    
  2. Create an SSH key pair without passphrase: Log in to Hydra and create a new pair of SSH keys that will be used exclusively for data transfers with external servers. The new keys have to be stored inside the .ssh folder in your home directory. In the example below, the new key is called id_filetransfer. Leave the passphrase field empty to avoid any password prompt on authentication:

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/your/home/.ssh/id_rsa): </your/home/.ssh/id_filetransfer>
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in id_filetransfer.
    Your public key has been saved in id_filetransfer.pub.
    [...]
    
  3. Transfer the public key to the external server: The public key of the new passphrase-less pair created in Hydra has to be installed in the external server as well (ssh-copy-id picks up the matching .pub file). In this step you will have to provide your password to connect to the external server:

    $ ssh-copy-id -i ~/.ssh/id_filetransfer <username>@<hostname.external.server>
    
  4. Configure the connection to the external server: The specific keys used in the connection with the external server can be defined in the file ~/.ssh/config. This avoids having to explicitly set the option -i ~/.ssh/id_filetransfer on every SSH connection. Add the following lines at the bottom of your ~/.ssh/config file in Hydra (create the file if it does not exist):

    Host <hostname.external.server>
        User <username>
        IdentityFile ~/.ssh/id_filetransfer
    
  5. Check the passwordless connection: At this point it should be possible to connect from Hydra to the external server with the new keys and without any prompt for a password:

    $ ssh <username>@<hostname.external.server>
    
  6. Automatic copy of files: Once the passwordless SSH connection is properly configured, rsync will automatically use it. You can execute the following commands in Hydra to either transfer data to the external server or from the external server:

    Transfer from Hydra to external server
    $ rsync -av /path/to/source <username>@<hostname.external.server>:/path/to/destination
    
    Transfer from external server to Hydra
    $ rsync -av <username>@<hostname.external.server>:/path/to/source /path/to/destination
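
For unattended runs it can help to make the transfer fail instead of waiting at a password prompt when the key-based login does not work. The following is a minimal sketch using the ssh option BatchMode=yes, passed to rsync with -e; the function name transfer and the RSYNC variable are hypothetical.

```shell
# Sketch: non-interactive wrapper around the rsync commands above.
# RSYNC can be overridden, for example to inspect the arguments without transferring.
RSYNC="${RSYNC:-rsync}"

transfer() {
    local src="$1" dest="$2"
    # BatchMode=yes makes ssh give up instead of prompting for a password,
    # so a broken key setup results in a non-zero exit status
    "$RSYNC" -av -e 'ssh -o BatchMode=yes' "$src" "$dest"
}
```

The arguments are the same as for plain rsync, for example transfer /path/to/source <username>@<hostname.external.server>:/path/to/destination. A calling script can test the exit status to detect failed transfers.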