Data management through distributed storage for scientific data (dCache)

Data handling

The NorduGrid ARC middleware supports various access protocols, such as ftp, gsiftp, http, https, httpg, dav, davs, ldap, srm, root, rucio, and s3. It caches input data and can optimize transfers, for example by performing only a single transfer for jobs that use the same dataset.

Storing data on remote dCache

  • Available to SLING users, i.e. members of gen.vo.sling.si and other VOs.
  • Suitable as temporary storage for jobs' standard input and output data (ARC), with a limited quota.
  • The default setup is not appropriate for confidential unencrypted data: members of the same VO can read other members' data.
  • Short- and long-term storage on the dCache server and pools within SLING.
  • No backup!
  • More details on HPC Vega data storage solutions are available at link.

The ARC client provides commands for direct data handling; documentation and useful commands can be found here.
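
For illustration, remote files can be listed, copied, and removed with the arcls, arccp, and arcrm commands (the dCache endpoint and path below are placeholders; use the actual SLING dCache URL and your VO path):

arcls davs://<dcache-endpoint>/<vo>/<path>/
arccp input.tar.gz davs://<dcache-endpoint>/<vo>/<path>/input.tar.gz
arcrm davs://<dcache-endpoint>/<vo>/<path>/input.tar.gz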

S3 Object Storage

HPC Vega offers S3 object storage. To obtain credentials, the OpenStack client is needed. For data management any S3 client should work, e.g. s5cmd, libs3, or boto3. HPC Vega users can use the client on the login nodes. The initial user quota is set to 100 GB.
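
If the OpenStack client is not already available, it can usually be installed with pip (assuming a Python environment):

pip install python-openstackclient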

Obtaining the key and secret for accessing a project in the S3 object storage:

openstack --os-auth-url https://keystone.sling.si:5000/v3 --os-project-domain-name sling --os-user-domain-name sling --os-project-name <project_name> --os-username <user_name> ec2 credentials create

Alternatively, the same connection parameters can be provided as environment variables:

OS_AUTH_URL=https://keystone.sling.si:5000/v3
OS_PROJECT_NAME=<project_name>
OS_PROJECT_DOMAIN_NAME=sling
OS_USER_DOMAIN_NAME=sling
OS_IDENTITY_API_VERSION=3
OS_URL=https://keystone.sling.si:5000/v3
OS_USERNAME=<user_name>
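
For example, the variables can be exported in the current shell (replace the placeholders with your project name and username):

export OS_AUTH_URL=https://keystone.sling.si:5000/v3
export OS_PROJECT_NAME=<project_name>
export OS_PROJECT_DOMAIN_NAME=sling
export OS_USER_DOMAIN_NAME=sling
export OS_IDENTITY_API_VERSION=3
export OS_URL=https://keystone.sling.si:5000/v3
export OS_USERNAME=<user_name>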

With these variables set, the command for obtaining the key and secret simplifies to:

openstack ec2 credentials create
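
Already issued credentials can later be reviewed with, for example:

openstack ec2 credentials list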

s5cmd Client

Data is transferred with the s5cmd client. The credentials, aws_access_key_id and aws_secret_access_key, are stored in the ~/.aws/credentials file:

# create the credentials directory and file with restrictive permissions
mkdir -p ~/.aws
chmod 700 ~/.aws
touch ~/.aws/credentials
chmod 600 ~/.aws/credentials
# store the access key and secret returned by "openstack ec2 credentials create"
cat >~/.aws/credentials <<EOF
[default]
aws_access_key_id = <access>
aws_secret_access_key = <secret>
EOF
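
To avoid passing --endpoint-url with every command, the endpoint can also be set once as an environment variable (supported by recent s5cmd releases; check your installed version):

export S3_ENDPOINT_URL=https://ceph-s3.vega.izum.si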

List the buckets:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si ls

Create a bucket:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si mb s3://mybucket01

Check whether the bucket was created:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si head s3://mybucket01/

Copy a file into the bucket:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp <data> s3://mybucket01/
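
Multiple files can be uploaded at once with a wildcard, and the bucket contents can then be listed (the local results/ directory is only an example):

s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp 'results/*' s3://mybucket01/results/
s5cmd --endpoint-url https://ceph-s3.vega.izum.si ls s3://mybucket01/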

Download file(s) from the bucket:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si cp s3://mybucket01/data01.tar.gz .

Remove file(s) from the bucket:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si rm s3://mybucket01/data01.tar.gz
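
An empty bucket can be removed with rb, for example:

s5cmd --endpoint-url https://ceph-s3.vega.izum.si rb s3://mybucket01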

More commands are described in the s5cmd documentation.