
In experiments we have noticed that write times to ROSS are considerably slower than read times, and slower than writes on many POSIX file systems, while ROSS read times are significantly faster.  In other words, it takes longer to store new data in ROSS than to pull old data out of it.  We also notice (as is typical of most file systems) that transfers of large objects go significantly faster than transfers of tiny objects.  Please keep these facts in mind when planning your use of ROSS - ROSS favors reads over writes and big things over little things.
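
One common way to take advantage of this is to bundle many small files into a single large archive before writing them to ROSS.  The sketch below uses standard tar; the directory and archive names are placeholders.

Code Block
languagebash
titleBundling small files before writing to ROSS (illustrative)
# Combine a directory tree of small files into one large, compressed archive.
# "my_project" and the archive name are placeholders; use your own paths.
tar -czf my_project.tar.gz my_project/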

After evaluating several tools, the HPC admins settled on two, ecs-sync and rclone, as 'best of breed' for moving data between ROSS and Grace's cluster storage system, where the /home, /scratch, and /storage directories live.  The ecs-sync program is the more efficient and the faster of the two for bulk data moves.  It consumes fewer compute resources and less memory than rclone, and when properly tuned for the number of threads (i.e. when the sweet spot is found) it moves data significantly faster.  The rclone program has more features than ecs-sync, including ways to browse data in ROSS, to mount a ROSS bucket as if it were a POSIX file system, and to synchronize content using a familiar rsync-like command syntax.  While ecs-sync is great for fast, bulk moves, rclone works very well for nuanced access to ROSS and small transfers.
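
As a rough illustration of those rclone features, the commands below assume a remote named ross and a bucket named mybucket have already been set up with rclone config; substitute your own remote, bucket, and paths.

Code Block
languagebash
titleExample rclone commands (illustrative)
# List the contents of a bucket in ROSS.
rclone ls ross:mybucket

# Mount a ROSS bucket so it can be browsed like a POSIX file system.
rclone mount ross:mybucket ~/ross-mount --daemon

# Synchronize a local directory to ROSS with rsync-like semantics.
rclone sync /scratch/$USER/results ross:mybucket/results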

ecs-sync

The ecs-sync program is specifically designed for moving data from one storage technology to another.  It comes from the Dell/EMC support labs, and is what EMC support engineers use for migrating data.  Due to security issues, we do not offer the job-oriented service mode described in the ecs-sync documentation.  (If the ecs-sync maintainers ever fix the security holes, we could reconsider.)  Instead we only support running ecs-sync in what its documentation calls "Alternate (legacy) CLI execution", where a user runs ecs-sync as a command instead of queuing up a job.  A command alias exists on Grace's login and compute nodes for running ecs-sync.  The alias actually runs the command on a data transfer node, so it does not bog down the node on which the command is issued.  In other words, feel free to use ecs-sync from the login node, from the command prompt in Grace's Open OnDemand portal, or even from a compute job if needed, though it is somewhat wasteful of resources to run ecs-sync from a compute job.

The syntax for calling ecs-sync interactively on Grace is:

Code Block
languagebash
titleInteractive ecs-sync command
ecs-sync --xml-config <config-file>.xml

The <config-file>.xml file holds the instructions for the transfer: what to move, where to move it, how many threads to use, and which storage nodes to use.  We will describe its content in detail later; a rough sketch appears below.
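
As a rough sketch only, assuming ecs-sync's standard XML layout, a configuration that copies a POSIX directory into an S3-style bucket might look like the following.  The host, credentials, paths, and bucket name are placeholders, and the full set of elements is covered later.

Code Block
languagexml
titleSketch of an ecs-sync configuration file (illustrative)
<?xml version="1.0" encoding="UTF-8"?>
<syncConfig xmlns="http://www.emc.com/ecs/sync/model">
    <options>
        <!-- Number of parallel transfer threads; tune to find the sweet spot. -->
        <threadCount>16</threadCount>
    </options>
    <source>
        <!-- Read from a POSIX directory on cluster storage (placeholder path). -->
        <filesystemConfig>
            <path>/scratch/username/my_project</path>
        </filesystemConfig>
    </source>
    <target>
        <!-- Write to an S3-style bucket in ROSS (placeholder host, keys, bucket). -->
        <awsS3Config>
            <host>ross.example.edu</host>
            <accessKey>YOUR_ACCESS_KEY</accessKey>
            <secretKey>YOUR_SECRET_KEY</secretKey>
            <bucketName>mybucket</bucketName>
        </awsS3Config>
    </target>
</syncConfig>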

When running ecs-sync interactively, you see all the messages coming back from it as it does its job, giving instant feedback.  But if the shell dies, the command may stop.  To avoid this, you can use nohup to run ecs-sync as a background job that continues after you log out, for example with the following syntax:

Code Block
languagebash
titleRunning ecs-sync in the background
nohup ecs-sync --xml-config <config-file>.xml > <log-file>.log &
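
Once the background job is running, you can keep an eye on it with ordinary shell tools, for example by following the log file you named above:

Code Block
languagebash
titleChecking on a background ecs-sync run
# Follow the log as ecs-sync appends to it (Ctrl-C stops watching, not the job).
tail -f <log-file>.log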