...

As the name implies, ROSS is an object store.  An object store does not have a directory tree structure the way conventional POSIX file systems do.  (Grace's storage systems are POSIX file systems.)  Instead, objects have names and live in buckets, essentially a one-level directory hierarchy.  People often name objects using what looks like a POSIX-style file path.  For example, "directory1/directory2/directory3/directory4/filename" might be an object name.  The RESTful APIs used to access ROSS can do a prefix search, listing just the objects whose names start with a particular prefix.  For example, "directory1/directory2/" would pull back all the objects that start with that prefix.  So one can mimic a POSIX file system by employing an object naming convention that uses the equivalent of POSIX path names.  Unfortunately, prefix search is quite inefficient, making the simulated directory lookup slower than a native POSIX file system.  Nevertheless, some tools exist that can mimic a POSIX-style file system using an object store behind the scenes.  With a proper naming convention, it is fairly easy to back up and restore POSIX directory trees to and from an object store efficiently.  This is what makes ROSS an excellent backing storage system for Grace.  (A backing storage system is the larger but slower storage system that sits behind a smaller but faster storage cache in a hierarchical storage model.  Grace's storage system should be considered the cache, i.e. fast temporary storage.)
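For the curious, here is a minimal sketch of what such a prefix listing looks like through the S3 API, using Python and boto3.  The endpoint URL, credentials, and bucket name below are placeholders for illustration, not ROSS specifics:

```python
# Sketch: listing objects under a POSIX-style "directory" prefix with boto3.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ross.example.edu",   # placeholder S3 endpoint
    aws_access_key_id="username@arp",          # object user ID (see below)
    aws_secret_access_key="...",               # S3 secret key
)

# Objects named "directory1/directory2/filename" involve no real directories;
# a prefix + delimiter listing merely simulates one level of a tree.
resp = s3.list_objects_v2(
    Bucket="mybucket",
    Prefix="directory1/directory2/",
    Delimiter="/",
)
for obj in resp.get("Contents", []):           # "files" at this level
    print(obj["Key"])
for cp in resp.get("CommonPrefixes", []):      # simulated "subdirectories"
    print(cp["Prefix"])
```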

Remember, the HPC Admins do not back up Grace's shared storage system, since it is only intended to be a staging or scratch area (aka a cache) used to run HPC jobs.  In other words, the primary copy of data should be elsewhere, not on Grace.  ROSS is one option for keeping the primary copy of data.  In fact, the main reason UAMS purchased ROSS was to provide a primary location for storing research data.

Data in ROSS starts out triple-replicated on 3 different storage nodes, then transitions to erasure coding, which provides resilience without major storage overhead.  ROSS currently uses 12/16 erasure coding, meaning that for every 12 blocks of data stored it actually writes 16 blocks.  Up to four of the sixteen blocks can be lost and the data can still be recovered; losing a fifth block would make the data unrecoverable.  In contrast, most RAID systems only have 1 or 2 drives of redundancy.  ROSS scatters the 16 blocks across the storage nodes in a data center, improving the performance of data retrievals and further improving the resilience of the system (less chance that a node failure blocks access).  Data can be accessed from any of the storage nodes.  Currently the two systems (UAMS and UARK) are isolated from each other, so replication between them is not possible, but plans to join the two are in progress.  Eventually all of the content in ROSS, regardless of which campus it physically lives on, will be accessible from either campus.  Of course, if data living on one campus is accessed from the other campus, the access will be somewhat slower due to networking delays and bandwidth limitations between campuses.
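To put rough numbers on that trade-off, here is a back-of-the-envelope comparison using only the figures quoted above:

```python
# Back-of-the-envelope storage-overhead comparison for the schemes above.
data_blocks, total_blocks = 12, 16              # ROSS's 12/16 erasure coding

ec_overhead = total_blocks / data_blocks        # ~1.33x raw capacity per byte stored
tolerated_losses = total_blocks - data_blocks   # any 4 of the 16 blocks may be lost

replication_overhead = 3                        # triple replication writes 3 full copies
replication_losses = 2                          # and survives losing 2 of them

print(f"12/16 erasure coding: {ec_overhead:.2f}x overhead, survives {tolerated_losses} lost blocks")
print(f"Triple replication:   {replication_overhead}x overhead, survives {replication_losses} lost copies")
```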

As mentioned, ROSS will soon have an option for replicating data in Fayetteville.  However, even in this case we would not consider the copy in Fayetteville a true backup copy.  Replication is good for maintaining data that needs high availability and equivalent performance regardless of which campus the data is accessed from.  Replication also doubles the storage cost, since it consumes available storage capacity at twice the rate that non-replicated storage does.  We still recommend that researchers keep backup or archive copies of data somewhere else, even if replication is turned on.

For data that meets OURRstore requirements (relatively static, STEM-related, neither clinical nor regulated), archiving to OURRstore is a very cost-effective option for getting resilient, long-term, offsite copies for little extra money (just media and shipping costs).  Other options for backup include the UAMS campus Research NAS or cloud storage such as OneDrive, Box, Azure, Google, Amazon, or IBM.  USB or bare drives are also an option for backup, but are not recommended, as they are quite error-prone if not stored and managed properly.  Some departments might have departmental storage systems that could hold backup copies of data, though the caveat about proper management of those devices still applies.

...

Keep in mind that any campuses that participate in ARCC, and by extension the Arkansas Research Platform (ARP), have access to ROSS.  Unlike the UAMS Research NAS, which is locked down behind UAMS firewalls and hence only accessible inside the UAMS campus, ROSS is located in the ARCC Science DMZ, a private network accessible to a limited number of campuses, both within Arkansas and potentially beyond.  As such, it is inappropriate to store unencrypted, fully identified patient (PHI) or personal (PII) data in ROSS, as doing so could violate UAMS HIPAA or FERPA policies.  ROSS does have the ability to restrict access to buckets and to do server-side, data-at-rest encryption, but these capabilities have not been evaluated as to whether they are sufficient for HIPAA or FERPA compliance.  For now, ROSS should not be used for data that is regulated by HIPAA, FERPA, or any other governmental regulation.  De-identified human subject data is allowed.

...

Server-side encryption can be turned on either at the namespace level, where all buckets in the namespace are required to be encrypted, or at the bucket level for namespaces that do not have 'encryption required' turned on.  If you want namespace-level encryption, please inform the HPC Admins when requesting a namespace.  The encryption choice must be made when the namespace or bucket is created and cannot be changed afterwards.  (Should a change be needed after bucket creation, one can copy data from an unencrypted bucket to an encrypted one, then destroy the unencrypted bucket, or vice versa.)
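If you ever need to make that after-the-fact switch, the copy can be scripted through the S3 API.  Below is a minimal sketch in Python with boto3, assuming the encrypted target bucket was already created (e.g. in the ECS Portal) and that the endpoint supports S3 server-side copy; the endpoint URL and bucket names are hypothetical:

```python
# Sketch: migrating objects from an unencrypted bucket to an encrypted one.
import boto3

# Credentials via environment or config files, omitted here for brevity.
s3 = boto3.client("s3", endpoint_url="https://ross.example.edu")  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mydata-plain"):
    for obj in page.get("Contents", []):
        # Server-side copy; data is re-written under the target bucket's
        # encryption settings without a round trip through the client.
        s3.copy(
            CopySource={"Bucket": "mydata-plain", "Key": obj["Key"]},
            Bucket="mydata-encrypted",
            Key=obj["Key"],
        )
# After verifying the copy, the unencrypted bucket can be emptied and deleted.
```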

...

If you are good with the data restrictions and prepared to cover any costs that might be incurred for using ROSS, there are steps that must be completed before you can actually move data between ROSS and Grace.  First, you must request or be assigned to a namespace.  Second, decide which APIs you might use.  Third, your namespace administrator may wish to use the ECS Portal to pre-create the buckets that you might use.  Creating buckets in the ECS Portal can be more convenient than creating them using APIs or tools.  Finally, get access credentials for the APIs that you might use.
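That said, buckets can also be created directly through the S3 API when that is more convenient for scripting.  A minimal boto3 sketch (the endpoint, credentials, and bucket name are placeholders):

```python
# Sketch: creating a bucket through the S3 API instead of the ECS Portal.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ross.example.edu",  # placeholder: ask the HPC Admins
    aws_access_key_id="username@arp",         # your object user ID
    aws_secret_access_key="...",              # your S3 secret key
)
s3.create_bucket(Bucket="myproject-data")     # hypothetical bucket name
```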

...

Requesting a namespace assignment on ROSS

All users of ARP facilities, including Grace, may ask for credentials in the "arp" namespace.  When the administrators create your credentials, they will also create a bucket in that namespace for your use.  The bucket will have the same name as your home directory on Grace or Pinnacle, which should be the same as your username.  After creating your account, the administrators will place your S3 secret key into the file named ".ross/s3key" in your home directory.  Your S3 ID is your ROSS object user name in the arp namespace, which is "<username>@arp", where <username> is your login name for Grace or Pinnacle.
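As an illustration, a script running on Grace could assemble its S3 client from those pieces roughly like this.  The ROSS endpoint URL below is a placeholder (ask the HPC Admins for the real one); the ID format and key file location are as described above:

```python
# Sketch: building an S3 client on Grace from the arp-namespace credentials.
import getpass
import pathlib

import boto3

username = getpass.getuser()                                  # Grace/Pinnacle login name
secret = (pathlib.Path.home() / ".ross" / "s3key").read_text().strip()

s3 = boto3.client(
    "s3",
    endpoint_url="https://ross.example.edu",                  # placeholder endpoint
    aws_access_key_id=f"{username}@arp",                      # object user in the arp namespace
    aws_secret_access_key=secret,
)

# The pre-created bucket shares its name with your username.
resp = s3.list_objects_v2(Bucket=username)
print(resp.get("KeyCount", 0), "objects in bucket", username)
```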

If your lab, department, project, or group would like to purchase their own namespace for their exclusive use, please contact us at HPCAdmin@uams.edu.  All namespaces have namespace administrators who manage buckets and object users within that namespace.  A particular username@domain can only be a namespace administrator for one namespace.  If a particular person needs to be the namespace administrator in more than one namespace, for example namespaces for two different groups, they must use a different login name plus domain for each namespace they wish to administer.  This is why we suggest always qualifying namespace users (object or administrative) with the "@<namespace>" suffix, where <namespace> is the name of the namespace that the user is assigned to.

Don't forget that a namespace administrator is not the same as an object user.  See Terminology Used By the Research Object Store System (ROSS) for the difference.  An object user has a separate, API-specific set of credentials.

...

To initiate access to ROSS for a new personal or group (e.g. project, lab, department) namespace, please send a request via e-mail to HPCAdmin@uams.edu.  In your request,

  • Please indicate whether this is for a personal namespace (e.g. primary storage for processing data on Grace), or for a group (shared storage). 
    • For a personal namespace, please indicate
      • your name
      • your e-mail
      • your departmental affiliation
      • the login name (not your password) and domain you will use to access ROSS's administrative interface, e.g. johndoe@hpc.uams.edu (for personal namespaces we prefer that you use your HPC username)
      • why you do not wish to be part of the "arp" namespace (where most personal accounts go)
    • For a group namespace, please give
      • a name and brief description for the group
      • the primary e-mail contact for the group
      • the departmental affiliation of the group
      • who will be the namespace administrators - we need
        • their names
        • their e-mail addresses
        • their login names (not their passwords) and domains, e.g. janedoe@ad.uams.edu (for group namespaces we generally prefer campus Active Directory usernames, e.g. the name the namespace administrator might use to log in to Outlook Mail)
        • you may ask for more than one namespace administrator
        • if all the members of a particular campus AD group should be namespace administrators, you could also just give us the name and domain of the group instead of their individual names
  • Please estimate how much storage you or your group intend to use in ROSS for the requested namespace, divided into local and replicated amounts.  Remember that replicated data costs twice as much as non-replicated data.  The HPC Admins will use this information to set the initial quotas and for capacity planning.  You will be allowed to request quota increases if needed and space is available. 
  • We would also appreciate a brief description of what you will be using the storage for.  The "what it is used for" assists us in drumming up support (and possibly dollars) for expanding the system. 

...

If you wish to access an existing group namespace as an object user, please contact the namespace administrator of that namespace and ask to be added as an object user for that namespace.  For the "arp" namespace, please contact "HPCAdmin@uams.edu".  If you need assistance determining what namespaces are available and who the namespace administrators are, feel free to contact the HPC Admins via e-mail (hpcadmin@uams.edu).

...