Overview

The OURRstore system, an NSF-funded storage archive project, is a very cost-effective way to archive data that needs to be reliably kept.  OURRstore uses LTO media, which has a bit error rate 10 times lower than spinning disk, with an expected lifetime of at least 8 years.  (The tapes theoretically can last longer if properly stored, but the hardware that reads the tapes might not.)  OURRstore increases the reliability and safety of storage by creating redundant copies of the data.  OURRstore procedures create at least 2 copies of archived data, though we strongly recommend using the 3-copy option.  One copy stays in the OURRstore robot, and is the 'online' copy, retrievable at any time.  A second copy gets shipped back to UAMS, where we store it in an environmentally controlled, physically locked location, giving an offsite backup of our data stored in OURRstore.  In a pinch, we can recover data from the copy sent to us.  The optional (but highly recommended) third copy is taken out of the robot and stored in an environmentally controlled storage facility at the University of Oklahoma Health Sciences Campus.  This third copy allows the OURRstore team to generate replacements for the primary copy, should that be necessary, without the risk of shipping the backup copy kept at UAMS to Oklahoma.

OURRstore is the least expensive, most reliable archival storage option available to UAMS basic science researchers.

Costs to use OURRstore

The cost to use OURRstore is only the media costs (extremely low compared to other options) plus a small additional amount for shipping cartridges to and from Oklahoma.  All the equipment and management costs are covered by NSF and OU.  All a researcher has to do is make certain that their research project, their lab, their department, or someone else has provided cartridges and return shipping materials to the OURRstore team.  The HPC Admins will assist researchers, labs, and departments in ordering the cartridges, to make certain that they get credited appropriately.

The last batch of 100 cartridges that we sent to OURRstore cost $67 each, and it cost approximately $135 to ship the 100 cartridges via FedEx Ground, fully insured, to Oklahoma.  USPS Priority Prepaid Forever small boxes to return individual cartridges (the second, offsite copy) currently cost $8.45.  For the recommended 3-copy option, this worked out to a one-time charge of $213.50 for approximately 7.6 TB of usable, triple-redundant storage that should keep data safe for at least 8 years ($28.09 per TB).  The less resilient (higher risk) 2-copy option would be a $145.15 one-time charge for the same 7.6 TB, minimum 8-year storage ($19.10 per TB).  These prices are expected to drop over time as we qualify less expensive vendors.  We are also exploring other shipping options.  Due to UAMS procurement policies, the cartridges first have to be received at UAMS, then shipped to OURRstore.  Then, after filling them with data redundantly, OURRstore ships 1 out of every 2-3 cartridges back to UAMS.  Hence we pay 3 to 4 increments of shipping for every 2 to 3 cartridges, depending on the storage option chosen.  The above numbers reflect all these estimated shipping charges.
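The quoted totals can be checked with a few lines of arithmetic.  The sketch below is one breakdown that happens to reproduce the figures above; it assumes each stored copy uses one cartridge, each cartridge carries a 1/100 share of the bulk shipment to Oklahoma, and exactly one return box is needed per set of copies.  The function and constant names are illustrative only, not part of any OURRstore tooling.

```python
# Figures taken from the paragraph above; the breakdown itself is an assumption.
CARTRIDGE_COST = 67.00         # dollars per cartridge (last batch)
BULK_SHIPPING_PER_100 = 135.00 # dollars to ship 100 cartridges to Oklahoma
RETURN_BOX = 8.45              # dollars for one prepaid return box
USABLE_TB = 7.6                # approximate usable capacity of one cartridge

def archive_cost(copies):
    """Return (one-time total, cost per usable TB) for a 2- or 3-copy set."""
    media = copies * CARTRIDGE_COST
    outbound = copies * BULK_SHIPPING_PER_100 / 100  # per-cartridge share
    total = media + outbound + RETURN_BOX            # one copy is mailed back
    return round(total, 2), round(total / USABLE_TB, 2)

if __name__ == "__main__":
    for copies in (3, 2):
        total, per_tb = archive_cost(copies)
        print(f"{copies} copies: ${total:.2f} total, ${per_tb:.2f} per TB")
```

Actual charges depend on current cartridge prices and shipping rates, so treat these numbers as an illustration rather than a quote.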

...

More importantly, with OURRstore you purchase tape cartridges as a one-time, up-front payment good for OURRstore's lifetime, instead of paying monthly recurring charges as you would with most cloud storage providers.  So when a particular grant expires and the money ends, your files are still safe and sound.  With a monthly storage service charge, cloud storage services will delete your files if you run out of funds to pay for the storage.

...

  • Researchers should include the archiving costs in their grant budgets.  Since most grants now require archiving and data sharing plans, this is a reasonable expense to include in a grant budget.  Of course, ongoing grants might not have budgeted for archiving costs, so would need to look for other options.  Pilot research projects also might not have a budget for archiving costs.  Including archiving costs in grant budgets is the preferred funding mechanism, as it does not require UAMS funds.
  • Research labs could include costs for archiving in their annual lab budgets.
  • Departments might be willing to cover the costs for archiving data as part of their departmental budgets, for example as infrastructure costs (i.e. part of indirects).

Due to the nature of how OURRstore works, the data you archive could end up on any cartridge, or even be spread across multiple cartridges.  Therefore, when you purchase cartridges, you are actually just purchasing an allotment of archival storage that is added to your quota, commensurate with the usable capacity of the cartridges, divided by 2 or 3, depending on which storage option you choose (double or triple redundant).

Please coordinate with us on the shipping address and the specific model of LTO cartridges allowed.  In fact, we are willing to put together the order for you, which might simplify things.  OURRstore will not accept unapproved cartridges, and may not accept cartridges that are not shipped with an appropriately addressed shipping label.  In addition, one has to provide OURRstore with materials for return shipping, including pre-paid shipping labels, and arrange for pickup.  Everything has to be exactly right in order to add cartridges to the OURRstore system.  Currently only the following specific models of cartridges are allowed:

  • IBM 38L7302 [LTO-7, formatted as Type M, 9 TB raw, ~7.6 TB usable]
  • IBM 01PL041 [LTO-8, 12 TB raw, ~10.2 TB usable]
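As a rough illustration of the quota rule described above (usable capacity divided by the number of copies), the following sketch uses the approximate usable capacities from this list.  The function name is hypothetical, and the real quota is set by the HPC admins, so the result is only an estimate.

```python
# Approximate usable capacity per cartridge, in TB, from the list above.
USABLE_TB = {
    "IBM 38L7302": 7.6,   # LTO-7 formatted as Type M
    "IBM 01PL041": 10.2,  # LTO-8
}

def estimated_quota_tb(model, cartridges, copies):
    """Estimate the quota (TB) gained from a purchase of approved cartridges."""
    if copies not in (2, 3):
        raise ValueError("OURRstore keeps either 2 or 3 copies")
    if model not in USABLE_TB:
        raise ValueError(f"{model!r} is not an approved cartridge model")
    return USABLE_TB[model] * cartridges / copies

if __name__ == "__main__":
    # Six LTO-8 cartridges under the triple-redundant option.
    print(f"{estimated_quota_tb('IBM 01PL041', 6, 3):.1f} TB")
```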

...

We cannot guarantee that your data is stored on the cartridges that you purchase.  Instead, the usable storage capacity of the cartridges would be added to your quota for you to draw from, and your cartridges would be put into the pool of cartridges that we use for everybody.  If you need tighter controls (i.e. your data and nothing else goes on the cartridges that you purchase), then you would need to use the "complicated" option for archiving data to OURRstore mentioned below (i.e., you would need to get your own OURRstore account and manage your own archiving, shipping and storage instead of using the pooled services that the HPC admins provide).

Restrictions on data that can be archived on OURRstore

Being an NSF project, there are certain stipulations on the kind of data that can be placed in OURRstore.  OURRstore is intended for NON-CLINICAL STEM RESEARCH DATA that is NOT LEGALLY REGULATED.  Non-STEM data is currently FORBIDDEN on OURRstore, because OURRstore was funded by the NSF, and non-STEM data is outside the NSF’s mandate. 

...

Warning
If you decide to use OURRstore for archiving your data, you must ensure that your data complies with the above rules.


How to request storage on OURRstore

If you have a means for covering the costs and agree to the data restrictions of OURRstore, please send a request via e-mail to hpcadmin@uams.edu confirming that you agree to the terms, and requesting access.  We will then work with you in archiving your data.

How data is stored on OURRstore

Data stored on OURRstore should be collected into compressed archive files, preferably between 20 and 200 GB in size, for the best storage efficiency without excessive access times.  Currently, the absolute minimum size of an archived file is 1 GB.  The absolute maximum size is 1 TB.  These archive files need to be created at UAMS prior to electronic transfer to the OURRstore system.  The initial transfer is disk to disk, so it goes fairly quickly.  Once the data is in the OURRstore disk cache, the OURRstore archive management software will start copying the data onto a media cartridge for safekeeping.  When a cartridge is full, the system makes a copy of the cartridge, ejects it from the system, and the OURRstore team, using the prepaid label that we provide them, ships that copy back to us.  We then store that copy offline in a locked, environmentally controlled location in Arkansas.  If the optional third copy is requested, the OURRstore system makes that third copy, which is ejected from the system and stored in an environmentally controlled location in Oklahoma.
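The size limits above can be checked before an archive file is handed over.  This is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes), which the text does not specify; the function name and return labels are made up for illustration.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

MIN_SIZE = 1 * GB                # absolute minimum archive file size
MAX_SIZE = 1 * TB                # absolute maximum archive file size
PREFERRED = (20 * GB, 200 * GB)  # range with the best storage efficiency

def check_archive_size(size_bytes):
    """Classify an archive file size against the limits stated above."""
    if size_bytes < MIN_SIZE:
        return "too small"
    if size_bytes > MAX_SIZE:
        return "too large"
    if PREFERRED[0] <= size_bytes <= PREFERRED[1]:
        return "preferred"
    return "accepted"

if __name__ == "__main__":
    for size in (500 * 10**6, 5 * GB, 50 * GB, 2 * TB):
        print(f"{size / GB:8.1f} GB -> {check_archive_size(size)}")
```

For a real file, the size could be obtained with `os.path.getsize(path)` and passed to the function.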

Using OURRstore

Archiving data to OURRstore

Because of the need to collect data into bundled, compressed archive files, we offer several options to assist users in archiving data.  In the simplest option, you just create a directory tree or bucket where you place the data to be archived, and the HPC admins take care of bundling up the data for OURRstore.  Or you can create the compressed archives yourself, and then ask the HPC Admins to transmit them to OURRstore using the pooled account.  Power users who want complete control of the process are welcome to work directly with the OURRstore team to get trained in using OURRstore, get a private account, and manage the archiving process themselves (not recommended, but possible).

Simple option for data on Grace (possibly the research NAS)

For the simple option, all you need to do is collect your data to be archived into a sub-directory tree with just the files to be archived.  Please move the sub-directory under a parent directory named "ToBeArchived" in your home directory.  Please name (or rename) the sub-directory tree to be archived with the current date in "yyyy-mm-dd" format, for example "/home/john/ToBeArchived/2021-08-07/".  The sub-sub-directory tree under the dated subdirectory can be organized any way you see fit.  Please use the "mv" command, not "cp" or "rsync" to collect data, since you eventually want the data to disappear from Grace once it is safely in OURRstore, and you don't want to run into a space crunch while organizing your archive directory.  (Remember, Grace's storage is only supposed to be a temporary holding place for running jobs.) 
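For those who prefer scripting the staging step, the layout described above might be prepared as in the sketch below.  It uses `shutil.move`, which behaves like "mv" (a rename when source and destination are on the same filesystem).  The helper name and the example paths are hypothetical; plain `mkdir` and `mv` at the shell work just as well.

```python
import shutil
from datetime import date
from pathlib import Path

def stage_for_archive(source, home, today=None):
    """Move `source` under <home>/ToBeArchived/<yyyy-mm-dd>/ and return its new path."""
    today = today or date.today()
    dated_dir = Path(home) / "ToBeArchived" / today.strftime("%Y-%m-%d")
    dated_dir.mkdir(parents=True, exist_ok=True)
    destination = dated_dir / Path(source).name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    # Move rather than copy, so the data does not occupy space twice.
    shutil.move(str(source), str(destination))
    return destination

if __name__ == "__main__":
    import tempfile
    # Demonstration in a scratch directory instead of a real home directory.
    scratch = Path(tempfile.mkdtemp())
    project = scratch / "project_results"
    project.mkdir()
    (project / "run1.csv").write_text("sample,value\n")
    print(stage_for_archive(project, scratch / "home" / "john"))
```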

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS.  For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you submit the archive request.

Simple option for data on ROSS

For data on ROSS, the simplest way to archive that data is to collect the data to be archived into a bucket.  When ready for archiving, you could optionally alter the permissions on the bucket to read-only to minimize the chance of accidental changes, if you wish.  Then send a request via e-mail to hpcadmin@uams.edu giving them the name of the bucket to be archived and the namespace that the bucket lives in.

...

If you prefer to create your own compressed archive files ready for OURRstore, you could simply create them and place them in a bucket.  In this case, when you e-mail hpcadmin@uams.edu the name of the bucket and namespace, let them know that you have already generated the compressed archive files in the bucket.  Alternatively, you could use the "Slightly less simple option for data on Grace", temporarily placing copies of your archive files on Grace.

Slightly less simple option for data on Grace (possibly the research NAS)

If you would rather create the compressed tar files yourself, feel free to do so, and then collect the archive files into the top level of your "ToBeArchived" subdirectory in your home directory.  In this option, feel free to use any method of your choosing (e.g. tar, zip, or some custom format) that can collect the data you want archived into files.  The ideal choice should keep the archive files between 20 and 200 GB in size, though OURRstore will accept anything between 1 GB and 1 TB.  We encourage you to use compression and encryption for efficiency and safety, but that is your choice.  If you do encrypt, please safeguard your encryption key, since likely no one but you knows it.  (The HPC admins would not know your encryption keys, for example.)  In this option you are responsible for creating your own manifests of the content of your archive files, if desired.  The names of the archive files must be globally unique.  In other words, do not give any 2 archive files the same name.  They are all going into a single directory in OURRstore, so none of the names of any of the archive files that you create can clash with the name of any archive file that you previously created.  Otherwise you run the risk of losing the previous archive file (i.e. it could get overwritten).
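One possible way to build such an archive together with a manifest, using only Python's standard `tarfile` module, is sketched below.  The helper name, the ".manifest.txt" suffix and the naming scheme in the usage note are assumptions for illustration, not requirements of OURRstore, and encryption is left out.

```python
import tarfile
from pathlib import Path

def create_archive(src_dir, out_dir, unique_name, manifest_dir):
    """Write <unique_name>.tar.gz into out_dir and a file listing into manifest_dir."""
    src_dir, out_dir, manifest_dir = Path(src_dir), Path(out_dir), Path(manifest_dir)
    archive = out_dir / f"{unique_name}.tar.gz"
    # This only catches clashes in out_dir; names already sent to OURRstore
    # in earlier requests still have to be tracked by you.
    if archive.exists():
        raise FileExistsError(f"{archive} already exists; pick another name")
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    # Re-read the archive so the manifest reflects what was actually stored.
    with tarfile.open(archive, "r:gz") as tar:
        files = sorted(m.name for m in tar.getmembers() if m.isfile())
    manifest = manifest_dir / f"{unique_name}.manifest.txt"
    manifest.write_text("\n".join(files) + "\n")
    return archive, manifest
```

A name that combines your username, the date and a short label (for example "john_2021-08-07_rnaseq", a made-up scheme) is one simple way to keep archive names from clashing over time.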

...

This method can also be used for data on the research NAS, assuming that the HPC admins can get access to that storage area on the NAS.  For data on the NAS, just create the "ToBeArchived" directory somewhere in your storage area, and let the HPC admin team know the storage area and path to the directory when you send the archive request e-mail.

Complicated Option

To exercise the complicated option, you would need to approach the OURRstore team directly, sign agreements, and go through the mandatory training to get your own account on OURRstore.  You would then be responsible for following all of their rules, for purchasing your own media, for shipping to and from Oklahoma, for creating and tracking your own archive files, etc.  This option really is for the power user who wants full control over the process of archiving and retrieving data with only minimal or no assistance from the HPC admin team.  This option is also appropriate for users whose data is on systems that the HPC Admins do not have access to.  While we do not encourage people to use this option due to the complications and responsibility of getting your own account on OURRstore, it is a possibility for those who prefer it.  For more information, see OURRstore: OU & Regional Research Store.

Retrieving Data from OURRstore

How to retrieve archived data depends on which of the above options you used to archive it.

Simple option for data archived from Grace (possibly the research NAS)

When the HPC Admins archived data for you, they left manifests of which files or objects are in which compressed tar file in your /home/archived directory.  You can search through those manifests (e.g. using grep) to find the archive file or files in which the data you are interested in is located.  Send a request via e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the name of the archive file or files it is in.  If you lost the manifests, don't fret.  The HPC Admins kept a backup copy and can help.  In the case of a lost manifest, still send a request via e-mail to hpcadmin@uams.edu with the name of the subdirectory tree that you would like retrieved and the approximate date that it was archived.  The HPC admins will do their best to find the archive file names in their manifest backup copies.  However, since yours is the primary copy, please do not lose it, as there is always a chance that the backup copy gets lost as well.
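The manifest search mentioned above can be done with grep; an equivalent sketch in Python follows.  It assumes the manifests are plain text files with one archived path per line, which is a guess about their format, and the function name is hypothetical.

```python
import re
from pathlib import Path

def find_in_manifests(manifest_dir, pattern):
    """Return {manifest file name: matching lines} for a regular expression."""
    regex = re.compile(pattern)
    hits = {}
    for manifest in sorted(Path(manifest_dir).iterdir()):
        if not manifest.is_file():
            continue
        # errors="replace" keeps the search going past odd bytes in file names.
        lines = manifest.read_text(errors="replace").splitlines()
        matches = [line for line in lines if regex.search(line)]
        if matches:
            hits[manifest.name] = matches
    return hits
```

Calling `find_in_manifests(manifest_directory, r"experiment42/")` would list every manifest that mentions that (made-up) subdirectory, which tells you which archive files to name in your retrieval request.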

Once the HPC admins receive your e-mail, they will pull the pertinent archive files from OURRstore and restore the archived data to your /home/<username>/RestoredArchives directory.  In general, your data should be restored within 1 business/work day (i.e. things might not get restored on weekends and holidays).  The HPC admins will then notify you by e-mail that your data are restored.

Simple option for data archived from ROSS

Simply send a request via e-mail to hpcadmin@uams.edu with the name of the bucket that you want restored, and a name prefix of the objects that you want restored.  Leave the name prefix blank if you want the entire bucket restored.  You should also include the namespace where the bucket should be located.  

Once the HPC admins receive your request, they will restore the objects into the bucket in the namespace, and notify you when it is ready.

Slightly less simple option for data archived from Grace (possibly the research NAS)

In this option, since you created the archive files yourself, the HPC Admins did not create manifest files.  It is up to you to keep track of what data is in which archive file.  When you want to retrieve one of your archive files, send a request via e-mail to hpcadmin@uams.edu with the names of the archive files that you would like retrieved.  

...

Once you get notification that your archive files have been restored, you may then use whatever means you choose to pull data from those files.  Don't forget that you may have compressed and encrypted the archive files before you archived them.  Remember that the HPC Admins would not know the encryption key if you encrypted the files before archiving, and cannot help if you have lost it.  So when you create encrypted archive files, do make certain that you safely store your encryption key.

Complicated Option

You are in complete control of your retrieval of archived data, since the archived data is on your OURRstore account.  The HPC admins are not involved.