Archiving data to AWS Glacier

Pascal Alma, June 3rd, 2013 (last updated: June 3rd, 2013)


To archive your data on AWS you can use AWS Glacier. It offers cheaper storage than S3, but the downside is that your data is not accessible right away as it is with S3: getting an archive back means starting a retrieval job, which typically takes three to five hours. If you use the service for real archiving purposes, this shouldn't be a real issue. Let's set up a Glacier vault using the Management Console. First, select the Glacier service in the Management Console.

Click ‘Create Vault’ and give the vault a name.
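The same step can also be scripted. The sketch below assumes the AWS SDK for Python (boto3), which this post does not otherwise use: `glacier` is expected to be a client created with `boto3.client("glacier", region_name="eu-west-1")`, and `create_vault` and `is_valid_vault_name` are hypothetical helper names. The name check follows the documented Glacier rule (1 to 255 characters: letters, digits, underscore, hyphen and period).

```python
import re

# Documented Glacier vault-name rule: 1-255 characters drawn from
# a-z, A-Z, 0-9, "_", "-" and ".".
_VAULT_NAME = re.compile(r"[A-Za-z0-9_.-]{1,255}")


def is_valid_vault_name(name: str) -> bool:
    """Return True if `name` satisfies the Glacier vault naming rule."""
    return _VAULT_NAME.fullmatch(name) is not None


def create_vault(glacier, vault_name: str) -> str:
    """Create a vault and return its location path.

    `glacier` is assumed to be a boto3 Glacier client; boto3 supplies
    accountId="-" (the caller's own account) when it is omitted.
    """
    if not is_valid_vault_name(vault_name):
        raise ValueError(f"invalid vault name: {vault_name!r}")
    response = glacier.create_vault(vaultName=vault_name)
    return response["location"]
```

Validating the name locally simply gives a clearer error than the service would; the actual creation is a single `create_vault` call.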

Set up a new SNS topic so we are notified when jobs on the vault, such as archive or inventory retrievals, have completed.
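Scripted, this step amounts to creating an SNS topic and registering it as the vault's notification target. A minimal sketch, again assuming boto3 clients (`sns` from `boto3.client("sns", ...)` and `glacier` as before); `enable_vault_notifications` and the topic name used in any call are hypothetical. The two event names are the job-completion events Glacier can publish.

```python
# Glacier vault events that can be published to SNS.
RETRIEVAL_EVENTS = ["ArchiveRetrievalCompleted", "InventoryRetrievalCompleted"]


def enable_vault_notifications(sns, glacier, vault_name: str, topic_name: str) -> str:
    """Create an SNS topic and let Glacier publish job-completion events to it.

    `sns` and `glacier` are assumed to be boto3 clients for the same region.
    Returns the topic ARN.
    """
    # Per the SNS documentation, CreateTopic is idempotent: if we already
    # own a topic with this name, its ARN is returned instead of a new one.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    glacier.set_vault_notifications(
        vaultName=vault_name,
        vaultNotificationConfig={"SNSTopic": topic_arn, "Events": RETRIEVAL_EVENTS},
    )
    return topic_arn
```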

If all went well, the vault is created.

Together with the vault, a new SNS topic is created. To receive the messages posted to this topic, let's subscribe to it. Select the SNS service in the console.

Select the SNS topic that was just created.

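Now click the button to create a subscription to this topic. I simply create an email subscription, so every message posted to the topic is sent to the supplied email address. AWS first sends a confirmation mail to that address; the subscription only becomes active after you follow the link in it.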
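For completeness, here is a sketch of the same subscription made through the SNS API, assuming a boto3 SNS client; `subscribe_email` and `is_confirmed` are hypothetical helpers. Until the recipient confirms, SNS hands back the literal string "pending confirmation" instead of a real subscription ARN, which is what the second helper checks for.

```python
PENDING = "pending confirmation"


def subscribe_email(sns, topic_arn: str, address: str) -> str:
    """Subscribe an email address to the topic and return the subscription ARN.

    `sns` is assumed to be a boto3 SNS client. For email endpoints SNS
    returns the placeholder "pending confirmation" until the recipient
    clicks the link in the confirmation mail.
    """
    response = sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=address)
    return response["SubscriptionArn"]


def is_confirmed(subscription_arn: str) -> bool:
    """True once SNS has handed out a real subscription ARN."""
    return subscription_arn != PENDING
```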

Unlike the S3 console, the Glacier service in the Management Console has no functionality to upload files to Glacier or to download them again. There is, of course, an API for this and there are several SDKs to transfer files. And the community jumped in and created plenty of tools, both GUI and CLI based. I simply picked one of them, Glacier-cli, and installed it.
After installing and configuring Glacier-cli, I can check that the new vault is found:

pascal$ ./glacier.py --region eu-west-1 vault list
PascalBackuPVault
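Under the hood, a vault listing like this maps onto Glacier's ListVaults operation. The sketch below assumes a boto3 Glacier client and shows how the result is paginated: a response carries a `Marker` when more vaults remain. `list_vault_names` is a hypothetical helper, not part of Glacier-cli.

```python
def list_vault_names(glacier) -> list:
    """Return the names of all vaults in the client's region.

    ListVaults is paginated: a response carries a "Marker" when more
    vaults remain, which is passed back as `marker` on the next call.
    `glacier` is assumed to be a boto3 Glacier client.
    """
    names, marker = [], None
    while True:
        kwargs = {"marker": marker} if marker else {}
        page = glacier.list_vaults(**kwargs)
        names.extend(vault["VaultName"] for vault in page.get("VaultList", []))
        marker = page.get("Marker")
        if not marker:
            return names
```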

Let's put some data into the vault, using the example from the CLI tool's documentation as inspiration. First, create a local file:

pascal$ echo 42 > example.txt

Upload the file into the vault created earlier:

pascal$ ./glacier.py --region eu-west-1 archive upload PascalBackuPVault example.txt
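Glacier itself knows nothing about file names: an archive is identified by an opaque archive ID, with an optional free-text description. The sketch below shows a single-request upload with a boto3 Glacier client that stores the file name as the description; `upload_file` is a hypothetical helper, and I am assuming (not checking) that Glacier-cli does something comparable. A single UploadArchive request is limited to 4 GB; larger files need a multipart upload.

```python
from pathlib import Path


def upload_file(glacier, vault_name: str, path) -> str:
    """Upload one file as a single archive and return its archive ID.

    The file name is kept as the archive description, since Glacier has
    no file names. `glacier` is assumed to be a boto3 Glacier client,
    which (to my knowledge) computes the required tree-hash checksum itself.
    """
    path = Path(path)
    with path.open("rb") as body:
        response = glacier.upload_archive(
            vaultName=vault_name,
            archiveDescription=path.name,
            body=body,
        )
    return response["archiveId"]
```

The returned archive ID is what you need later to retrieve the data, so a real script would store it somewhere safe.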

Let's check that the file is in the vault:

pascal$ ./glacier.py --region eu-west-1 archive list PascalBackuPVault
example.txt
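Note that Glacier has no synchronous "list archives" call. The service only offers a vault inventory, produced by an inventory-retrieval job (which also takes hours) and refreshed roughly once a day, so a tool that shows a freshly uploaded archive immediately is presumably reading its own local records. The sketch below shows the service-side route with a boto3 Glacier client; `request_inventory` and `archives_by_description` are hypothetical helpers, and the fields used (`ArchiveList`, `ArchiveId`, `ArchiveDescription`) come from Glacier's JSON inventory format.

```python
import json


def request_inventory(glacier, vault_name: str) -> str:
    """Start an inventory-retrieval job and return its job ID."""
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "inventory-retrieval", "Format": "JSON"},
    )
    return job["jobId"]


def archives_by_description(inventory_json: str) -> dict:
    """Map each archive description to the archive IDs carrying it.

    Descriptions are not unique in Glacier, so every key maps to a list.
    """
    inventory = json.loads(inventory_json)
    result = {}
    for archive in inventory.get("ArchiveList", []):
        description = archive.get("ArchiveDescription", "")
        result.setdefault(description, []).append(archive["ArchiveId"])
    return result
```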

Now remove the local file and restore the archived one:

pascal$ rm example.txt
pascal$ ./glacier.py --region eu-west-1 archive retrieve PascalBackuPVault example.txt
glacier: queued retrieval job for archive 'example.txt'
pascal$ ./glacier.py --region eu-west-1 archive retrieve PascalBackuPVault example.txt
glacier: job still pending for archive 'example.txt'
pascal$ ./glacier.py --region eu-west-1 job list
a/p 2013-05-20T18:40:25.107Z PascalBackuPVault example.txt
pascal$ ./glacier.py --region eu-west-1 archive retrieve --wait PascalBackuPVault example.txt
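The retrieval shown above is an asynchronous job: it is started, then polled until Glacier reports it as completed. The sketch below does that with a boto3 Glacier client; it is probably similar in spirit to the `--wait` option, but I have not checked how Glacier-cli implements it. `wait_for_archive` is a hypothetical helper, and the 15-minute default poll interval is an arbitrary choice to keep the number of DescribeJob calls low.

```python
import time


def wait_for_archive(glacier, vault_name: str, archive_id: str,
                     poll_seconds: float = 900, sleep=time.sleep) -> str:
    """Start an archive-retrieval job and block until it finishes.

    Returns the job ID once DescribeJob reports the job as completed
    successfully; raises RuntimeError if the job failed. `glacier` is
    assumed to be a boto3 Glacier client; `sleep` is injectable for tests.
    """
    job_id = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
    )["jobId"]
    while True:
        job = glacier.describe_job(vaultName=vault_name, jobId=job_id)
        if job["Completed"]:
            if job["StatusCode"] != "Succeeded":
                raise RuntimeError(f"job {job_id} failed: {job.get('StatusMessage')}")
            return job_id
        sleep(poll_seconds)
```

With the SNS subscription in place, polling is optional: the email arrives when the job completes anyway.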

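After a few hours (about four in this case, judging by the job's creation and completion times in the notification below) the job is finished, and we can access the local file ‘example.txt’ again.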

pascal$ cat example.txt
42
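Once a job has completed, its output still has to be downloaded, and Glacier keeps it available for 24 hours. A minimal sketch with a boto3 Glacier client; `download_job_output` is a hypothetical helper, and it reads the whole body into memory, which is fine for small archives like this one but not for large ones.

```python
from pathlib import Path


def download_job_output(glacier, vault_name: str, job_id: str, dest) -> int:
    """Fetch the output of a completed archive-retrieval job into `dest`.

    Returns the number of bytes written. `glacier` is assumed to be a
    boto3 Glacier client; the response body is a stream read once here.
    """
    response = glacier.get_job_output(vaultName=vault_name, jobId=job_id)
    data = response["body"].read()
    Path(dest).write_bytes(data)
    return len(data)
```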

In my mailbox I find the following notification, telling me the archive is ready for retrieval:

{"Action":"ArchiveRetrieval"
,"ArchiveId":"CggVcXvaWKfRn5tDR_UKna0GsYyXyZzlALPvjEFkcLdRq4NRBXra36m7hBOJSNCbOmEkQ04VoyTQyMt_
pXdrNggms13e3vjUqwW3tZwps8BiA1gprQQZyUQPDwwWkuKAFZoqahzA-g"
,"ArchiveSHA256TreeHash":
"084c799cd551dd1d8d5c5f9a5d593b2e931f5e36122ee5c793c1d08a19839cc0"
,"ArchiveSizeInBytes":3
,"Completed":true
,"CompletionDate":"2013-05-20T22:40:31.040Z"
,"CreationDate":"2013-05-20T18:40:25.107Z"
,"InventorySizeInBytes":null
,"JobDescription":null
,"JobId":"rxRUKT0QVWyOEMu4VYW_zrhXXYZC0ZrVo63sCtQJDBpFyhO-pPRJ7Z_Af02Hvn-bge-yGrKzRw78xG9d-Nvxjv2LcQho"
,"RetrievalByteRange":"0-2"
,"SHA256TreeHash":
"084c799cd551dd1d8d5c5f9a5d593b2e931f5e36122ee5c793c1d08a19839cc0"
,"SNSTopic":null
,"StatusCode":"Succeeded"
,"StatusMessage":"Succeeded"
,"VaultARN":"arn:aws:glacier:eu-west-1:678658091597:vaults/PascalBackuPVault"}
--
...
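The notification carries SHA-256 tree hashes rather than plain checksums. Glacier's tree hash splits the data into 1 MiB chunks, hashes each chunk with SHA-256, and then hashes pairs of hashes level by level until a single root remains. The sketch below implements that scheme with Python's `hashlib`; `tree_hash` and `notification_matches` are hypothetical helper names, and checking the retrieved file against the message is an illustration of how the notification could be used, not something done in this post.

```python
import hashlib
import json

MIB = 1024 * 1024  # Glacier tree hashes are built from 1 MiB leaf chunks


def tree_hash(data: bytes) -> str:
    """Compute the Glacier SHA-256 tree hash of `data` as a hex string."""
    # Leaves: SHA-256 of every 1 MiB chunk (an empty input yields one
    # leaf, the hash of zero bytes).
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data) or 1, MIB)]
    # Combine neighbours pairwise until a single root remains; an odd
    # node at the end of a level is carried up unchanged.
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
    return level[0].hex()


def notification_matches(message: str, data: bytes) -> bool:
    """Check a job-completion message against locally retrieved bytes."""
    note = json.loads(message)
    return (note.get("StatusCode") == "Succeeded"
            and note.get("SHA256TreeHash") == tree_hash(data))
```

For the 3-byte example.txt there is only one chunk, so its tree hash is simply the plain SHA-256 of the file contents, the same value `sha256sum example.txt` prints.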

There is, of course, a lot more to tell about AWS Glacier. This page is a good next step to get familiar with it.

Reference: Archiving data to AWS Glacier from our JCG partner Pascal Alma at The Pragmatic Integrator blog.