I’ve previously written about setting up Cassandra and Priam for backup and cluster management. The backup restore example I gave there, however, is not applicable in every situation. For example, it may not work on a completely separate cluster, or when you need a partial restore of just one table rather than the whole database.
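In such cases you may choose to restore using the sstableloader utility. Its syntax is straightforward: you pass it one or more initial contact nodes with the -d option (comma-separated) and the path to a directory of sstables whose last two components are the keyspace and table names, e.g. sstableloader -d host1,host2 /path/to/keyspace/table.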
If you look at your Priam-generated backup, it seems you can just copy the files for the particular tables (e.g. via aws s3 cp on AWS) and import them with sstableloader. There’s a catch, however: to save space, Priam compresses all of the files with Snappy, so if you try to feed them to any Cassandra utility, it will complain that they are corrupted.
So you have to decompress them before using sstableloader or anything else. But how? Priam offers a service for that: you call it with the absolute path to a compressed file and the absolute path where the uncompressed file should be placed, and it does the simple job of streaming the original through a decompressor. For decompressing an entire backup, I’ve written a Python script. It assumes a certain directory structure, but you can parameterize it to make it more flexible. Here’s the code (excuse my non-idiomatic Python, I’m only using it for simple scripting):
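The following is not the author’s script, only a minimal sketch of the same idea under stated assumptions. It walks a backup directory assumed to be laid out as <backup_root>/<keyspace>/<table>/<file> (a hypothetical layout; adjust it to your own), mirrors each file’s relative path under an output root so that every output directory ends in <keyspace>/<table> as sstableloader expects, and hands each pair of absolute paths to a decompress(src, dst) callable. That callable is a placeholder for however you invoke Priam’s decompression service; the endpoint and call details are not shown here.

```python
import os
from typing import Callable, List, Tuple


def plan_decompression(backup_root: str, output_root: str) -> List[Tuple[str, str]]:
    """Map every file under backup_root to an absolute target path under output_root.

    Assumed (hypothetical) layout: <backup_root>/<keyspace>/<table>/<file>.
    The relative path is mirrored, so each output directory ends in
    <keyspace>/<table>, which is the form sstableloader expects.
    """
    root = os.path.abspath(backup_root)
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, root)
            dst = os.path.abspath(os.path.join(output_root, rel))
            pairs.append((src, dst))
    return sorted(pairs)


def decompress_all(backup_root: str, output_root: str,
                   decompress: Callable[[str, str], None]) -> int:
    """Decompress every file; decompress(src, dst) receives absolute paths.

    The decompress callable is a placeholder, e.g. a function that calls
    Priam's decompression service for one file.
    """
    count = 0
    for src, dst in plan_decompression(backup_root, output_root):
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        decompress(src, dst)
        count += 1
    return count
```

Keeping the decompression call pluggable separates the directory bookkeeping from the transport: you can test the path mapping with a plain file copy (shutil.copyfile has the same (src, dst) shape) before pointing it at the real service.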
Now you have decompressed backup files that you can restore using sstableloader. This may take some time if you have a lot of data, and you should not run the restore while a snapshot backup is in progress, as it may fail (the documentation warns about this).
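As a hedged illustration, here is one way to drive sstableloader over the decompressed tree, table by table. It assumes the hypothetical <restore_root>/<keyspace>/<table> layout from the decompression step and relies only on the -d option and the keyspace/table directory convention; the host names you pass in are placeholders for your own nodes.

```python
import os
import subprocess
from typing import List, Sequence


def sstableloader_command(hosts: Sequence[str], table_dir: str) -> List[str]:
    """Build an sstableloader invocation for one table directory.

    -d takes a comma-separated list of initial contact nodes; the last two
    path components of table_dir must be <keyspace>/<table>.
    """
    return ["sstableloader", "-d", ",".join(hosts), os.path.normpath(table_dir)]


def table_dirs(restore_root: str) -> List[str]:
    """List <restore_root>/<keyspace>/<table> directories (assumed layout)."""
    dirs = []
    for keyspace in sorted(os.listdir(restore_root)):
        ks_path = os.path.join(restore_root, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            if os.path.isdir(os.path.join(ks_path, table)):
                dirs.append(os.path.join(ks_path, table))
    return dirs


def load_all(hosts: Sequence[str], restore_root: str) -> None:
    """Run sstableloader one table at a time; check=True stops at the first failure."""
    for table_dir in table_dirs(restore_root):
        subprocess.run(sstableloader_command(hosts, table_dir), check=True)
```

Loading one table per invocation makes it obvious which table failed, which matters because, as described next, individual tables can fail while the rest succeed.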
And then, if you are lucky, everything will pass. Unfortunately, there are cases when it doesn’t. The tool is far from perfect: for example, if you’ve dropped a column, restoring an old sstable will fail because it tries to insert into the missing column. That sounds like a big problem for production systems; it has been reported, but not yet fixed.

Sometimes a table may simply fail to restore for unclear reasons (a failure during streaming, allegedly corrupted data). In those cases you may want to dump the sstables to JSON using sstabledump and then convert the JSON to CQL inserts. Of course, there’s no tool to do that, so here’s one, written in Java. It’s not perfect and doesn’t support user-defined types, sets or maps. Note that this is probably not a great idea for huge tables, only for smaller ones.
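The author’s converter is in Java; the sketch below only illustrates the same JSON-to-CQL idea in Python and is not that tool. It assumes the Cassandra 3.x sstabledump output shape (a JSON array of partitions, each with partition.key and a rows list whose entries have type, clustering and cells[].name/value). Key column names do not appear in the dump, so the caller supplies them from the table schema, and raw_cols is a hypothetical parameter naming columns (e.g. int or uuid) to emit unquoted. Like the original tool, it does not handle collections or user-defined types: cells with a path are skipped, as are deleted rows and cells without a value.

```python
import json
from typing import Iterable, List, Sequence


def cql_literal(value, raw: bool) -> str:
    """Render one value from the dump as a CQL literal.

    Assumption: columns flagged as raw (e.g. int, bigint, uuid) are written
    unquoted; everything else becomes a quoted text literal.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    if raw or isinstance(value, (int, float)):
        return str(value)
    # CQL escapes a single quote inside a string literal by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def dump_to_cql(dump_json: str, table: str,
                partition_cols: Sequence[str],
                clustering_cols: Sequence[str] = (),
                raw_cols: Iterable[str] = ()) -> List[str]:
    """Turn sstabledump JSON into INSERT statements, one per live row."""
    raw = set(raw_cols)
    statements = []
    for partition in json.loads(dump_json):
        key_values = partition["partition"]["key"]
        for row in partition.get("rows", []):
            # Skip anything that is not a regular live row (e.g. range tombstone markers).
            if row.get("type") != "row" or "deletion_info" in row:
                continue
            columns = list(partition_cols) + list(clustering_cols)
            values = list(key_values) + list(row.get("clustering", []))
            for cell in row.get("cells", []):
                # Collection elements carry a "path"; cells without a value are skipped.
                if "path" in cell or "value" not in cell:
                    continue
                columns.append(cell["name"])
                values.append(cell["value"])
            literals = [cql_literal(v, c in raw) for c, v in zip(columns, values)]
            statements.append(
                f"INSERT INTO {table} ({', '.join(columns)}) "
                f"VALUES ({', '.join(literals)});"
            )
    return statements
```

You would write the returned statements to a file and run it with cqlsh; for the reasons above, this is only reasonable for small tables.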
As a general note in conclusion, it’s very important to have backups, but it’s much more important to be able to restore from them. A backup is useless if you don’t have a restore procedure. And simply having the tools available (e.g. Priam) doesn’t mean you have a restore procedure ready to execute. You should be doing test restores of active staging data as well as full restores on an empty, newly formed cluster, as these cover different restore scenarios.
Published on Java Code Geeks with permission by Bozhidar Bozhanov, partner at our JCG program. See the original article here: Restoring Cassandra Priam Backup With sstableloader