Home » Java » Enterprise Java » Restoring Cassandra Priam Backup With sstableloader

About Bozhidar Bozhanov

Bozhidar Bozhanov
Senior Java developer, one of the top stackoverflow users, fluent with Java and Java technology stacks - Spring, JPA, JavaEE, as well as Android, Scala and any framework you throw at him. creator of Computoser - an algorithmic music composer. Worked on telecom projects, e-government and large-scale online recruitment and navigation platforms.

Restoring Cassandra Priam Backup With sstableloader

I’ve previously written about setting up Cassandra and Priam for backup and cluster management. The example that I gave for backup restore there, however, is not applicable in every situation – it may not work on a completely separate cluster, for example. Or in case of partial restore to just one table, rather than the whole database.

In such cases you may choose to do a restore using the sstableloader utility. It has a straightforward syntax:

1
2
3
sudo sstableloader -d 172.35.1.2,172.35.1.3 -ts /etc/cassandra/conf/truststore.jks \
   -ks /etc/cassandra/conf/node.jks -f /etc/cassandra/conf/cassandra.yaml  \
    ~/keyspacename/table-0edcc420c19011e7a8c37656dd492a94

If you look at your Priam-generated backup, it looks like you can just copy the files (e.g. via s3 aws cp on AWS) for the particular tables and sstableloader import them. There’s a catch, however. In order to save space, Priam is using Snappy to compress all of the files. So if you try to feed them to any Cassandra utility, it will complain that they are corrupted.

So you have to decompress them before using sstableloader or anything else. But how? Well, Priam offers a service for that – you call it by passing the absolute path to a compressed file and the absolute path to where the uncompressed should be placed and it does the simple job of streaming the original through a decompressor. For decompressing an entire backup, I’ve written a python script. It assumes a certain structure, but you can parameterize it to make it more flexible. Here’s the code (excuse my non-idiomatic Python, I’m only using it for simple scripting):

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
#! /usr/bin/env python
# python script used to pass each backup file through the decompression facility of Priam (using Snappy)
# so that it can be used with sstableloader for restore
import os
import requests
 
rootdir = '/home/ec2-user/backup'
target = '/home/ec2-user/keyspace'
 
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        fullpath = os.path.join(subdir, file)
        parent = os.path.join(fullpath, os.pardir)
        table = os.path.basename(os.path.abspath(parent))
        targetdir = target + "/" + table + "/"
        if not os.path.exists(targetdir):
            os.makedirs(targetdir)
 
        url = 'http://localhost:8080/Priam/REST/v1/cassadmin/decompress?in=' + fullpath + '&out=' + target + "/" + table + "/" + file
        print(url)
        requests.get(url)

Now you have decompressed backup files that you can restore using sstableloader. It may take some time if you have a lot of data, and you should not run the restore at the same time a snapshot backup is performed, as it may fail (was warned by the documentation).

And then, if you are lucky, everything will pass. Unfortunately, there are cases when it doesn’t. The tool is far from perfect, so for example if you’ve dropped a column, restoring an old sstable will fail, as it will try to insert into the missing column. That sounds like a big problem for actual production systems, and it has been reported, but not yet fixed. Sometimes a table may just fail to get restored, for unknown reasons (failure during streaming, alleged corrupted data). In those cases you may want to dump the sstables to JSON using sstabledump and then convert the JSON to CQL to insert it. Of course, there’s no tool to do that, so here’s one, written in Java. It’s not perfect and doesn’t support user-defined types, sets and maps. Note that this is probably not a great idea for huge tables, only for smaller ones.

As a general note in conclusion here, it’s very important to have backups but it’s much more important to be able to restore from them. A backup is useless if you don’t have a restore procedure. And simply having the tools available (e.g. Priam) doesn’t mean you can a restore procedure ready to execute. You should be doing test restores on active staging data as well as full restores on an empty, newly formed cluster, as there are different restore scenarios.

Published on Java Code Geeks with permission by Bozhidar Bozhanov, partner at our JCG program. See the original article here: Restoring Cassandra Priam Backup With sstableloader

Opinions expressed by Java Code Geeks contributors are their own.

(0 rating, 0 votes)
You need to be a registered member to rate this.
Start the discussion Views Tweet it!
Do you want to know how to develop your skillset to become a Java Rockstar?
Subscribe to our newsletter to start Rocking right now!
To get you started we give you our best selling eBooks for FREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
I agree to the Terms and Privacy Policy

Leave a Reply

avatar

This site uses Akismet to reduce spam. Learn how your comment data is processed.

  Subscribe  
Notify of