
Pagination with Couchbase

If you have to deal with a large number of documents when querying a Couchbase cluster, it is important to use pagination to retrieve rows page by page. You can find some information in the “Pagination” chapter of the documentation, but in this article I want to go into more detail and provide sample code.

For this example, I will start by creating a simple view based on the beer-sample dataset; the view is used to find breweries by country:

function (doc, meta) {
  if (doc.type == "brewery" && doc.country){
   emit(doc.country);
  } 
}

This view lists all the breweries by country; the index looks like this:

Doc id | Key | Value
bersaglier | Argentina | null
cervecera_jerome | Argentina | null
brouwerij_nacional_balashi | Aruba | null
australian_brewing_corporation | Australia | null
carlton_and_united_breweries | Australia | null
coopers_brewery | Australia | null
foster_s_australia_ltd | Australia | null
gold_coast_brewery | Australia | null
lion_nathan_australia_hunter_street | Australia | null
little_creatures_brewery | Australia | null
malt_shovel_brewery | Australia | null
matilda_bay_brewing | Australia | null
yellowstone_valley_brewing | United States | null
yuengling_son_brewing | United States | null
zea_rotisserie_and_brewery | United States | null
fosters_tien_gang | Viet Nam | null
hue_brewery | Viet Nam | null
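To make the ordering of this index explicit, here is a minimal in-memory sketch of what the by_country map function produces. It assumes a handful of hypothetical documents (a few breweries from beer-sample plus two made-up documents that the view should ignore) and mimics the view by emitting a (country, doc id, null) row for each brewery, then sorting rows by key and, for equal keys, by doc id. Plain Python string comparison stands in for Couchbase's key collation, which is a simplification that happens to agree for these ASCII country names.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical subset of beer-sample documents, keyed by document id.
DOCS: Dict[str, Dict[str, Any]] = {
    "bersaglier": {"type": "brewery", "country": "Argentina"},
    "coopers_brewery": {"type": "brewery", "country": "Australia"},
    "cervecera_jerome": {"type": "brewery", "country": "Argentina"},
    "hue_brewery": {"type": "brewery", "country": "Viet Nam"},
    "brouwerij_nacional_balashi": {"type": "brewery", "country": "Aruba"},
    "some_beer": {"type": "beer", "name": "Not a brewery"},  # hypothetical: ignored, wrong type
    "no_country_brewery": {"type": "brewery"},               # hypothetical: ignored, no country
}

Row = Tuple[str, str, Optional[Any]]  # (key, doc id, value)


def map_by_country(doc_id: str, doc: Dict[str, Any]) -> List[Row]:
    """Python equivalent of the map function: emit(doc.country) for breweries."""
    if doc.get("type") == "brewery" and doc.get("country"):
        return [(doc["country"], doc_id, None)]  # emit(key) without a value -> null
    return []


def build_index(docs: Dict[str, Dict[str, Any]]) -> List[Row]:
    rows = [row for doc_id, doc in docs.items() for row in map_by_country(doc_id, doc)]
    # Rows are ordered by key; rows sharing a key are ordered by doc id.
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


for key, doc_id, value in build_index(DOCS):
    print(doc_id, key, value)
```

The tie-break on doc id is what the startkey_docid approach described later relies on.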

Now suppose you want to navigate through this index with a page size of 5 rows.

Using skip / limit Parameters

The simplest approach is to use the limit and skip parameters, for example:

Page 1: ?limit=5&skip=0

Page 2: ?limit=5&skip=5

Page x: ?limit=5&skip=(limit*(page-1))

You can of course use any other parameters you need for range or key queries (startkey/endkey, key, keys) and the sort option (descending).

This is simple but not the most efficient approach, since the query engine has to read all the rows that match the query until the skip value is reached.
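The cost of skip can be illustrated with a small in-memory simulation. This is a hypothetical model, not Couchbase's query engine: the index is a sorted Python list holding the first twelve rows of the table above, and the hypothetical `query_skip_limit` counts how many index rows it walks through before it can return a page.

```python
from typing import List, Tuple

# (key, doc id) pairs in index order: the first twelve rows of the table above.
INDEX: List[Tuple[str, str]] = [
    ("Argentina", "bersaglier"),
    ("Argentina", "cervecera_jerome"),
    ("Aruba", "brouwerij_nacional_balashi"),
    ("Australia", "australian_brewing_corporation"),
    ("Australia", "carlton_and_united_breweries"),
    ("Australia", "coopers_brewery"),
    ("Australia", "foster_s_australia_ltd"),
    ("Australia", "gold_coast_brewery"),
    ("Australia", "lion_nathan_australia_hunter_street"),
    ("Australia", "little_creatures_brewery"),
    ("Australia", "malt_shovel_brewery"),
    ("Australia", "matilda_bay_brewing"),
]


def query_skip_limit(index, skip, limit):
    """Simulate ?limit=..&skip=..; return (page rows, number of index rows read)."""
    rows_read = 0
    page = []
    for row in index:
        rows_read += 1       # every row up to the end of the page is visited
        if rows_read <= skip:
            continue         # skipped rows are read, then thrown away
        page.append(row)
        if len(page) == limit:
            break
    return page, rows_read


limit = 5
for page_number in (1, 2, 3):
    skip = limit * (page_number - 1)  # skip = limit * (page - 1)
    page, rows_read = query_skip_limit(INDEX, skip, limit)
    print(page_number, [doc_id for _, doc_id in page], "rows read:", rows_read)
```

The number of rows read grows with the page number even though each page returns at most five rows.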

Here is some sample Python code that paginates through this view using skip and limit:

from couchbase import Couchbase
cb = Couchbase.connect(bucket='beer-sample')

hasRow = True
rowPerPage = 5
page = 0

while hasRow :
	hasRow = False
	# page is still 0-based here, so this is limit * (page - 1) for the page being read
	skip = rowPerPage * page
	page = page + 1
	print("-- Page %s --" % (page))
	rows = cb.query("test", "by_country", limit=rowPerPage, skip=skip)
	for row in rows:
		hasRow = True
		print("Country: \"%s\" \t Id: '%s'" % (row.key, row.docid))
	print(" -- -- -- -- \n")

This application loops over all the pages until the end of the index.

As I said before, this is not the best approach, since the system must read all the values until the skip value is reached. The following example shows a better way to deal with this.

Using startkey / startkey_docid parameters

To make pagination more efficient, you can take another approach, which uses the startkey and startkey_docid parameters to select the proper documents.

  • The startkey parameter is the value of the key where the query should start reading (based on the last key of the “previous page”).
  • Since a key such as “Germany” may have one or more ids (documents), you must tell the Couchbase query engine exactly where to start. For this you use the startkey_docid parameter, and skip that id, since it is the last one of the previous page.
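Because rows are ordered by key and then by doc id, the pair (startkey, startkey_docid) identifies an exact position in the index, and skip=1 steps over the row that ended the previous page. The sketch below models this with Python's `bisect` module over an in-memory list; the helper names are hypothetical, and the binary search only illustrates the idea, it is not how Couchbase implements its index lookup.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (key, doc id)

# The first twelve rows of the by_country index, sorted by key, then by doc id.
INDEX: List[Row] = [
    ("Argentina", "bersaglier"),
    ("Argentina", "cervecera_jerome"),
    ("Aruba", "brouwerij_nacional_balashi"),
    ("Australia", "australian_brewing_corporation"),
    ("Australia", "carlton_and_united_breweries"),
    ("Australia", "coopers_brewery"),
    ("Australia", "foster_s_australia_ltd"),
    ("Australia", "gold_coast_brewery"),
    ("Australia", "lion_nathan_australia_hunter_street"),
    ("Australia", "little_creatures_brewery"),
    ("Australia", "malt_shovel_brewery"),
    ("Australia", "matilda_bay_brewing"),
]


def start_position(index: List[Row], startkey: str, startkey_docid: Optional[str] = None) -> int:
    """Position of the first row at or after (startkey, startkey_docid) in index order."""
    if startkey_docid is None:
        # startkey alone: the first row with that key (a 1-tuple sorts before any 2-tuple with the same key)
        return bisect.bisect_left(index, (startkey,))
    return bisect.bisect_left(index, (startkey, startkey_docid))


def query_from(index: List[Row], startkey: str, startkey_docid: Optional[str],
               skip: int, limit: int) -> List[Row]:
    """Simulate ?startkey=..&startkey_docid=..&skip=..&limit=.. against the list."""
    pos = start_position(index, startkey, startkey_docid) + skip
    return index[pos:pos + limit]


# Page 2: restart at the last row of page 1 and step over it with skip=1.
page2 = query_from(INDEX, "Australia", "carlton_and_united_breweries", skip=1, limit=5)
print([doc_id for _, doc_id in page2])
```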

If we look at the index again and add row numbers to explain the pagination:

Row num | Doc id | Key | Value
Query for page 1: ?limit=5
1 | bersaglier | Argentina | null
2 | cervecera_jerome | Argentina | null
3 | brouwerij_nacional_balashi | Aruba | null
4 | australian_brewing_corporation | Australia | null
5 | carlton_and_united_breweries | Australia | null
Query for page 2: ?limit=5&startkey="Australia"&startkey_docid=carlton_and_united_breweries&skip=1
6 | coopers_brewery | Australia | null
7 | foster_s_australia_ltd | Australia | null
8 | gold_coast_brewery | Australia | null
9 | lion_nathan_australia_hunter_street | Australia | null
10 | little_creatures_brewery | Australia | null
Query for page 3: ?limit=5&startkey="Australia"&startkey_docid=little_creatures_brewery&skip=1
11 | malt_shovel_brewery | Australia | null
12 | matilda_bay_brewing | Australia | null
   | yellowstone_valley_brewing | United States | null
   | yuengling_son_brewing | United States | null
   | zea_rotisserie_and_brewery | United States | null
   | fosters_tien_gang | Viet Nam | null
   | hue_brewery | Viet Nam | null

As you can see in the examples above, each query passes the key and document id of the last row of the previous page as startkey and startkey_docid, and steps over that row using skip=1.
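The parameters for the next page can be derived mechanically from the last row of the current page. A minimal sketch, assuming the view is queried through its HTTP query string: view keys are JSON values, so startkey is JSON-encoded ("Australia" keeps its quotes), while startkey_docid is passed as a plain document id, as in the table above. The helper name `next_page_query` is hypothetical, and the result is percent-encoded as it would be sent over HTTP.

```python
import json
from urllib.parse import urlencode


def next_page_query(last_key, last_doc_id, limit=5):
    """Build the query string for the page that follows the row (last_key, last_doc_id)."""
    params = [
        ("limit", limit),
        ("startkey", json.dumps(last_key)),   # keys are JSON: "Australia" keeps its quotes
        ("startkey_docid", last_doc_id),      # doc ids are plain strings
        ("skip", 1),                          # step over the last row of the previous page
    ]
    return "?" + urlencode(params)


print(next_page_query("Australia", "carlton_and_united_breweries"))
# ?limit=5&startkey=%22Australia%22&startkey_docid=carlton_and_united_breweries&skip=1
```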

Let’s now look at the application code, once again in Python:

from couchbase import Couchbase
cb = Couchbase.connect(bucket='beer-sample')

hasRow = True
rowPerPage = 5
page = 0
currentStartkey=""
startDocId=""

while hasRow :
	hasRow = False
	# first page: start at the beginning; next pages: skip the last row already shown
	skip = 0 if page == 0 else 1
	page = page + 1
	print("-- Page %s --" % (page))
	rows = cb.query("test", "by_country", limit=rowPerPage, skip=skip, startkey=currentStartkey, startkey_docid=startDocId)
	for row in rows:
		hasRow = True
		print("Country: \"%s\" \t Id: '%s'" % (row.key, row.docid))
		# remember the last row of this page as the starting point of the next query
		currentStartkey = row.key
		startDocId = row.docid
	print(" -- -- -- -- \n")

This application loops over all the pages until the end of the index.

Using this approach, the application starts reading the index at a specific key (startkey parameter) and only loops over the necessary entries in the index. This is more efficient than the simple skip approach.
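To make the efficiency claim concrete, the sketch below pages through the same in-memory index both ways and counts the index rows each query touches. It is a simplified model under stated assumptions: a sorted Python list stands in for the view index, a binary search stands in for the index lookup (whose own cost is ignored), and all function names are hypothetical. Only the way the counts grow with the page number matters.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (key, doc id)

# The first twelve rows of the by_country index, in index order.
INDEX: List[Row] = [
    ("Argentina", "bersaglier"),
    ("Argentina", "cervecera_jerome"),
    ("Aruba", "brouwerij_nacional_balashi"),
    ("Australia", "australian_brewing_corporation"),
    ("Australia", "carlton_and_united_breweries"),
    ("Australia", "coopers_brewery"),
    ("Australia", "foster_s_australia_ltd"),
    ("Australia", "gold_coast_brewery"),
    ("Australia", "lion_nathan_australia_hunter_street"),
    ("Australia", "little_creatures_brewery"),
    ("Australia", "malt_shovel_brewery"),
    ("Australia", "matilda_bay_brewing"),
]


def page_with_skip(index: List[Row], page: int, limit: int) -> Tuple[List[Row], int]:
    """skip/limit: every skipped row is read before the page is returned."""
    skip = limit * (page - 1)
    rows = index[skip:skip + limit]
    return rows, min(skip, len(index)) + len(rows)  # (page, rows read)


def page_with_startkey(index: List[Row], startkey: Optional[str],
                       startkey_docid: Optional[str], limit: int) -> Tuple[List[Row], int]:
    """startkey/startkey_docid: jump to the previous page's last row and skip=1 over it."""
    if startkey is None:  # first page: no start position yet
        start, skip = 0, 0
    else:
        start, skip = bisect.bisect_left(index, (startkey, startkey_docid)), 1
    rows = index[start + skip:start + skip + limit]
    return rows, skip + len(rows)  # (page, rows read)


def paginate_with_startkey(index: List[Row], limit: int):
    """Yield (page, rows read) until the end of the index."""
    startkey: Optional[str] = None
    startkey_docid: Optional[str] = None
    while True:
        rows, read = page_with_startkey(index, startkey, startkey_docid, limit)
        if not rows:
            return
        yield rows, read
        startkey, startkey_docid = rows[-1]  # last row of this page


for page, (rows, read) in enumerate(paginate_with_startkey(INDEX, 2), start=1):
    _, skip_read = page_with_skip(INDEX, page, 2)
    print(f"page {page}: startkey reads {read} rows, skip reads {skip_read} rows")
```

With a page size of 2, the startkey approach reads at most three rows per page, while the skip approach reads twelve rows to return the sixth page.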

Views with Reduce function

When your view uses a reduce function and grouping, you cannot use the startkey_docid parameter, since the document id is not available once the result is reduced.

So when you are using a reduce you must use the skip and limit parameters.
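For example, if the view used the built-in `_count` reduce and were queried with group=true, each row would be a (country, count) pair with no document id left. The sketch below is a plain-Python simulation: `Counter` stands in for the reduce, the rows are the ones listed in the first table, and the helper names are hypothetical. It shows grouped rows being paged with skip and limit only.

```python
from collections import Counter
from typing import List, Tuple

# (key, doc id) rows emitted by the by_country map function, as listed in the first table.
MAP_ROWS: List[Tuple[str, str]] = [
    ("Argentina", "bersaglier"),
    ("Argentina", "cervecera_jerome"),
    ("Aruba", "brouwerij_nacional_balashi"),
    ("Australia", "australian_brewing_corporation"),
    ("Australia", "carlton_and_united_breweries"),
    ("Australia", "coopers_brewery"),
    ("Australia", "foster_s_australia_ltd"),
    ("Australia", "gold_coast_brewery"),
    ("Australia", "lion_nathan_australia_hunter_street"),
    ("Australia", "little_creatures_brewery"),
    ("Australia", "malt_shovel_brewery"),
    ("Australia", "matilda_bay_brewing"),
    ("United States", "yellowstone_valley_brewing"),
    ("United States", "yuengling_son_brewing"),
    ("United States", "zea_rotisserie_and_brewery"),
    ("Viet Nam", "fosters_tien_gang"),
    ("Viet Nam", "hue_brewery"),
]


def grouped_count(map_rows: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Like a _count reduce with group=true: one (key, count) row per key, doc ids are gone."""
    counts = Counter(key for key, _doc_id in map_rows)
    return sorted(counts.items())


def reduced_page(rows: List[Tuple[str, int]], page: int, limit: int) -> List[Tuple[str, int]]:
    """Only skip/limit remain available for reduced rows (1-based page number)."""
    skip = limit * (page - 1)
    return rows[skip:skip + limit]


groups = grouped_count(MAP_ROWS)
for page in (1, 2, 3):
    print(page, reduced_page(groups, page, 2))
```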

Couchbase Java SDK Paginator

In the previous examples, I have shown how to do pagination using the various query parameters. The Java SDK provides a Paginator object to help developers deal with pagination. The following example uses the same view with the Paginator API.

package com.couchbase.devday;

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.*;
import java.net.URI;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class JavaPaginatorSample {

public static void main(String[] args) {

	configure();
	System.out.println("--------------------------------------------------------------------------");
	System.out.println("\tCouchbase - Paginator");
	System.out.println("--------------------------------------------------------------------------");

	List<URI> uris = new LinkedList<URI>();
	uris.add(URI.create("http://127.0.0.1:8091/pools"));

	CouchbaseClient cb = null;
	try {
		cb = new CouchbaseClient(uris, "beer-sample", "");
		System.out.println("--------------------------------------------------------------------------");
		System.out.println("Breweries (by_country) with the Paginator");
		View view = cb.getView("test", "by_country");
		Query query = new Query();
		int docsPerPage = 5;
		// Create a Paginator from the view, the query and the page size
		Paginator paginatedQuery = cb.paginatedQuery(view, query, docsPerPage);
		int pageCount = 0;
		// Each next() call returns one page of results as a ViewResponse
		while (paginatedQuery.hasNext()) {
			pageCount++;
			System.out.println(" -- Page " + pageCount + " -- ");
			ViewResponse response = paginatedQuery.next();
			for (ViewRow row : response) {
				System.out.println(row.getKey() + " : " + row.getId());
			}
			System.out.println(" -- -- -- ");
		}

		System.out.println("\n\n");
	} catch (Exception e) {
		System.err.println("Error querying Couchbase: " + e.getMessage());
	} finally {
		// Always release the client's connections, even if the query failed
		if (cb != null) {
			cb.shutdown(10, TimeUnit.SECONDS);
		}
	}
}

private static void configure() {

	for(Handler h : Logger.getLogger("com.couchbase.client").getParent().getHandlers()) {
		if(h instanceof ConsoleHandler) {
			h.setLevel(Level.OFF);
		}
	}
	Properties systemProperties = System.getProperties();
	systemProperties.put("net.spy.log.LoggerImpl", "net.spy.memcached.compat.log.SunLogger");
	System.setProperties(systemProperties);

	Logger logger = Logger.getLogger("com.couchbase.client");
	logger.setLevel(Level.OFF);
	for(Handler h : logger.getParent().getHandlers()) {
		if(h instanceof ConsoleHandler){
			h.setLevel(Level.OFF);
		}
	}
}

}

As you can see, you can easily paginate through the results of a query using the Java Paginator:

  • At line #37, the Paginator is created from the view and query objects, and a page size is specified.
  • Then you just need to use the hasNext() and next() methods to navigate through the results.

The Java Paginator knows whether the query uses a reduce, so you can use it with all types of queries: internally, it switches between the skip/limit approach and the startkey/startkey_docid approach. You can see how this is done in the Paginator class.
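This switching behavior can be sketched as a small Python class with the same hasNext()/next() shape as the Java Paginator. It is a hypothetical illustration of the idea, not a port of the SDK's Paginator class: `fetch` is an assumed callback standing in for one view query, rows are (key, doc id) pairs with a doc id of None for reduced rows, and `make_list_fetch` is a made-up in-memory stand-in for the view.

```python
import bisect
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, Optional[str]]                   # (key, doc id); doc id is None for reduced rows
Fetch = Callable[[Dict[str, object]], List[Row]]  # hypothetical: runs one view query


class SimplePaginator:
    """Hypothetical paginator: startkey/startkey_docid for map queries, skip/limit for reduce."""

    def __init__(self, fetch: Fetch, page_size: int, reduced: bool) -> None:
        self.fetch = fetch
        self.page_size = page_size
        self.reduced = reduced
        self.pages_read = 0
        self.last_row: Optional[Row] = None
        self.prefetched: Optional[List[Row]] = None  # page fetched by has_next()

    def _params(self) -> Dict[str, object]:
        params: Dict[str, object] = {"limit": self.page_size}
        if self.reduced:
            # No doc ids after a reduce: fall back to skip/limit.
            params["skip"] = self.page_size * self.pages_read
        elif self.last_row is not None:
            # Restart from the last row seen and step over it.
            params["startkey"], params["startkey_docid"] = self.last_row
            params["skip"] = 1
        return params

    def has_next(self) -> bool:
        if self.prefetched is None:
            self.prefetched = self.fetch(self._params())
        return len(self.prefetched) > 0

    def next(self) -> List[Row]:
        if not self.has_next():
            raise StopIteration("no more pages")
        page, self.prefetched = self.prefetched, None
        self.pages_read += 1
        self.last_row = page[-1]
        return page


def make_list_fetch(rows: List[Row]) -> Fetch:
    """Hypothetical in-memory view over rows sorted by (key, doc id)."""
    def fetch(params: Dict[str, object]) -> List[Row]:
        start = 0
        if "startkey" in params:
            start = bisect.bisect_left(rows, (params["startkey"], params["startkey_docid"]))
        start += int(params.get("skip", 0))
        return rows[start:start + int(params["limit"])]
    return fetch


rows = [("Argentina", "bersaglier"), ("Argentina", "cervecera_jerome"),
        ("Aruba", "brouwerij_nacional_balashi"), ("Australia", "australian_brewing_corporation"),
        ("Australia", "carlton_and_united_breweries"), ("Australia", "coopers_brewery"),
        ("Australia", "foster_s_australia_ltd")]
paginator = SimplePaginator(make_list_fetch(rows), page_size=3, reduced=False)
while paginator.has_next():
    print([doc_id for _, doc_id in paginator.next()])
```

Prefetching the next page inside has_next() is one simple way to answer "is there another page?" without a separate count query; the stored last row and page counter are also why such an object has to be kept between requests.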

Note that if you want to paginate across HTTP requests in a web application, you must keep the Paginator object in the user session, since the current API keeps the current page in its state.

Conclusion

In this blog post you have learned how to deal with pagination in Couchbase views. To summarize:

  • Pagination is based on specific parameters (limit, skip, startkey and startkey_docid) that you send when executing a query.
  • Java developers can use the Paginator class, which simplifies pagination.

I invite you to look at N1QL, the new Couchbase query language, still under development, which will provide more options to developers, including pagination using the LIMIT and OFFSET clauses, for example:

SELECT fname, age 
    FROM tutorial
        WHERE age > 30
    LIMIT 2
    OFFSET 2
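Page numbers map onto LIMIT and OFFSET the same way they map onto limit and skip for views: the offset is page_size * (page - 1). A minimal sketch that builds the statement above for a given 1-based page; the function name is hypothetical and the query text is the tutorial example.

```python
def n1ql_page(page: int, page_size: int) -> str:
    """Build the tutorial N1QL query for a 1-based page number using LIMIT/OFFSET."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    offset = page_size * (page - 1)
    return (
        "SELECT fname, age FROM tutorial WHERE age > 30 "
        f"LIMIT {page_size} OFFSET {offset}"
    )


print(n1ql_page(2, 2))
# SELECT fname, age FROM tutorial WHERE age > 30 LIMIT 2 OFFSET 2
```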

If you want to learn more about N1QL:

 

Reference: Pagination with Couchbase from our JCG partner Tugdual Grall at the Tug’s Blog blog.

Tugdual Grall

Tugdual Grall, an open source advocate and a passionate developer, is a Chief Technical Evangelist EMEA at MapR. He currently works with the European developer communities to ease MapR, Hadoop, and NoSQL adoption. Before joining MapR, Tug was Technical Evangelist at MongoDB and Couchbase. Tug has also worked as CTO at eXo Platform and as JavaEE product manager and software engineer at Oracle. Tugdual is Co-Founder of the Nantes JUG (Java User Group), which has held monthly meetings about the Java ecosystem since 2008. Tugdual also writes a blog available at http://tgrall.github.io/