Home » Software Development » Geospatial (distance) faceting using Lucene’s dynamic range facets

About Michael McCandless

Geospatial (distance) faceting using Lucene’s dynamic range facets

There have been several recent, quiet improvements to Lucene that, taken together, have made it surprisingly simple to add geospatial distance faceting to any Lucene search application, for example:
 
 
 
 
 
 
 
 

< 1 km (147)
< 2 km (579)
< 5 km (2775)

Such distance facets, which allow the user to quickly filter their search results to those that are close to their location, has become especially important lately since most searches are now from mobile smartphones.

In the past, this has been challenging to implement because it’s so dynamic and so costly: the facet counts depend on each user’s location, and so cannot be cached and shared across users, and the underlying math for spatial distance is complex.

But several recent Lucene improvements now make this surprisingly simple!

First, Lucene’s dynamic range faceting has been generalized to accept any ValueSource, not just a numeric doc values field from the index. Thanks to the recently added expressions module, this means you can offer dynamic range facets computed from an arbitrary JavaScript expression, since the expression is compiled on-the-fly to a ValueSource using custom generated Java bytecodes with ASM. Lucene’s range faceting is also faster now, using segment trees to quickly assign each value to the matching ranges.

Second, the Haversine distance function was added to the expressions module. The implementation uses impressively fast approximations to the normally costly trigonometric functions, poached in part from the Java Optimized Development Kit project, without sacrificing too much accuracy. It’s unlikely the approximations will ever matter in practice, and there is an open issue to further improve the approximation.

Suddenly, armed with these improvements, if you index latitude and longitude as DoubleDocValuesFields in each document, and you know the user’s latitude/longitude location for each request, you can easily compute facet counts and offer drill-downs by any set of chosen distances.

First, index your documents with latitude/longitude fields:

Document doc = new Document();
doc.add(new DoubleField("latitude", 40.759011, Field.Store.NO));
doc.add(new DoubleField("longitude", -73.9844722, Field.Store.NO));
writer.addDocument(doc);

At search time, obtain the ValueSource by building a dynamic expression that invokes the Haversine function:

private ValueSource getDistanceValueSource() {
  Expression distance;
  try {
    distance = JavascriptCompiler.compile(
                 "haversin(40.7143528,-74.0059731,latitude,longitude)");
  } catch (ParseException pe) {
    // Should not happen
    throw new RuntimeException(pe);
  }
  SimpleBindings bindings = new SimpleBindings();
  bindings.add(new SortField("latitude", SortField.Type.DOUBLE));
  bindings.add(new SortField("longitude", SortField.Type.DOUBLE));

  return distance.getValueSource(bindings);
}

Instead of the hardwired latitude/longitude above, you should fill in the user’s location.

Using that ValueSource, compute the dynamic facet counts like this:

FacetsCollector fc = new FacetsCollector();

searcher.search(new MatchAllDocsQuery(), fc);

Facets facets = new DoubleRangeFacetCounts(
                    "field",
                    getDistanceValueSource(), fc,
                    ONE_KM,
                    TWO_KM,
                    FIVE_KM,
                    TEN_KM);

return facets.getTopChildren(10, "field");

Normally you’d use a “real” query instead of the top-level-browse MatchAllDocsQuery. Finally, once the user picks a distance for drill-down, use the Range.getFilter method and add that to a DrillDownQuery using ConstantScoreQuery:

public TopDocs drillDown(DoubleRange range) throws IOException {
  // Passing no baseQuery means we drill down on all
  // documents ("browse only"):
  DrillDownQuery q = new DrillDownQuery(null);

  q.add("field", new ConstantScoreQuery(
                     range.getFilter(getDistanceValueSource())));

  return searcher.search(q, 10);
}

See the full source code here, from the lucene/demo module.

When I first tested this example, there was a fun bug, and then later the facet APIs were overhauled, so you’ll need to wait for the Lucene 4.7 release, or just use the current the 4.x sources, to get this example working.

While this example is simple, and works correctly, there are some clear performance improvements that are possible, such as using a bounding box as a fast match to avoid computing Haversine for hits that are clearly outside of the range of possible drill-downs (patches welcome!). Even so, this is a nice step forward for Lucene’s faceting and it’s amazing that geospatial distance faceting with Lucene can be so simple.
 

Do you want to know how to develop your skillset to become a Java Rockstar?

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!

1. JPA Mini Book

2. JVM Troubleshooting Guide

3. JUnit Tutorial for Unit Testing

4. Java Annotations Tutorial

5. Java Interview Questions

6. Spring Interview Questions

7. Android UI Design

and many more ....

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*


− four = 3

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Want to take your Java Skills to the next level?
Grab our programming books for FREE!
  • Save time by leveraging our field-tested solutions to common problems.
  • The books cover a wide range of topics, from JPA and JUnit, to JMeter and Android.
  • Each book comes as a standalone guide (with source code provided), so that you use it as reference.
Last Step ...

Where should we send the free eBooks?

Good Work!
To download the books, please verify your email address by following the instructions found on the email we just sent you.