Software Development

Geospatial (distance) faceting using Lucene’s dynamic range facets

There have been several recent, quiet improvements to Lucene that, taken together, have made it surprisingly simple to add geospatial distance faceting to any Lucene search application, for example:
 
 
 
 
 
 
 
 

< 1 km (147)
< 2 km (579)
< 5 km (2775)

Such distance facets, which allow the user to quickly filter their search results to those that are close to their location, has become especially important lately since most searches are now from mobile smartphones.

In the past, this has been challenging to implement because it’s so dynamic and so costly: the facet counts depend on each user’s location, and so cannot be cached and shared across users, and the underlying math for spatial distance is complex.

But several recent Lucene improvements now make this surprisingly simple!

First, Lucene’s dynamic range faceting has been generalized to accept any ValueSource, not just a numeric doc values field from the index. Thanks to the recently added expressions module, this means you can offer dynamic range facets computed from an arbitrary JavaScript expression, since the expression is compiled on-the-fly to a ValueSource using custom generated Java bytecodes with ASM. Lucene’s range faceting is also faster now, using segment trees to quickly assign each value to the matching ranges.

Second, the Haversine distance function was added to the expressions module. The implementation uses impressively fast approximations to the normally costly trigonometric functions, poached in part from the Java Optimized Development Kit project, without sacrificing too much accuracy. It’s unlikely the approximations will ever matter in practice, and there is an open issue to further improve the approximation.

Suddenly, armed with these improvements, if you index latitude and longitude as DoubleDocValuesFields in each document, and you know the user’s latitude/longitude location for each request, you can easily compute facet counts and offer drill-downs by any set of chosen distances.

First, index your documents with latitude/longitude fields:

Document doc = new Document();
doc.add(new DoubleField("latitude", 40.759011, Field.Store.NO));
doc.add(new DoubleField("longitude", -73.9844722, Field.Store.NO));
writer.addDocument(doc);

At search time, obtain the ValueSource by building a dynamic expression that invokes the Haversine function:

private ValueSource getDistanceValueSource() {
  Expression distance;
  try {
    distance = JavascriptCompiler.compile(
                 "haversin(40.7143528,-74.0059731,latitude,longitude)");
  } catch (ParseException pe) {
    // Should not happen
    throw new RuntimeException(pe);
  }
  SimpleBindings bindings = new SimpleBindings();
  bindings.add(new SortField("latitude", SortField.Type.DOUBLE));
  bindings.add(new SortField("longitude", SortField.Type.DOUBLE));

  return distance.getValueSource(bindings);
}

Instead of the hardwired latitude/longitude above, you should fill in the user’s location.

Using that ValueSource, compute the dynamic facet counts like this:

FacetsCollector fc = new FacetsCollector();

searcher.search(new MatchAllDocsQuery(), fc);

Facets facets = new DoubleRangeFacetCounts(
                    "field",
                    getDistanceValueSource(), fc,
                    ONE_KM,
                    TWO_KM,
                    FIVE_KM,
                    TEN_KM);

return facets.getTopChildren(10, "field");

Normally you’d use a “real” query instead of the top-level-browse MatchAllDocsQuery. Finally, once the user picks a distance for drill-down, use the Range.getFilter method and add that to a DrillDownQuery using ConstantScoreQuery:

public TopDocs drillDown(DoubleRange range) throws IOException {
  // Passing no baseQuery means we drill down on all
  // documents ("browse only"):
  DrillDownQuery q = new DrillDownQuery(null);

  q.add("field", new ConstantScoreQuery(
                     range.getFilter(getDistanceValueSource())));

  return searcher.search(q, 10);
}

See the full source code here, from the lucene/demo module.

When I first tested this example, there was a fun bug, and then later the facet APIs were overhauled, so you’ll need to wait for the Lucene 4.7 release, or just use the current the 4.x sources, to get this example working.

While this example is simple, and works correctly, there are some clear performance improvements that are possible, such as using a bounding box as a fast match to avoid computing Haversine for hits that are clearly outside of the range of possible drill-downs (patches welcome!). Even so, this is a nice step forward for Lucene’s faceting and it’s amazing that geospatial distance faceting with Lucene can be so simple.
 

Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button