Home » Author Archives: Michael McCandless

Author Archives: Michael McCandless

Apache Lucene 5.0.0 is coming!

apache-lucene-logo

At long last, after a strong series of 4.x feature releases, most recently 4.10.2, we are finally working towards another major Apache Lucene release! There are no promises for the exact timing (it’s done when it’s done!), but we already have a volunteer release manager (thank you Anshum!). A major release in Lucene means all deprecated APIs (as of 4.10.x) ...

Read More »

A new proximity query for Lucene, using automatons

apache-lucene-logo

The simplest Apache Lucene query, TermQuery, matches any document that contains the specified term, regardless of where the term occurs inside each document. Using BooleanQuery you can combine multiple TermQuerys, with full control over which terms are optional (SHOULD) and which are required (MUST) or required not to be present (MUST_NOT), but still the matching ignores the relative positions of ...

Read More »

Choosing a fast unique identifier (UUID) for Lucene

apache-lucene-logo

Most search applications using Apache Lucene assign a unique id, or primary key, to each indexed document. While Lucene itself does not require this (it could care less!), the application usually needs it to later replace, delete or retrieve that one document by its external id. Most servers built on top of Lucene, such as Elasticsearch and Solr, require a ...

Read More »

Testing Lucene’s index durability after crash or power loss

apache-lucene-logo

One of Lucene’s useful transactional features is index durability which ensures that, once you successfully call IndexWriter.commit, even if the OS or JVM crashes or power is lost, or you kill -KILL your JVM process, after rebooting, the index will be intact (not corrupt) and will reflect the last successful commit before the crash. Of course, this only works if ...

Read More »

Using Lucene’s search server to search Jira issues

apache-lucene-logo

You may remember my first blog post describing how the Lucene developers eat our own dog food by using a Lucene search application to find our Jira issues. That application has become a powerful showcase of a number of modern Lucene features such as drill sideways and dynamic range faceting, a new suggester based on infix matches, postings highlighter, block-join ...

Read More »

Finding long tail suggestions using Lucene’s new FreeTextSuggester

apache-lucene-logo

Lucene’s suggest module offers a number of fun auto-suggest implementations to give a user live search suggestions as they type each character into a search box. For example, WFSTCompletionLookup compiles all suggestions and their weights into a compact Finite State Transducer, enabling fast prefix lookup for basic suggestions. AnalyzingSuggester improves on this by using an Analyzer to normalize both the ...

Read More »

Three exciting Lucene features in one day

apache-lucene-logo

Yesterday was a productive day: suddenly, there are three exciting new features coming to Lucene. Expressions module The first feature, committed yesterday, is the new expressions module. This allows you to define a dynamic field for sorting, using an arbitrary String expression. There is builtin support for parsing JavaScript, but the parser is pluggable if you want to create your ...

Read More »

Screaming fast Lucene searches using C++ via JNI

apache-lucene-logo

At the end of the day, when Lucene executes a query, after the initial setup the true hot-spot is usually rather basic code that decodes sequential blocks of integer docIDs, term frequencies and positions, matches them (e.g. taking union or intersection for BooleanQuery), computes a score for each hit and finally saves the hit if it’s competitive, during collection. Even ...

Read More »

Transactional Lucene

apache-lucene-logo

Many users don’t appreciate the transactional semantics of Lucene’s APIs and how this can be useful in search applications. For starters, Lucene implements ACID properties: Atomicity: when you make changes (adding, removing documents) in an IndexWriter session, and then commit, either all (if the commit succeeds) or none (if the commit fails) of your changes will be visible, never something ...

Read More »
Do you want to know how to develop your skillset and become a ...

Subscribe to our newsletter to start Rocking right now!

To get you started we give you our best selling eBooks for FREE!
Get ready to Rock!
To download the books, please verify your email address by following the instructions found on the email we just sent you.

THANK YOU!

Close