Apache Fluo: Implementation of Percolator Which Populates Google’s Search Index

Apache Fluo is an open source implementation of Percolator [2] (which populates Google’s search index) for Apache Accumulo [3]. With Fluo, users can continuously join new data into large existing data sets without reprocessing all data. Unlike batch and streaming frameworks, Fluo offers much lower latency and can operate on extremely large data sets [1].
 
 
 

Major Features

Reduced Latency

When combining new data with existing data, Fluo offers lower latency than batch processing frameworks (e.g., Spark, MapReduce).

Reliable

Incremental updates are implemented using transactions, which allow thousands of updates to happen concurrently without corrupting data.

Avoid Reprocessing Data

Combine new data with existing data without having to reprocess the entire dataset.
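To make the idea concrete, here is a minimal in-memory sketch in plain Java. It is not the Fluo API: the class and method names are hypothetical, and a simple map stands in for the large existing data set. It keeps running word counts by folding each new batch into the existing totals instead of recounting everything.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an in-memory map plays the role of the existing data set.
class IncrementalCounts {
    private final Map<String, Long> counts = new HashMap<>();

    // Fold one new batch of words into the existing totals.
    // The work done is proportional to the batch size, not to the whole data set.
    void addBatch(List<String> words) {
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
    }

    // Current total for a word, or 0 if it has never been seen.
    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        IncrementalCounts c = new IncrementalCounts();
        c.addBatch(Arrays.asList("fluo", "accumulo", "fluo"));
        c.addBatch(Arrays.asList("fluo")); // only the new data is processed
        System.out.println(c.count("fluo")); // prints 3
    }
}
```

A batch job would instead rescan every document each time new data arrives; the incremental version only touches the keys affected by the new batch.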

General Purpose

Fluo applications consist of a series of observers that execute user code when observed data is updated.
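The observer idea can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Fluo observer API: `ObservedTable`, `observe`, and `drain` are hypothetical names, and everything runs in memory on one thread. Writing to an observed column queues a notification, and each observer may write to other columns, which in turn triggers further observers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration only; this is not the Fluo observer API.
class ObservedTable {
    // Cell values keyed by "row:column" (good enough for a sketch).
    private final Map<String, String> cells = new HashMap<>();
    // Callbacks registered per observed column.
    private final Map<String, List<BiConsumer<ObservedTable, String>>> observers = new HashMap<>();
    // Queued (row, column) notifications waiting to be processed.
    private final Deque<String[]> pending = new ArrayDeque<>();

    // Register user code to run whenever the given column is written.
    void observe(String column, BiConsumer<ObservedTable, String> callback) {
        observers.computeIfAbsent(column, k -> new ArrayList<>()).add(callback);
    }

    // Write a cell and queue a notification if the column is observed.
    void set(String row, String column, String value) {
        cells.put(row + ":" + column, value);
        if (observers.containsKey(column)) {
            pending.add(new String[] {row, column});
        }
    }

    String get(String row, String column) {
        return cells.get(row + ":" + column);
    }

    // Run observers until no notifications remain; observers may queue more work.
    void drain() {
        String[] n;
        while ((n = pending.poll()) != null) {
            for (BiConsumer<ObservedTable, String> cb : observers.get(n[1])) {
                cb.accept(this, n[0]);
            }
        }
    }

    public static void main(String[] args) {
        ObservedTable t = new ObservedTable();
        // First observer derives an upper-cased copy; second derives its length.
        t.observe("raw", (tbl, row) -> tbl.set(row, "upper", tbl.get(row, "raw").toUpperCase()));
        t.observe("upper", (tbl, row) -> tbl.set(row, "len", String.valueOf(tbl.get(row, "upper").length())));
        t.set("doc1", "raw", "fluo");
        t.drain();
        System.out.println(t.get("doc1", "upper") + " " + t.get("doc1", "len")); // prints FLUO 4
    }
}
```

Chaining observers this way is what lets a single new piece of data ripple through a series of derived values without rerunning the whole pipeline.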

Core API

The core Fluo API supports simple, cross-node transactional updates using get/set methods.
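As a rough picture of what transactional get/set updates involve, the following plain-Java sketch buffers writes in a transaction and rejects the commit if another transaction has written the same key in the meantime. It is not the Fluo API and is far simpler than Percolator: `TinyStore` and `Tx` are hypothetical names, the store lives in one process, and reads simply return the latest committed value rather than a consistent snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the Fluo API and much simpler than Percolator.
class TinyStore {
    private final Map<String, String> values = new HashMap<>();
    // For each key, the commit number of the last transaction that wrote it.
    private final Map<String, Long> lastCommit = new HashMap<>();
    private long commitCounter = 0;

    // Start a transaction that remembers how many commits had happened so far.
    synchronized Tx begin() {
        return new Tx(commitCounter);
    }

    class Tx {
        private final long start;
        private final Map<String, String> writes = new HashMap<>();

        Tx(long start) {
            this.start = start;
        }

        // Read our own buffered write if present, else the latest committed value.
        String get(String key) {
            if (writes.containsKey(key)) {
                return writes.get(key);
            }
            synchronized (TinyStore.this) {
                return values.get(key);
            }
        }

        // Buffer a write; nothing is visible to others until commit succeeds.
        void set(String key, String value) {
            writes.put(key, value);
        }

        // Returns false if any key we wrote was committed by someone else after we began.
        boolean commit() {
            synchronized (TinyStore.this) {
                for (String key : writes.keySet()) {
                    if (lastCommit.getOrDefault(key, 0L) > start) {
                        return false; // write-write conflict: caller should retry
                    }
                }
                commitCounter++;
                for (Map.Entry<String, String> e : writes.entrySet()) {
                    values.put(e.getKey(), e.getValue());
                    lastCommit.put(e.getKey(), commitCounter);
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        TinyStore store = new TinyStore();
        Tx a = store.begin();
        Tx b = store.begin();
        a.set("n", "1");
        b.set("n", "1");
        System.out.println(a.commit()); // prints true
        System.out.println(b.commit()); // prints false, because a wrote "n" first
    }
}
```

A caller whose commit returns false would typically start a new transaction, re-read the data, and try again, which is how concurrent updates avoid silently overwriting each other.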

Recipes API

The Fluo Recipes API builds on the core API to offer complex transactional updates.

Apache Fluo graduated from the Apache Incubator to become a Top-Level Project in July 2017. Technologies like this can have a steep learning curve for newcomers, but the community has created a tutorial and a skeleton project to help. You can follow the Fluo Tour [4] to learn how to use Fluo, and you can fork the source code from the Apache Fluo GitHub repository [5]. The project also has an active community, and new contributors are usually mentioned on Twitter by @ApacheFluo.

Resources:

[1] https://fluo.apache.org

[2] https://research.google.com/pubs/pub36726

[3] https://accumulo.apache.org

[4] https://fluo.apache.org/tour

[5] https://github.com/apache/fluo

Published on Java Code Geeks with permission by Furkan Kamaci, partner at our JCG program. See the original article here: Apache Fluo: Implementation of Percolator Which Populates Google’s Search Index

Opinions expressed by Java Code Geeks contributors are their own.

Furkan Kamaci

Furkan KAMACI is a Machine Learning, NLP and Search Expert who loves Java! He works at Alcatel-Lucent as an Integration Professional.