ElasticSearch Tutorial for Beginners

Shubham Aggarwal, March 7th, 2018 (Last Updated: December 11th, 2023)

1. Introduction

In this example, we demonstrate how to use Elasticsearch, a distributed free-text search and analysis engine based on Apache Lucene, from a simple Maven-based Java client.

We will be using Elasticsearch v6.1.2, the latest version at the time of writing this post. For this example we use the following technologies:

Maven 3
Java 8
Elasticsearch 6.1.2

Elasticsearch is well known for exposing its functionality over RESTful APIs. This means that we interact with the database through HTTP methods like GET, POST, PUT and DELETE. It is a highly scalable distributed database built on top of Apache Lucene. Some more features about Elasticsearch are:

With the total dependency size of only around 300 KB, Elasticsearch is very lightweight
Elasticsearch is focused solely on the performance of the queries. This means that whatever operations are done with the database, they are highly optimised and scalable
It is a highly fault-tolerant system. If a single Elasticsearch node dies in a cluster, the master server is very quick in identifying the issue and routes the incoming requests to a new node as fast as possible
Elasticsearch’s speciality lies in indexable text data which can be searched on the basis of tokens and filters
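To make the idea of token-based search concrete, here is a minimal sketch of an inverted index in plain Java. It is a toy model and not how Lucene is implemented: the class TinyInvertedIndex, its methods and the sample sentences are our own assumptions for illustration, and real analyzers also deal with stemming, stop words and relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lower-cased token to the ids of the documents containing it.
// Hypothetical class for illustration only.
class TinyInvertedIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split on anything that is not a letter or digit and lower-case the pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Registers every token of the text under the given document id.
    void add(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of the documents containing the token, in ascending order.
    Set<Integer> search(String token) {
        Set<Integer> hits = postings.get(token.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Shubham Aggarwal writes Java");
        index.add(2, "Java client for Elasticsearch");
        System.out.println(index.search("java"));          // [1, 2]
        System.out.println(index.search("elasticsearch")); // [2]
    }
}
```

Because the lookup goes from token to documents, a search does not have to scan every document, which is the basic reason this kind of structure suits free-text queries.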

Although Elasticsearch is a great candidate for distributed free-text search and analysis, it might not be the best-suited database for some other operations, like:

Counting operations like total and average
Executing transactional queries with rollbacks
Managing records which will be unique across multiple given terms

This means that Elasticsearch is a specialised database: it does not fit every use case, but it is an excellent choice within its own domain.

2. Prerequisites

You must have Java installed on your computer in order to proceed, because Maven is a Java tool. You can download Java here.

Once you have Java installed on your system, you must install Maven. You can download Maven from here.

Finally, you need to install Elasticsearch. You can download it from here and follow the steps for your OS. Note that we will be using v6.1.2 for this lesson. Other versions might not work exactly the same way. You can verify that ES is running by opening this URL in your browser:

localhost:9200

You should get a response like:

{
  "name": "wKUxRAO",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "gvBXz7xsS5W4zlZuiADelw",
  "version": {
    "number": "6.1.2",
    "build_hash": "5b1fea5",
    "build_date": "2018-01-10T02:35:59.208Z",
    "build_snapshot": false,
    "lucene_version": "7.1.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}

Note that elasticsearch is the default cluster name.
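The same check can be done from Java. The sketch below uses only the JDK's HttpURLConnection. The class name EsPing, the two-second timeouts and the regular expression used to pick out version.number are assumptions made for illustration; a real application would parse the response with a JSON library such as Jackson.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fetches the root endpoint and reports the version it announces.
class EsPing {

    private static final Pattern NUMBER = Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    // Pulls the "number" value out of the root response; returns null when it is absent.
    static String extractVersion(String json) {
        Matcher m = NUMBER.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Performs GET <baseUrl> and returns the response body as a string.
    static String fetch(String baseUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) {
        try {
            String body = fetch("http://localhost:9200");
            System.out.println("Elasticsearch version: " + extractVersion(body));
        } catch (IOException e) {
            System.err.println("Elasticsearch is not reachable: " + e.getMessage());
        }
    }
}
```

If no node is running, the program reports that Elasticsearch is not reachable instead of failing with a stack trace.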

3. Project Setup

We will be using one of the many Maven archetypes to create a sample project for our example. To create the project execute the following command in a directory that you will use as workspace:

mvn archetype:generate -DgroupId=com.javacodegeeks.example -DartifactId=jcg-elasticsearch-example -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

If you are running Maven for the first time, the generate command will take a little longer to finish, because Maven has to download all the plugins and artifacts it needs for the generation task.

You will now have a new directory, named after the artifactId, inside the chosen directory. Feel free to open the project in your favourite IDE.

4. Maven Dependencies

To start with, we need to add the appropriate Maven dependencies to our project. We will add the following properties and dependencies to our pom.xml file:

pom.xml

<properties>
  <elasticsearch.version>6.1.2</elasticsearch.version>
  <jackson.version>2.9.4</jackson.version>
  <java.version>1.8</java.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${elasticsearch.version}</version>
  </dependency>

  <dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>${elasticsearch.version}</version>
  </dependency>

  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version>
  </dependency>

  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>${jackson.version}</version>
  </dependency>

  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Find the latest Elasticsearch dependency here.

Note that we used Jackson only as the standard JSON library for Java in our code.

5. Making Database Queries

Now, we’re ready to start building our project and add more components to it.

5.1 Making a Model

We will start by adding a very simple model in our project, a Person. Its definition will be very standard, like:

Person.java

public class Person {

    private String personId;
    private String name;

    //standard getters and setters

    @Override
    public String toString() {
        return String.format("Person{personId='%s', name='%s'}", 
            personId, name);
    }
}

We omitted the standard getters and setters for brevity, but they must be present because Jackson uses them during serialization and deserialization of an object.
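For completeness, this is roughly what the model looks like with the accessors written out. The method names follow the usual JavaBean convention for the personId and name fields; the class is declared without the public modifier here only so that the snippet compiles on its own in any file.

```java
// Sketch of the Person model with its getters and setters spelled out.
class Person {

    private String personId;
    private String name;

    public String getPersonId() {
        return personId;
    }

    public void setPersonId(String personId) {
        this.personId = personId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return String.format("Person{personId='%s', name='%s'}",
            personId, name);
    }
}
```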

5.2 Defining Connection parameters

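We will use the default connection parameters for making a connection with Elasticsearch. By default, a node serves its REST API over HTTP on port 9200; 9201 is the port a second node started on the same machine would normally pick up.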

Connection Parameters

//The config parameters for the connection
private static final String HOST = "localhost";
private static final int PORT_ONE = 9200;
private static final int PORT_TWO = 9201;
private static final String SCHEME = "http";

private static RestHighLevelClient restHighLevelClient;
private static ObjectMapper objectMapper = new ObjectMapper();

private static final String INDEX = "persondata";
private static final String TYPE = "person";

Apart from connection configuration params, we also defined the index params above to identify where our Person data is saved.

As the parameters above show, we point the client at two ports, 9200 and 9201. Port 9200 is the default HTTP port, through which browsers and external clients such as ours reach the RESTful APIs. Port 9201 only has a listener when a second node is running on the same machine, because each additional node binds to the next free port in the default 9200-9300 range.
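A quick way to see which of the two ports actually has something listening on your machine is to try a plain TCP connection. The following sketch uses only the JDK; the class name PortProbe and the 500 ms timeout are arbitrary choices of ours and are not part of the example project.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper that reports whether a TCP port accepts connections.
class PortProbe {

    // Returns true when a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out or host unknown: treat all as "not listening"
            return false;
        }
    }

    public static void main(String[] args) {
        for (int port : new int[] {9200, 9201}) {
            System.out.println("localhost:" + port + " listening? "
                    + isListening("localhost", port, 500));
        }
    }
}
```

An open port only tells us that some process accepted the connection, not that it is Elasticsearch, so this complements the browser check rather than replacing it.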

5.3 Making a connection

We will make a method to establish the connection with the Elasticsearch database. We pass both ports to the client builder as two candidate hosts. The client spreads its requests across the hosts it is given and falls back to another one when a host does not respond, so a single node listening on 9200 is enough for this example. Here is the code to make a connection:

Singleton method for getting Connection Object

/**
 * Implemented Singleton pattern here
 * so that there is just one connection at a time.
 * @return RestHighLevelClient
 */
private static synchronized RestHighLevelClient makeConnection() {

    if(restHighLevelClient == null) {
        restHighLevelClient = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost(HOST, PORT_ONE, SCHEME),
                        new HttpHost(HOST, PORT_TWO, SCHEME)));
    }

    return restHighLevelClient;
}

Note that we have implemented Singleton Design pattern here so that multiple connections aren’t made for ES which saves a lot of memory.

RestHighLevelClient is thread-safe, so this single instance can be shared across threads. The best time to initialise it is at application startup or when the first request is made to the client. Once the client is initialised, it can be used to call any of the supported APIs.

5.4 Closing a connection

In older versions of Elasticsearch, we used TransportClient and closed it once we were done with our queries. In the same way, a RestHighLevelClient connection must be closed once the database interaction is complete. Here is how this can be done:

Close Connection

private static synchronized void closeConnection() throws IOException {
    // Guard against a call made before makeConnection()
    if (restHighLevelClient != null) {
        restHighLevelClient.close();
        restHighLevelClient = null;
    }
}

We also set the restHighLevelClient field to null so that the Singleton stays consistent: the next call to makeConnection() will create a fresh client.
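The create-once, close-and-reset lifecycle described in the last two sections can be tried without Elasticsearch on the classpath. In the sketch below, FakeClient is a hypothetical stand-in for RestHighLevelClient and ClientHolder is a name of our own; only the pattern mirrors the tutorial code.

```java
import java.io.Closeable;

// Sketch of a lazily created, shared client that can be closed and re-created.
class ClientHolder {

    // Stand-in for RestHighLevelClient so the sketch runs with the JDK alone.
    static class FakeClient implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    private static FakeClient client;

    // Creates the shared instance on first use; synchronized so that
    // two threads cannot both create one.
    static synchronized FakeClient get() {
        if (client == null) {
            client = new FakeClient();
        }
        return client;
    }

    // Closes the shared instance, if any, and clears the reference
    // so that get() builds a fresh one next time.
    static synchronized void close() {
        if (client != null) {
            client.close();
            client = null;
        }
    }

    public static void main(String[] args) {
        FakeClient first = get();
        System.out.println(first == get());  // true: same instance is reused
        close();
        System.out.println(first.closed);    // true: the old instance was closed
        System.out.println(first == get());  // false: a new instance was created
        close();
    }
}
```

The null check in close() makes the method safe to call even when no client was ever created.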

5.5 Inserting Data

We can insert data into the database by putting the keys and values of our object into a Map. A Map is not the only accepted form (the update in section 5.7 sends a JSON string), but it is a convenient one. Let’s see the code snippet on how this can be achieved:

POST Query

private static Person insertPerson(Person person){
    person.setPersonId(UUID.randomUUID().toString());
    Map<String, Object> dataMap = new HashMap<String, Object>();
    dataMap.put("personId", person.getPersonId());
    dataMap.put("name", person.getName());
    IndexRequest indexRequest = new IndexRequest(INDEX, TYPE, person.getPersonId())
            .source(dataMap);
    try {
        IndexResponse response = restHighLevelClient.index(indexRequest);
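    } catch(ElasticsearchException e) {
        // Report the failure instead of silently discarding the message
        System.err.println("Index request failed: " + e.getDetailedMessage());
    } catch (java.io.IOException ex){
        System.err.println("Could not reach Elasticsearch: " + ex.getLocalizedMessage());
    }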
    return person;
}

Above, we used Java’s UUID class to create a unique identifier of the object as well. This way, we can control how the identifiers of an object are made.

5.6 Making a GET request

Once we are done with inserting data into the Database, we can confirm the operation by making a GET request to the Elasticsearch Database server. Let’s see the code snippet on how this can be done:

GET Query

private static Person getPersonById(String id){
    GetRequest getPersonRequest = new GetRequest(INDEX, TYPE, id);
    GetResponse getResponse = null;
    try {
        getResponse = restHighLevelClient.get(getPersonRequest);
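    } catch (java.io.IOException e){
        // Report the failure instead of silently discarding the message
        System.err.println("GET request failed: " + e.getLocalizedMessage());
    }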
    return getResponse != null ?
            objectMapper.convertValue(getResponse.getSourceAsMap(), Person.class) : null;
}

In this query, we just provided the main information about the object with which it can be identified, i.e., the Index, the Type and its unique identifier. Also, what we get back is actually a Map of values, as expressed by this expression:

Getting Map

getResponse.getSourceAsMap()

It is Jackson’s objectMapper that converts this Map into a POJO which can be easily used in our program. This way, we don’t have to read each key from the Map ourselves, which would be a tedious process compared to simply having a POJO.
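To see what the conversion spares us, here is a hand-written mapping for the two fields of our model. This is only a sketch of the manual alternative: ManualMapping and PersonDto are hypothetical names, and the casts assume that both values were stored as strings.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of reading each key from the source Map by hand.
class ManualMapping {

    // Simplified stand-in for the Person model.
    static class PersonDto {
        String personId;
        String name;
    }

    // One line per field; every new field in the model needs another line here.
    static PersonDto fromMap(Map<String, Object> source) {
        PersonDto person = new PersonDto();
        person.personId = (String) source.get("personId");
        person.name = (String) source.get("name");
        return person;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("personId", "abc");
        source.put("name", "Shubham");
        PersonDto person = fromMap(source);
        System.out.println(person.personId + " / " + person.name); // abc / Shubham
    }
}
```

With two fields this is manageable, but the code grows with every property, which is why delegating the work to objectMapper is attractive.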

5.7 Updating Data

We can make an Update request to Elasticsearch easily by first identifying the resource with its Index, Type and unique identifier. Then we can pass a Map or, as in the snippet below, a JSON string holding the values we want to change. Here is an example code snippet:

PUT Query

private static Person updatePersonById(String id, Person person){
    UpdateRequest updateRequest = new UpdateRequest(INDEX, TYPE, id)
            .fetchSource(true);    // Fetch Object after its update
    try {
        String personJson = objectMapper.writeValueAsString(person);
        updateRequest.doc(personJson, XContentType.JSON);
        UpdateResponse updateResponse = restHighLevelClient.update(updateRequest);
        return objectMapper.convertValue(updateResponse.getGetResult().sourceAsMap(), Person.class);
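    }catch (JsonProcessingException e){
        // Report the failure instead of silently discarding the message
        System.err.println("Could not serialize person: " + e.getMessage());
    } catch (java.io.IOException e){
        System.err.println("Update request failed: " + e.getLocalizedMessage());
    }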
    System.out.println("Unable to update person");
    return null;
}

Notice what we did above in the following statement:

PUT Query

updateRequest.doc(personJson, XContentType.JSON);

Here, we didn’t pass any specific property of the object which needs to be updated. Instead, we passed the complete object as JSON, so every key present in that JSON overwrites the stored value for the same key.
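The effect of passing a document to doc(...) can be pictured with two plain Maps: fields present in the update overwrite the stored values, and fields that are absent are left untouched. The sketch below models top-level fields only and leaves out nested objects; PartialUpdateSketch and merge are names we made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of how a partial document is applied to a stored one.
class PartialUpdateSketch {

    // Returns a copy of stored with every field of partial written over it.
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        result.putAll(partial);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("personId", "abc");
        stored.put("name", "Shubham");

        Map<String, Object> partial = new LinkedHashMap<>();
        partial.put("name", "Shubham Aggarwal");

        // personId is kept, name is overwritten
        System.out.println(merge(stored, partial)); // {personId=abc, name=Shubham Aggarwal}
    }
}
```

Since our update sends the whole Person, both keys are present in the update and both are overwritten.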

We also checked for any possible errors through the catch statements. In a real-world application, you will want to handle these errors gracefully and make documented logs.
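As one possible way to make such logs, the sketch below wraps a call and records any failure with java.util.logging, including the stack trace. The helper name callOrDefault and the choice of returning a fallback value are assumptions of ours, not part of the example project; a production application might prefer a logging framework and rethrowing.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper that logs failures instead of swallowing them.
class SafeCalls {

    private static final Logger LOG = Logger.getLogger(SafeCalls.class.getName());

    // Runs the call; on failure logs the exception with its stack trace
    // and returns the fallback value.
    static <T> T callOrDefault(String description, Callable<T> call, T fallback) {
        try {
            return call.call();
        } catch (Exception e) {
            LOG.log(Level.WARNING, "Failed to " + description, e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        String ok = callOrDefault("read value", () -> "done", "n/a");
        String failed = callOrDefault("update person", () -> {
            throw new IOException("connection refused");
        }, "n/a");
        System.out.println(ok);     // done
        System.out.println(failed); // n/a
    }
}
```

The description passed by the caller ends up in the log message, so a failed update can be told apart from a failed insert at a glance.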

5.8 Deleting Data

Finally, we can delete data by simply identifying the resource with its Index, Type and unique identifier. Let’s see the code snippet on how this can be done:

DELETE Query

private static void deletePersonById(String id) {
    DeleteRequest deleteRequest = new DeleteRequest(INDEX, TYPE, id);
    try {
        DeleteResponse deleteResponse = restHighLevelClient.delete(deleteRequest);
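    } catch (java.io.IOException e){
        // Report the failure instead of silently discarding the message
        System.err.println("DELETE request failed: " + e.getLocalizedMessage());
    }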
}

Again, the DELETE query above only needs the information that identifies the object.
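Since all four operations identify a document by Index, Type and identifier, each of them corresponds to an HTTP call on the same document URL. The sketch below spells out that correspondence as we understand it for the 6.x REST API; RestPaths and describe are hypothetical names, and the id is assumed to be URL-safe, as a UUID is.

```java
// Sketch of the HTTP calls behind the four document operations (assumed 6.x URL layout).
class RestPaths {

    // Document URL shape: /{index}/{type}/{id}
    static String documentPath(String index, String type, String id) {
        return "/" + index + "/" + type + "/" + id;
    }

    // Returns the HTTP method and path for one of: index, get, update, delete.
    static String describe(String operation, String index, String type, String id) {
        String path = documentPath(index, type, id);
        switch (operation) {
            case "index":
                return "PUT " + path;              // store a document under a known id
            case "get":
                return "GET " + path;
            case "update":
                return "POST " + path + "/_update";
            case "delete":
                return "DELETE " + path;
            default:
                throw new IllegalArgumentException("Unknown operation: " + operation);
        }
    }

    public static void main(String[] args) {
        for (String op : new String[] {"index", "get", "update", "delete"}) {
            System.out.println(describe(op, "persondata", "person", "42"));
        }
    }
}
```

These are the requests you could also send by hand with a tool such as curl against localhost:9200 to inspect what the Java code stored.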

5.9 Running the application

Let’s try our application by performing all the operations we mentioned above. As this is a plain Java application, we will call each of these methods and print the operation results:

main() method

public static void main(String[] args) throws IOException {

    makeConnection();

    System.out.println("Inserting a new Person with name Shubham...");
    Person person = new Person();
    person.setName("Shubham");
    person = insertPerson(person);
    System.out.println("Person inserted --> " + person);

    System.out.println("Changing name to `Shubham Aggarwal`...");
    person.setName("Shubham Aggarwal");
    updatePersonById(person.getPersonId(), person);
    System.out.println("Person updated  --> " + person);

    System.out.println("Getting Shubham...");
    Person personFromDB = getPersonById(person.getPersonId());
    System.out.println("Person from DB  --> " + personFromDB);

    System.out.println("Deleting Shubham...");
    deletePersonById(personFromDB.getPersonId());
    System.out.println("Person Deleted");

    closeConnection();
}

Once we run this application, we will get the following output:

Program Output

Inserting a new Person with name Shubham...
Person inserted --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham'}
Changing name to `Shubham Aggarwal`...
Unable to update person
Person updated  --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham Aggarwal'}
Getting Shubham...
Person from DB  --> Person{personId='bfc5ba80-832a-4925-9b8d-525a4e420cb0', name='Shubham Aggarwal'}
Deleting Shubham...
Person Deleted

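Of course, the IDs can vary. The line Unable to update person means that updatePersonById() went through one of its catch blocks and returned null in this run; the GET that follows shows that the new name was stored nonetheless. Note that we closed the connection once we were done with the queries. This helps the JVM reclaim the memory which was held by the ES connection.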

6. Conclusion

In this lesson, we studied how we can use Elasticsearch from a plain Java application through its REST client. Whether the REST client is the right choice for a real-world application needs to be explored with a larger, scalable example; it is a decision we need to make when we start architecting an application.

Explore much more about Elasticsearch in our Elasticsearch course.

7. Download the Complete Source Code

This was a tutorial on ElasticSearch REST client and queries with Java where we interacted with the Elasticsearch Database via the RESTful operations.

Download
You can download the full source code of this example here: Elasticsearch Example