SolrCloud 7.4 cluster configuration with an external ZooKeeper ensemble, and using the SolrJ API to access the data

Solr is one of the most popular and highly scalable search engines, built on distributed indexing technology. Solr indexes can be built on top of data from pretty much any kind of source: CSV data, XML data, data extracted from an RDBMS, or a standard file system.

For any web application built on an RDBMS backend, a search over a table with millions of rows, or a query that joins multiple tables, can take a long time to return a response. Backend services like these make the website extremely slow. Solr indexing can be a useful solution in such cases. Solr stores the data as inverted-index documents containing multiple fields, each with a name and a value. A single instance of Solr is usually sufficient for small to medium sized databases. For large databases, where queries need to be executed over billions of rows, a distributed indexing solution is needed, with the indexes spread across multiple shards and clusters. SolrCloud is designed for this purpose. But managing SolrCloud's nodes, shards and replicas is a huge task that cannot be done manually. Pairing it with an external ZooKeeper ensemble helps with SolrCloud management by routing queries to the right Solr instance, along with other benefits like load balancing and fault tolerance.

However, setting up a SolrCloud cluster with an external ZooKeeper ensemble is quite complex and might appear to be a daunting task for developers. In this article, we are going to walk through the SolrCloud setup and implementation with a ZooKeeper ensemble in simple steps, along with the necessary code snippets and screenshots. We are going to create multiple shards of Solr and operate them through ZooKeeper. Later, the setup is tested through a Spring Boot microservice using SolrJ APIs. SolrJ is an API that lets Java applications communicate with Solr and execute queries. I have used Java 8 as the JDK and Eclipse as the IDE in the example shown below.

1. Zookeeper setup

Here are the step-by-step instructions to set up the ZooKeeper ensemble:

    • Download the latest ZooKeeper release from https://zookeeper.apache.org/releases.html
    • Copy the downloaded folders to the dev location for Solr and ZooKeeper. In my case I have uploaded it to my dev server at the path /opt/user_projects/poc/solrpoc
    • Once ZooKeeper is downloaded, navigate to its conf folder. In this article we are creating 3 instances of ZooKeeper on the same server; in the real world, these 3 instances would run on 3 different servers.
    • Navigate to /opt/user_projects/poc/solrpoc/zookeeper-3.4.12/conf/ and add 3 conf files (zoo.conf, zoo2.conf, zoo3.conf).

Zookeeper config

    • In each conf file, update the dataDir location as
      dataDir=/opt/user_projects/poc/tmp/1 (use a different sequence number for each of the three conf files).
    • In each conf file, enter the server and port information of the 3 ZooKeeper instances, like the following:
server.1=YourServerName:2888:3888
server.2=YourServerName:2889:3889
server.3=YourServerName:2890:3890
  • Create 3 folders at the respective locations mentioned in the dataDir property of the conf files above (/opt/user_projects/poc/tmp/1, /opt/user_projects/poc/tmp/2, /opt/user_projects/poc/tmp/3).
  • In each of those folders, create a new file named ‘myid’ and enter the sequence number (1, 2 or 3) matching the folder name.
  • With that, the ZooKeeper configuration is done.
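
Putting the steps above together: the Solr start script later in this article points at ZooKeeper client ports 8997–8999, so each instance also needs its own clientPort entry. A minimal sketch of the first instance’s zoo.conf follows — the tickTime/initLimit/syncLimit values are common defaults, not taken from the article; adjust dataDir, clientPort and the sequence number for the other two files:

```properties
# zoo.conf — first ZooKeeper instance (sketch)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/user_projects/poc/tmp/1
clientPort=8997
server.1=YourServerName:2888:3888
server.2=YourServerName:2889:3889
server.3=YourServerName:2890:3890
```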

2. SOLR Cloud setup

Now let’s start the Solr cloud configuration.

  • Download the latest Solr from the URL http://lucene.apache.org/solr/downloads.html
  • Navigate to the server directory under the Solr installation folder and create 4 Solr home folders in it. In my case, the path is /opt/user_projects/poc/solrpoc/solr-7.4.0/server: solr, solr2, solr3, solr4, as shown in the image below.

Solr Config

  • Each of the Solr folders created above should contain a solr.xml file, and the port has to be assigned in that file as shown below.
    ${jetty.port:8993}
  • Each folder should also contain a configsets directory, which should include data_driven_schema_configs if you want to index data from a database.
  • After modifying the ports, the Solr setup is pretty much ready.
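
For reference, the port line above lives in the solrcloud section of each instance’s solr.xml. A minimal sketch based on the stock solr.xml shipped with Solr 7.x — only the hostPort value differs per instance (8993 to 8996 in this setup):

```xml
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8993}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>
```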

3. Start the Zookeeper

  • Make sure you have set JAVA_HOME before starting ZooKeeper.
  • Prepare start and stop scripts for ZooKeeper and place them at /opt/user_projects/poc/solrpoc/zookeeper-3.4.12/startZookeeper.sh and stopZookeeper.sh.
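
The ZooKeeper scripts themselves are not shown in the article; here is a minimal sketch of what startZookeeper.sh could look like, assuming ZooKeeper 3.4’s zkServer.sh, which accepts a conf file name (resolved against its conf directory) as a second argument:

```sh
#!/bin/sh
echo "Starting all Zookeeper instances"

cd /opt/user_projects/poc/solrpoc/zookeeper-3.4.12

# Each instance reads its own conf file and, via dataDir, its own myid
bin/zkServer.sh start zoo.conf
bin/zkServer.sh start zoo2.conf
bin/zkServer.sh start zoo3.conf

echo "Started all Zookeeper instances"
```

A matching stopZookeeper.sh would call bin/zkServer.sh stop with the same three conf files.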

Solr start script (startSolr.sh)

#!/bin/sh
echo "-----------------------------------"
echo "Starting all Solr Instances"

source /opt/sun_jdk/jdkversion/jdkversion.conf

bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr -p 8993 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt

bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr2 -p 8994 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt

bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr3 -p 8995 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt

bin/solr start -Duser.timezone="America/Los_Angeles" -c -s server/solr4 -p 8996 -z yourServer:8997,yourServer:8998,yourServer:8999 -noprompt

echo ""
echo "Started all Solr Instances"
echo "---------------------------------"

  • Prepare start and stop scripts for Solr, like the startSolr.sh shown above, and place them at /opt/user_projects/poc/solrpoc/solr-7.4.0/startSolr.sh.
    When you execute this script, Solr starts running on the ports specified.

Solr Console

4. Setting up Collections

    • Once Solr is running, make sure you have the database-related JDBC driver jar copied to the dist folder and that dependency referenced in solrconfig.xml. In this case we are using ojdbc14.jar at /opt/user_projects/poc/solrpoc/solr-7.4.0/dist
    • To create a first collection, open a terminal, navigate to the Solr bin location and execute the command below.
      ./solr create -c UserSearchCloud -d data_driven_schema_configs -n UserSearchCloud -s 2 -p 8993 -rf 2
      -s: number of shards; -rf: replication factor; 8993: port of any one of the Solr nodes we set up earlier (we set up 4 Solr instances).
    • UserSearchCloud is the collection name and also the config name, which will be created from data_driven_schema_configs (the name is my choice). The folder structure for the configs looks like below.
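
The same collection can also be created programmatically. This is a sketch using SolrJ’s Collections API support, assuming the cluster above is running and solr-solrj 7.4 is on the classpath — it mirrors the ./solr create command (2 shards, replication factor 2):

```java
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        // Point the client at any one ZooKeeper node of the ensemble
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("yourServer:8997"), Optional.empty()).build()) {
            // 2 shards, replication factor 2 — same as the ./solr create command
            CollectionAdminRequest.createCollection(
                    "UserSearchCloud", "UserSearchCloud", 2, 2).process(client);
        }
    }
}
```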

Solr Collection

  • After executing the create command mentioned above, you can go to the Solr Admin UI and see the collection like below.

Solr Console

  • Once the collection is created, we can run the DataImport as below. Click on Execute.

Solr Data Import
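
The DataImport screen above assumes a DataImportHandler configuration in the configset. A minimal data-config.xml sketch for an Oracle source — the connection details, table and column names here are hypothetical, chosen to match the fields used later in the article:

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@yourDbHost:1521:SID"
              user="dbuser" password="dbpassword"/>
  <document>
    <entity name="user"
            query="SELECT ID, FIRST_NAME, LAST_NAME, WORK_EMAIL, CITY FROM USERS"/>
  </document>
</dataConfig>
```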

With this the SOLR cloud setup is complete.

5. SPRING BOOT client with SOLRJ

We will now discuss how to test the Solr cluster and query the data using SolrJ APIs in a Spring Boot based microservice. My GitHub link at the bottom gives the entire project code.

  • Create a new spring boot project with the following structure.

Project Structure

    • Configure the Gradle dependencies to include the SolrJ library. You can look for the file in the full project link provided at the bottom.
    • Create a Java class called SolrUtil, in which the ZooKeeper connection is made, as given below.

SOLRJ util to connect to Zookeeper

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.springframework.stereotype.Service;

@Service
public class SolrUtil {
	
	CloudSolrClient solrClient;
	
	@SuppressWarnings("deprecation")
	public CloudSolrClient createConnection(){
		//You need to replace SERVERNAME with the server on which the zookeeper is running
		String zkHostString = "SERVERNAME:8997,SERVERNAME:8998,SERVERNAME:8999"; //- DEV
		if(solrClient == null){
			solrClient = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
		}
		return solrClient;
	}
	
	public SolrDocumentList getSolrResponse(SolrQuery solrQuery, String collection, CloudSolrClient solrClient) {
		QueryResponse response = null;
		SolrDocumentList list = null;
		try {
			QueryRequest req = new QueryRequest(solrQuery);
			solrClient.setDefaultCollection(collection);
			response = req.process(solrClient);
			list = response.getResults();
		} catch (Exception e) {
			e.printStackTrace();//handle errors in this block
		}
		return list;
	}
}
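
A small aside: withZkHost(String) is deprecated in SolrJ 7.x, which is why the @SuppressWarnings is there. The non-deprecated builder takes a list of ZooKeeper hosts and an optional chroot, so the connection could instead be created like this sketch:

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class SolrClientFactory {
    public static CloudSolrClient create() {
        // Replace SERVERNAME with the server on which the ZooKeeper ensemble runs
        return new CloudSolrClient.Builder(
                Arrays.asList("SERVERNAME:8997", "SERVERNAME:8998", "SERVERNAME:8999"),
                Optional.empty()) // no ZooKeeper chroot
                .build();
    }
}
```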
  • Now create a SolrSearchService which can query Solr, or update or delete documents, as shown below.

SOLRJ Service to CRUD Solr documents

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
// project-specific VO imports (SearchRequestVO, UpdateRequestVO, DeleteRequestVO, ResponseVO) omitted

@Service
public class SolrSearchService {

	@Autowired
	SolrUtil solrUtil;

	private static final String collection = "UserSearchCloud";

	public ResponseVO search(SearchRequestVO requestVO) {
		CloudSolrClient solrClient = solrUtil.createConnection();
		String query = requestVO.getQuery();
		SolrQuery solrQuery = new SolrQuery();
		solrQuery.setQuery(query);
		solrQuery.setRows(50);
		solrQuery.set("collection", collection);
		solrQuery.set("wt", "json");
		SolrDocumentList documentList = solrUtil.getSolrResponse(solrQuery, collection, solrClient);
		ResponseVO responseVO = new ResponseVO();
		if(documentList != null && documentList.size() >0){
			responseVO.setDocumentList(documentList);
			responseVO.setMessage("Success");
		}else{
			responseVO.setMessage("Failure");
			responseVO.setErrorMessage("Records Not Found");
		}
		return responseVO;
	}

	public ResponseVO update(UpdateRequestVO requestVO) {
		CloudSolrClient solrClient = solrUtil.createConnection();
		UpdateResponse response = null; // stays null if the lookup or update fails
		
		SolrDocument sdoc1 = null;
		String id = requestVO.getId();
		solrClient.setDefaultCollection(collection);
		SolrInputDocument sdoc = new SolrInputDocument();
		try {
			sdoc1 = solrClient.getById(id);
		} catch (SolrServerException e1) {
			e1.printStackTrace();
		} catch (IOException e1) {
			e1.printStackTrace();
		}
		if(sdoc1 != null){
			sdoc.setField("FIRST_NAME",requestVO.getFirstName() != null ? requestVO.getFirstName() : sdoc1.get("FIRST_NAME"));
			sdoc.setField("WORK_EMAIL",requestVO.getWorkEmail() != null ? requestVO.getWorkEmail() : sdoc1.get("WORK_EMAIL"));
			sdoc.setField("LAST_NAME",requestVO.getLastName() != null ? requestVO.getLastName() : sdoc1.get("LAST_NAME"));
			sdoc.setField("ADDRESS1",requestVO.getAddress1() != null ? requestVO.getAddress1() : sdoc1.get("ADDRESS1"));
			sdoc.setField("ADDRESS2",requestVO.getAddress2() != null ? requestVO.getAddress2() : sdoc1.get("ADDRESS2"));
			sdoc.setField("PHONE1",requestVO.getPhone1() != null ? requestVO.getPhone1() : sdoc1.get("PHONE1"));
			sdoc.setField("JOB_TITLE",requestVO.getJobTitle() != null ? requestVO.getJobTitle() : sdoc1.get("JOB_TITLE"));
			sdoc.setField("COMPANY_NAME",requestVO.getCompanyName() != null ? requestVO.getCompanyName() : sdoc1.get("COMPANY_NAME") );
			sdoc.setField("CITY",requestVO.getCity() != null ? requestVO.getCity() : sdoc1.get("CITY"));
			sdoc.setField("PHONE2",requestVO.getPhone2() != null ? requestVO.getPhone2() : sdoc1.get("PHONE2"));
			sdoc.setField("USER_NAME",requestVO.getUserName() != null ? requestVO.getUserName() : sdoc1.get("USER_NAME"));
			sdoc.setField("id",sdoc1.get("id"));
			sdoc.setField("_version_","0");
			try {
				solrClient.add(sdoc);
				response = solrClient.commit();
			} catch (SolrServerException e) {
				e.printStackTrace();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		ResponseVO responseVO = new ResponseVO();
		if(response != null && response.getResponse() != null){
			responseVO.setMessage("Document Updated");
		}else{
			responseVO.setErrorMessage("Document Not Found");
		}
		return responseVO;
	}
	
	public ResponseVO delete(DeleteRequestVO requestVO) {
		CloudSolrClient solrClient = solrUtil.createConnection();
		UpdateResponse response = null; // stays null if the delete fails
		try {
			solrClient.setDefaultCollection(collection);
			response = solrClient.deleteById(requestVO.getId());
		} catch (SolrServerException e1) {
			e1.printStackTrace();
		} catch (IOException e1) {
			e1.printStackTrace();
		}
		ResponseVO responseVO = new ResponseVO();
		if(response != null){
			responseVO.setMessage("Document Deleted");
		}else{
			responseVO.setErrorMessage("Document Not Deleted");
		}
		return responseVO;
	}

}
  • Finally, you can test the service from any REST client after starting the Spring Boot service.
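
For example, assuming the controller maps the search service to a POST /search endpoint (the exact path and payload depend on the controller in the full project, so treat both as hypothetical):

```sh
curl -X POST http://localhost:8080/search \
     -H "Content-Type: application/json" \
     -d '{"query": "FIRST_NAME:John"}'
```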

Rest Client Test

So this completes the entire end-to-end test.

6. Download the Source Code

This was an example of configuring SolrCloud with a ZooKeeper ensemble and accessing it from a Spring Boot based SolrJ project.

Download
You can download the full source code of this project here: SOLRJ-ZOOKEEPER-INTEGRATION

Chaitanya Rudrabhatla

Chaitanya K. Rudrabhatla works as a solutions architect in the Media and Entertainment domain. He has a proven ability to design and build complex web applications from ground up. Outside of work, he is an avid reader and a technology enthusiast who likes to be up to date with all the latest happenings in the Information technology world.