Enterprise Java

Converting XML to JSON & Raw Use in MongoDB & Spring Batch

Overview

Why convert XML to JSON for raw use in MongoDB?

Since MongoDB uses JSON documents in order to store records, just as tables and rows store records in a relational database, we naturally need to convert our XML to JSON.

Some applications may need to store raw (unmodified) JSON because there is uncertainty in how the data will be structured.

There are hundreds of XML-based standards. If an application is to process XML files that do not follow the same standard, there is uncertainty in how the data will be structured.

Why use Spring Batch?

Spring Batch provides reusable functions that are essential in processing large volumes of records and other features that enable high-volume and high performance batch jobs. The Spring Website has documented Spring Batch well.

For another tutorial on Spring Batch, see my previous post on Processing CSVs with Spring Batch.

0 – Converting XML to JSON For Use In MongoDB With Spring Batch Example Application

The example application converts an XML document that is a “policy” for configuring a music playlist. This policy is intended to resemble real cyber security configuration documents. It is a short document but illustrates how you will search complex XML documents.

The approach we will be taking our tutorial is for handling XML files of varying style. We want to be able to handle the unexpected. This is why we are keeping the data “raw.”

1 – Project Structure

It is a typical Maven structure. We have one package for this example application. The XML file is in src/main/resources.

2 – Project Dependencies

Besides our typical Spring Boot dependencies, we include dependencies for an embedded MongoDB database and for processing JSON.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>com.michaelcgood</groupId>
	<artifactId>michaelcgood-spring-batch-mongodb</artifactId>
	<version>0.0.1</version>
	<packaging>jar</packaging>

	<name>michaelcgood-spring-batch-mongodb</name>
	<description>Michael C  Good - XML to JSON + MongoDB + Spring Batch Example</description>

	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>1.5.7.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-batch</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>de.flapdoodle.embed</groupId>
			<artifactId>de.flapdoodle.embed.mongo</artifactId>
			<version>1.50.5</version>
		</dependency>
		<dependency>
			<groupId>cz.jirutka.spring</groupId>
			<artifactId>embedmongo-spring</artifactId>
			<version>RELEASE</version>
		</dependency>
		<dependency>
				<groupId>org.json</groupId>
				<artifactId>json</artifactId>
				<version>20170516</version>
			</dependency>

			<dependency>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-starter-data-mongodb</artifactId>
			</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>


</project>

3 – XML Document

This is the example policy document created for this tutorial. It’s structure is based on real cyber security policy documents.

  • Note the parent of the document is the Policy tag.
  • Important information lies within the Group tag.
  • Look at the values that reside within the tags, such as the id in Policy or the date within status.

There’s a lot of information condensed in this small document to consider. For instance, there is also the XML namespace (xmlns). We won’t touch on this in the rest of the tutorial, but depending on your goals it could be something to add logic for.

<?xml version="1.0"?>
<Policy  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" style="STY_1.1" id="NRD-1">
  <status date="2017-10-18">draft</status>
  <title xmlns:xhtml="http://www.w3.org/1999/xhtml">Guide to the Configuration of Music Playlist</title>
   <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >This guide presents a catalog of relevant
    configuration settings for a playlist that I listen to while I work on software development.
    <html:br xmlns:html="http://www.w3.org/1999/xhtml"/>
    <html:br xmlns:html="http://www.w3.org/1999/xhtml"/>
    Providing myself with such guidance reminds me how to efficiently
    configure my playlist.  Lorem ipsum <html:i xmlns:html="http://www.w3.org/1999/xhtml">Lorem ipsum,</html:i> 
    and Lorem ipsum.  Some example
    <html:i xmlns:html="http://www.w3.org/1999/xhtml">Lorem ipsum</html:i>, which are Lorem ipsum.
  </description>
  <Group id="remediation_functions">
    <title xmlns:xhtml="http://www.w3.org/1999/xhtml" >Remediation functions used by the SCAP Security Guide Project</title>
    <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >XCCDF form of the various remediation functions as used by
      remediation scripts from the SCAP Security Guide Project</description>
    <Value id="is_the_music_good" prohibitChanges="true" >
      <title xmlns:xhtml="http://www.w3.org/1999/xhtml" >Remediation function to fix bad playlist</title>
      <description xmlns:xhtml="http://www.w3.org/1999/xhtml" >Function to fix bad playlist.
      
        
       Lorem ipsum Lorem ipsum Lorem ipsum Lorem ipsum
       
       Lorem ipsum
       Lorem ipsum
       Lorem ipsum
       Lorem ipsum
      </description>
      <value>
        function fix_bad_playlist {
        
        # Load function arguments into local variables
       Lorem ipsum
       Lorem ipsum
       Lorem ipsum
        
        # Check sanity of the input
        if [ $# Lorem ipsum ]
        then
        echo "Usage: Lorem ipsum"
        echo "Aborting."
        exit 1
        fi
        
        }
      </value>
    </Value>
    </Group>
    </Policy>

4 – MongoDB Configuration

Below we specify that we are using an embedded MongoDB database, make it discoverable for a component scan that is bundled in the convenience annotation @SpringBootApplication, and specify that mongoTemplate will be a bean.

package com.michaelcgood;

import java.io.IOException;
import cz.jirutka.spring.embedmongo.EmbeddedMongoFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.*;
import com.mongodb.MongoClient;
 
 
@Configuration
public class MongoConfig {
 
    private static final String MONGO_DB_URL = "localhost";
    private static final String MONGO_DB_NAME = "embeded_db";
    @Bean
    public MongoTemplate mongoTemplate() throws IOException {
        EmbeddedMongoFactoryBean mongo = new EmbeddedMongoFactoryBean();
        mongo.setBindIp(MONGO_DB_URL);
        MongoClient mongoClient = mongo.getObject();
        MongoTemplate mongoTemplate = new MongoTemplate(mongoClient, MONGO_DB_NAME);
        return mongoTemplate;
    }
}

5 – Processing XML to JSON

step1() of our Spring Batch Job contains calls three methods to help process the XML to JSON. We will review each individually.

@Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
                .tasklet(new Tasklet() {
                    @Override
                    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception {
                        
                        // get path of file in src/main/resources
                        Path xmlDocPath =  Paths.get(getFilePath());
                        
                        // process the file to json
                         String json = processXML2JSON(xmlDocPath);
                         
                         // insert json into mongodb
                         insertToMongo(json);
                        return RepeatStatus.FINISHED;
                    }
                }).build();
    }

5.1 – getFilePath()

This method simply gets the file path that is passed as a parameter to the method processXML2JSON.
Note:

  • ClassLoader is helping us locate the XML file in our resources folder.
// no parameter method for creating the path to our xml file
    private String getFilePath(){
        
        String fileName = "FakePolicy.xml";
        ClassLoader classLoader = getClass().getClassLoader();
        File file = new File(classLoader.getResource(fileName).getFile());
        String xmlFilePath = file.getAbsolutePath();
        
        return xmlFilePath;
    }

5.2 – processXML2JSON(xmlDocPath)

The string returned by getFilePath is passed into this method as a parameter. A JSONOBject is created from a String of the XML file.

// takes a parameter of xml path and returns json as a string
    private String processXML2JSON(Path xmlDocPath) throws JSONException {
        
        
        String XML_STRING = null;
        try {
            XML_STRING = Files.lines(xmlDocPath).collect(Collectors.joining("\n"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        
        JSONObject xmlJSONObj = XML.toJSONObject(XML_STRING);
        String jsonPrettyPrintString = xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR);
        System.out.println("PRINTING STRING :::::::::::::::::::::" + jsonPrettyPrintString);
        
        return jsonPrettyPrintString;
    }

5.3 – insertToMongo(json)

We insert the parsed JSON into a MongoDB document. We then insert this document with the help of the @Autowired mongoTemplate into a collection named “foo”.

// inserts to our mongodb
    private void insertToMongo(String jsonString){
        Document doc = Document.parse(jsonString);
        mongoTemplate.insert(doc, "foo");
    }

6 – Querying MongoDB

step2() of our Spring Batch Job contains our MongoDB queries.

  • mongoTemplate.collectionExists returns a Boolean value based on the existence of the collection.
  • mongoTemplate.getCollection(“foo”).find() returns all the documents within the collection.
  • alldocs.toArray() returns an array of DBObjects.
  • Then we call three methods that we will review individually below.
public Step step2(){
        return stepBuilderFactory.get("step2")
            .tasklet(new Tasklet(){
            @Override
            public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception{
                // all printing out to console removed for post's brevity
                // checks if our collection exists
                Boolean doesexist = mongoTemplate.collectionExists("foo");
                
                // show all DBObjects in foo collection
                DBCursor alldocs = mongoTemplate.getCollection("foo").find();
                List<DBObject> dbarray = alldocs.toArray();
                
                // execute the three methods we defined for querying the foo collection
                String result = doCollect();
                String resultTwo = doCollectTwo();
                String resultThree = doCollectThree();
               
                return RepeatStatus.FINISHED;
            }
        }).build();
    }

6.1 – First query

The goal of this query is to find a document where style=”STY_1.1″. To accomplish this, we need to remember where style resides in the document. It is a child of Policy; therefore, we address it in the criteria as Policy.style.

The other goal of this query is to only return the id field of the Policy. It is also just a child of Policy.

The result is returned by calling this method: mongoTemplate.findOne(query, String.class, “foo”);. The output is a String, so the second parameter is String.class. The third parameter is our collection name.

public String doCollect(){
        Query query = new Query();
        query.addCriteria(Criteria.where("Policy.style").is("STY_1.1")).fields().include("Policy.id");
        String result = mongoTemplate.findOne(query, String.class, "foo");
        return result;
    }

6.2 – Second query

The difference between the second query and the first query are the fields returned. In the second query, we return Value, which is a child of both Policy and Group.

public String doCollectTwo(){
        Query query = new Query();
        query.addCriteria(Criteria.where("Policy.style").is("STY_1.1")).fields().include("Policy.Group.Value");
        String result = mongoTemplate.findOne(query, String.class, "foo");
        
        return result;
    }

6.3 – Third query

The criteria for the third query is different. We only want to return a document with the id “NRD-1” and a status date of “2017-10-18”. We only want to return two fields: title and description, which are both children of Value.

Referring to the XML document or the printed JSON in the demo below for further clarification on the queries.

public String doCollectThree(){
        Query query = new Query();
        query.addCriteria(Criteria.where("Policy.id").is("NRD-1").and("Policy.status.date").is("2017-10-18")).fields().include("Policy.Group.Value.title").include("Policy.Group.Value.description");
        String result = mongoTemplate.findOne(query, String.class, "foo");
        
        return result;
    }

7 – Spring Batch Job

The Job begins with step1 and calls step2 next.

@Bean
    public Job xmlToJsonToMongo() {
        return jobBuilderFactory.get("XML_Processor")
                .start(step1())
                .next(step2())
                .build();
    }

8 – @SpringBootApplication

This is a standard class with static void main and @SpringBootApplication.

package com.michaelcgood;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;

@SpringBootApplication
@EnableAutoConfiguration(exclude={DataSourceAutoConfiguration.class})
public class SpringBatchMongodb {

	public static void main(String[] args) {
		SpringApplication.run(SpringBatchMongodb.class, args);
	}
}

9 – Demo

9.1 – step1

The JSON is printed as a String. I have cut the output past description below because it is long.

Executing step: [step1]
PRINTING STRING :::::::::::::::::::::{"Policy": {
    "Group": {
        "Value": {
            "prohibitChanges": true,
            "description": {

9.2 – step2

I have cut the results to format the output for the blog post.

Executing step: [step2]

Checking if the collection exists

Status of collection returns :::::::::::::::::::::true

Show all objects

list of db objects returns:::::::::::::::::::::[{ "_id" : { "$oid" : "59e7c0324ad9510acf5773c0"} , [..]

Just return the id of Policy

RESULT:::::::::::::::::::::{ "_id" : { "$oid" : "59e7c0324ad9510acf5773c0"} , "Policy" : { "id" : "NRD-1"}}

To see the other results printed to the console, fork/download the code from Github and run the application.

10 – Conclusion

We have reviewed how to convert XML to JSON, store the JSON to MongoDB, and how to query the database for specific results.

Further reading:

The source code is on Github

Published on Java Code Geeks with permission by Michael Good, partner at our JCG program. See the original article here: Converting XML to JSON + Raw Use in MongoDB + Spring Batch

Opinions expressed by Java Code Geeks contributors are their own.

Michael Good

Michael is a software engineer located in the Washington DC area that is interested in Java, cyber security, and open source technologies. Follow his personal blog to read more from Michael.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Inline Feedbacks
View all comments
Back to top button